Tim Estes - Information Systems in an Entity Centric World


Tim Estes, CEO of Digital Reasoning, talks about the use of Hadoop and other scalable technologies along with Digital Reasoning's analytics for automated understanding of cloud-scale text challenges.
This presentation was delivered at Hadoop World in New York in October 2010.

Published in: Technology

  1. Information Systems in an Entity-Centric World (Digital Reasoning™ ❘ Copyright © 2010 ❘ All Rights Reserved)
  2. What Are We Covering?
     ‣ The New Entity-Oriented Mission in an Era of Information Overload
     ‣ Demonstration
     ‣ Entity-Oriented Analytics
     ‣ Processing
     ‣ Schema/Semantics
     ‣ The Takeaways
  3. Mission Evolution from Forces to Entities
     [Figure: timeline with the stages Armies, Groups, Individuals, and Multiple Cyber Presences for Individuals; time axis marked 1980, 1990, 2000, Current]
  4. Information Explosion: “A Wealth of Information creates a Poverty of Attention”
     ‣ Most of the time, what we don’t know poses the greatest risk. Missing critical information can have tragic consequences.
       • President Obama, when referencing the Christmas Day bomb attempt, recognized that the intelligence community failed to “connect the dots”
     ‣ Hiring more analysts to manually review is an expensive and insufficient approach
       • Costs not reasonable, lack of coverage intolerable for intelligence, defense and law enforcement agencies
     ‣ Most automated solutions do not understand human language sufficiently to be useful
       • Variations in grammar, spelling and context obscure meaning to software
     ‣ Today's search solutions link you to matching keywords and don't deliver understanding of hidden relationships buried in unstructured text
     [Chart: global information created, shown in exabytes (Source: IDC); tenfold growth in data in five years]
     ‣ 80% of all data is unstructured
       • 1.73 billion internet users as of September 2009
       • 247 billion emails on average sent every day in 2009
       • 126 million blogs on the internet
       • 400+ million Facebook users generate more than 5 billion pieces of content every week
       • 50 million Twitter messages were sent every day in January 2010 (~17-18 billion tweets so far…)
  5. The Last 10 Years Have Been About Search. The Next 10 Years Are Going To Be About Summarization.
  6. Community Response
     [Figure: innovation plotted against time, in four stages]
     ‣ Keyword Search (1992)
       • Documents retrieved by matching keywords
       • Limited as matching keywords doesn’t provide understanding
     ‣ Improved Tools (2000)
       • Faceted search and related filters make discovery easier
       • Actual “meaning” of data remains hidden
     ‣ Enriched Data (2006)
       • Extracts known entities adding metadata to allow filtering
       • Predetermined taxonomies of limited value for unknowns
     ‣ Understand Data (Current)
       • Entity Resolution
       • Horizontal Scalability
       • Statistical Filtering
       • No preconceptions
       • Context discovered automatically
  7. Transform Enterprise Knowledge: From Docs to Things
  8. Demonstration
  9. What Are We Using?
     ‣ Hadoop (Cloudera Distribution of Hadoop V3 – CDH3) for horizontally scalable ingestion and analytics processing
       • CDH3 has beneficial tooling over vanilla Hadoop
       • It's nice to have supported/assured capabilities by really smart guys
     ‣ Cassandra for persistence of entity data
       • No Single Point of Failure – true peer data architecture
       • Leveraging its very fast write performance
       • Pushing and extending its distributed query capabilities
       • Multi-point ingest and eventual consistency are useful for downstream multi-datacenter/multi-cloud deployment scenarios
     ‣ Complex and novel analytics algorithms working on Hadoop
       • NLP and extraction technology is fresh and state of the art
       • Associative Network algorithms are unique and patented
       • Concept resolution from structured and unstructured data using an unsupervised learning approach
       • Can also incorporate human guidance (augmented intelligence), but current capabilities give us a great way to control the fire hose of information before an analyst has to deal with it
  10. Synthesys™ in Entity Analytics
  11. Synthesys™ in Entity Analytics
  12. Entity-Centric Clouds - Takeaways
     ‣ Analysts (and people) want an entity-centric system vs. a document-centric one (but it has to work).
     ‣ Dealing with entities vs. documents or records makes the IT complexity jump 100X
     ‣ But it gives our analysts a 10X boost in productivity if done right (maybe 100X when we get around to writing Agents on the Entity-Centric Cloud)
     ‣ That’s where the cloud comes in
     ‣ We as a community have the processing and storage capacity to handle this now at the 100M-10B record level
     ‣ We should take a “conservative” approach to analysis where analytics is a continuous enrichment process and our current models are not dramatically privileged over potential future models
     ‣ Nearly all web-scale search / analytics systems use the following recipe:
       • Distributed, columnar type data store
       • Horizontally scalable analytics framework (like Hadoop)
     ‣ Future systems are all distributed and learning all the time. Our investments today can be ready for this future with the right architectural and technology considerations.
  13. Tim Estes | CEO | Digital Reasoning Systems
     730 Cool Springs Blvd, Suite 110, Franklin, TN 37067
     office: 615.370.1860 | fax: 615.370.1865 | email: info@digitalreasoning.com
     website: http://www.digitalreasoning.com
     twitter: http://twitter.com/spooksandgeeks
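
The takeaways slide describes a recipe of a distributed, column-style data store paired with a horizontally scalable analytics framework, applied to entities rather than documents. The following is a minimal single-process sketch of that idea in plain Python, not Digital Reasoning's implementation: it uses no Hadoop or Cassandra APIs, the sample documents are invented, and the regex "extractor" is a crude stand-in for real NLP. The names `DOCS`, `ENTITY_RE`, `map_phase`, `reduce_phase`, and `build_entity_index` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample documents (not from the presentation).
DOCS = {
    "doc1": "Alice Smith met Bob Jones in Nashville.",
    "doc2": "Bob Jones emailed Alice Smith about Hadoop.",
}

# Assumption: a naive stand-in for entity extraction that treats any run of
# two or more capitalized words as an entity mention.
ENTITY_RE = re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b")


def map_phase(doc_id, text):
    """Map step: emit one (entity, doc_id) pair per mention in a document."""
    for match in ENTITY_RE.finditer(text):
        yield match.group(0), doc_id


def reduce_phase(pairs):
    """Reduce step: group mentions by entity.

    The result mimics a wide-row layout: the row key is the entity and each
    column is a document id whose value is the mention count.
    """
    rows = defaultdict(dict)
    for entity, doc_id in pairs:
        rows[entity][doc_id] = rows[entity].get(doc_id, 0) + 1
    return dict(rows)


def build_entity_index(docs):
    """Run the map step over every document, then reduce into entity rows."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


ENTITY_INDEX = build_entity_index(DOCS)
```

Looking up `ENTITY_INDEX["Alice Smith"]` returns every document that mentions that entity, which is the document-to-entity pivot the deck argues for. In a real deployment the map and reduce steps would run as distributed jobs and the rows would live in a distributed store; this sketch only shows the shape of the data flow.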