Dave Kellogg at MarkLogic 2010 Government Summit

These are the slides from Dave Kellogg's presentation at the 2010 MarkLogic Government Summit in Tyson's Corner, VA.

    Dave Kellogg at MarkLogic 2010 Government Summit: Presentation Transcript

    • Taming The Unstructured Data Problem
      Dave Kellogg
      Chief Executive Officer
      11/17/10
    • Topics
      About MarkLogic
      What do we mean by “unstructured”
      What people do with unstructured information
      Conclusions
    • MarkLogic Government Began With a Hunch
      A belief that Government agencies would have
      Large amounts of
      Unstructured information and
      Want an open way to store it
      And a standard way to run complex queries against it
      Somewhere around 2005, we chose to make Government the second key sector for MarkLogic
      The first was media/publishing
    • As Hunches Go … It Was a Good One
      removed
    • Media Customers
      Government Customers
      Financial Services and Other Customers
      200+ Customers
    • Topics
      About MarkLogic
      What do we mean by “unstructured”
      What people do with unstructured information
      Conclusions
    • My Database Journey
      Lawrence Berkeley Lab
      Seismic metadata in Ingres
      Ingres 6.3
      Product manager for first DBMS with user-defined types
      BusinessObjects
      Ran marketing for 9 years from $30M to $1B
      MarkLogic
      Structured/unstructured divide
      First-class citizenship
    • What Do We Mean by “Unstructured?”
      “It is estimated that about 80% of enterprise information is unstructured
      … and contains text and data that is not readily accessible but holds immeasurable value.”
      -- IDC, White Paper 9/06
      “Excuse me for saying so, but there is no such thing as unstructured information.
      Even the simplest information has a sequence in which there is a beginning, a middle, and an end.”
      -- Steven Newcomb, Topic Maps, Chapter 3.
      <enter>long debate</enter>
    • The Information Continuum
      Information Continuum
      XML
      Metadata
      Geospatial
      Graph
      Free text
      Relational
      Time-varying
      Sparse
      N-schema
      Hierarchical
      Semi-structured
      “Unstructured”
      “Structured”
    • A Practical Definition of “Unstructured”
      That which does not model well relationally
      You could put in:
      Books, journals
      Web pages
      Message and cable traffic
      Doctrine, procedures
      Metadata
      Hierarchies, graphs
      Sparse data
      But should you?
      RELATIONERTIA
    • An Old Saw, Adapted
      If your only data modeling element’s a table, then every problem looks like a column
      We believe there is a better way
      Use XML as a means to represent unstructured information
      Use XQuery as the language for building apps and analytics
      Implement a specialized DBMS, purpose-built for managing vast amounts of unstructured information (MarkLogic Server)
    • Topics
      About MarkLogic
      What do we mean by “unstructured”
      What people do with unstructured information
      Conclusions
    • Digital Publishing: Custom Textbook Publishing
      Browse
      Chapters
      Customize
      Create
      Search
    • Digital Publishing: Web 2.0 Applications
      Profiles
      Social network
      Social bookmarking
      Targeted Ads
      Topics
      Activity / Feed
    • Person-of-Interest Databases
      A seemingly simple problem made difficult by 2 things
      Multi-valued attributes
      Discard nothing: as many heights as sources
      Repeating groups drive creation of table per attribute
      Sparse data
      Thousands of possible attributes of which few are known
      Typical result
      500+ largely empty tables
      Huge joins cripple query performance
      Bonus
      Fun attributes like body markings
      Transliteration: Gadafi vs. Khadafi
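To make the multi-valued and sparse-attribute points concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not MarkLogic's API or data model. The element names, attribute names, source labels, and values are all hypothetical, invented for illustration. The idea is that one document can hold as many heights or name spellings as there are sources, and that an unknown attribute is simply absent rather than an empty table.

```python
import xml.etree.ElementTree as ET

# Hypothetical person record. Element names, sources, and values are made up.
RECORD = """
<person id="p-001">
  <name source="report-a">Gadafi</name>
  <name source="report-b">Khadafi</name>
  <height unit="cm" source="report-a">178</height>
  <height unit="cm" source="report-b">181</height>
  <body-marking source="report-b">scar, left forearm</body-marking>
</person>
"""


def values(person, tag):
    """Return every (source, text) pair recorded for one attribute."""
    return [(el.get("source"), el.text) for el in person.findall(tag)]


person = ET.fromstring(RECORD)

# Multi-valued attribute: both heights are kept, one per source.
heights = values(person, "height")

# Sparse attribute: nothing is known, so nothing is stored and nothing is returned.
eye_colors = values(person, "eye-color")
```

In a relational design, each repeating attribute here would typically become its own table joined back to the person row, which is the pattern the slide describes.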
    • Metadata Catalogs
      Digital card catalogs for tracking information assets
      Intelligence community information sharing
      Libraries and archives
      Digital asset repositories
      If you can’t search the content, search the metadata
      Why MarkLogic?
      Changing metadata standards
      Evolving metadata fields
      User-generated metadata (tagging, folksonomy)
      Text metadata where search-style matching is desirable
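The points about evolving fields and search-style matching can be sketched as follows. This is a toy illustration in plain Python, not how MarkLogic indexes or searches content. The catalog entries, field names, and the `search` helper are assumptions made up for the example. Each entry carries whatever metadata fields it happens to have, and a query matches when every query word appears somewhere in the entry's metadata.

```python
import re

# Hypothetical catalog entries; note that the set of fields varies per entry.
CATALOG = [
    {"id": "a1", "title": "Harbor survey imagery", "tags": ["maritime", "imagery"]},
    {"id": "a2", "title": "Field manual, rev. 3", "classification-note": "Supersedes rev. 2"},
    {"id": "a3", "title": "Port logistics report", "tags": ["maritime"], "summary": "Harbor throughput"},
]


def tokens(value):
    """Lowercase word tokens from a string or a list of strings."""
    text = " ".join(value) if isinstance(value, list) else str(value)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(catalog, query):
    """Return ids of entries whose metadata contains every word in the query."""
    wanted = tokens(query)
    hits = []
    for entry in catalog:
        found = set()
        for field, value in entry.items():
            if field != "id":  # the id is a key, not searchable metadata
                found |= tokens(value)
        if wanted <= found:
            hits.append(entry["id"])
    return hits
```

Because the search walks whatever fields are present, adding a new metadata field or a user-supplied tag to an entry requires no schema change in this sketch.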
    • Situational Awareness
      Integrating information in real-time from multiple sources to improve operational decision making
      Scraping websites, chat sessions, news, …
      Integrating geospatial information
      Pulling information from existing systems
      Civilian and Defense applications
      Why MarkLogic?
      Geospatial indexing
      Zero-latency indexing, real-time query performance
      Ability to handle diverse content in different structures
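As a rough illustration of the kind of question a geospatial query answers, the sketch below filters events by great-circle distance from a point using the haversine formula. It is a linear scan in plain Python, not a geospatial index and not MarkLogic functionality. The event records, their field names, and the `within_radius` helper are hypothetical; the coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(events, lat, lon, radius_km):
    """Return ids of events located within radius_km of the given point."""
    return [
        e["id"]
        for e in events
        if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km
    ]


# Hypothetical events from different sources; the extra fields differ per record.
EVENTS = [
    {"id": "e1", "lat": 38.9072, "lon": -77.0369, "source": "news"},
    {"id": "e2", "lat": 40.7128, "lon": -74.0060, "source": "chat", "channel": "ops"},
]

# Events within 25 km of a point near Tysons Corner, VA (approximate coordinates).
nearby = within_radius(EVENTS, 38.92, -77.23, 25)
```

A real system would use an index rather than scanning every record, which is the capability the slide refers to.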
    • Intelligence Applications
      Open source intelligence
      Scrape and enrich publicly available Internet content
      Load into content repository
      Build applications that enable search and annotation
      Cellphone exploitation
      Collect contacts, call history, and messages
      Quickly load into database in the field
      Search social network for suspects
      Link analysis
      Analyze the graph of contacts and organizations
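Link analysis over call history can be sketched as a graph search. The example below builds an undirected contact graph from caller/callee pairs and uses breadth-first search to find the shortest chain of contacts between two people. It is a generic illustration, not the method used in any fielded application. The call records and identifiers are made up, and `build_graph` and `shortest_chain` are hypothetical helper names.

```python
from collections import defaultdict, deque

# Hypothetical call records: (caller, callee) pairs with made-up identifiers.
CALLS = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("F", "G")]


def build_graph(calls):
    """Build an undirected contact graph: a call links both parties."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def shortest_chain(graph, start, goal):
    """Return the shortest chain of contacts from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # visited nodes mapped to the node that reached them
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the chain.
                chain = [neighbor]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(neighbor)
    return None


graph = build_graph(CALLS)
chain = shortest_chain(graph, "A", "D")
unlinked = shortest_chain(graph, "A", "G")
```

Here `chain` links A to D through two intermediaries, while `unlinked` is `None` because no sequence of calls connects A to G.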
    • Topics
      About MarkLogic
      What do we mean by “unstructured”
      What people do with unstructured information
      Conclusions
    • The Relational “Data Base” Was Invented in 1970
      Provide flexible ad hoc queries to structured data
      Wasn’t thinking about
      Web content
      PDFs
      Word files
      SIGINT
      RSS feeds
      Tweets
      21st century challenges
    • What Else Happened in 1970?
      Super Bowl IV
      Janis Joplin died
      Mariah Carey was born
      Beatles disbanded after Let It Be
      Monday Night Football debuted
      First episode of All My Children
      Boeing 747 entered service
      First F-14 Tomcat test flight
      Gas cost $0.36/gallon
      Storage cost over $200/megabyte
    • Thank You! (And Please Follow Me At …)
      www.kellblog.com
      twitter.com/kellblog