C# Natural Language Engine

Describes a natural language engine built in C# that provides a strongly-typed parser and rules engine. For more information, please visit http://blog.abodit.com

Usage Rights

© All Rights Reserved

    C# Natural Language Engine: Presentation Transcript

    • C# Natural Language Engine
      Ian Mercer, ian@abodit.com
    • Introduction
      This presentation outlines the C# Natural Language Engine used by the ‘Smartest House in the World’, a home automation system developed by the author.
      If you are interested in using the C# Natural Language Engine presented here in a commercial product please contact the author.
    • C# Natural Language Engine
      Existing Natural Language Engines
      Have a large, STATIC dictionary data file
      Can parse complex sentence structure
      Hand back a tree of tokens (strings)
      Don’t handle conversations
      C# NLP Engine
      Defines strongly-typed tokens in code
      Uses type inheritance to model ‘is a’
      Defines sentences in code
      Rules engine executes sentences
      Understands context (conversation history)
    • Sample conversation …
      Complex temporal expressions …
      Ask it to play music
      … become database queries
      Handles async conversations
      Understands names …
    • Goals
      Make it easy to define tokens and sentences (not XML)
      Safe, compile-time checked definition of the syntax and grammar (not XML)
      Model real-world inheritance with C# class inheritance: ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
      Handle ambiguity, e.g. ‘play something in the air tonight in the kitchen’ or ‘remind me at 4pm to call john at 5pm’
    • C# NLP Engine Structure
      Token Parser
    • Tokens - Token Definition
      A hierarchy of Token-derived classes
      Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token
      This allows a single sentence rule to handle multiple cases, e.g. On and Off
      Derived from base Token class
      Simple tokens are a set of words
      e.g. « is | are »
      Complex tokens have a parser
      e.g. TokenDouble
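A minimal C# sketch of the hierarchy described above. The abstract intermediate classes, the `IsOn` property and the `StateRules.Describe` helper are assumptions for illustration, not the engine's actual members; the point is only that a method typed on `TokenOnOff` accepts both `TokenOn` and `TokenOff`.

```csharp
using System;

// Hypothetical hierarchy mirroring "TokenOn is a TokenOnOff is a TokenState is a Token".
public abstract class Token { }
public abstract class TokenState : Token { }

public abstract class TokenOnOff : TokenState
{
    // Hypothetical member; the engine's real property names may differ.
    public abstract bool IsOn { get; }
}

public sealed class TokenOn : TokenOnOff { public override bool IsOn => true; }
public sealed class TokenOff : TokenOnOff { public override bool IsOn => false; }

public static class StateRules
{
    // A single handler typed on the base class receives either subclass.
    public static string Describe(TokenOnOff state) => state.IsOn ? "turning on" : "turning off";
}
```

With this shape, `StateRules.Describe(new TokenOff())` returns `"turning off"` without a separate Off-specific handler.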
    • A Simple Token Definition
      public class TokenPersonalPronoun : TokenGenericNoun
      {
          internal static string wordz { get { return "he,him,she,her,them"; } }
      }
      Recognizes any of the words specified
      Can use inheritance (as in this example, where a PersonalPronoun is modelled as a subclass of GenericNoun)
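The slide does not show how the word list is consumed. Since tokens are discovered by reflection, one plausible approach is to read the static `wordz` property reflectively and split it on commas. `WordMatcher.Matches` is a hypothetical helper written under that assumption; case-insensitive matching is also an assumption.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stubs mirroring the slide's classes.
public abstract class Token { }
public class TokenGenericNoun : Token { }

public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}

public static class WordMatcher
{
    // Hypothetical helper: look up the token type's static 'wordz' list by
    // reflection and test the word against it, ignoring case.
    public static bool Matches(Type tokenType, string word)
    {
        var prop = tokenType.GetProperty("wordz",
            BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
        if (prop?.GetValue(null) is not string list) return false; // no word list declared
        return list.Split(',')
                   .Select(w => w.Trim())
                   .Any(w => string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    }
}
```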
    • A Complex Token
      public abstract class TokenNumber : Token
      {
          public static IEnumerable<TokenResult> Initialize(string input) { … }
      }
      The Initialize method parses the input and returns one or more possible parses.
      TokenNumber is a good example:
      It parses any numeric value and returns one or more TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.
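A sketch of what an `Initialize` method returning several candidate parses might look like. Only the class names come from the slide; `TokenResult` as a simple wrapper record, the `Value` properties, the ordinal-suffix handling and storing percentages as fractions are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public abstract class Token { }

// Hypothetical result wrapper: one candidate interpretation of the input.
public sealed record TokenResult(Token Token);

public abstract class TokenNumber : Token
{
    // Return every plausible numeric reading of the input; later stages keep
    // whichever one lets a sentence rule match.
    public static IEnumerable<TokenResult> Initialize(string input)
    {
        var s = input.Trim();
        var inv = CultureInfo.InvariantCulture;

        // "3rd", "21st" -> ordinal
        var m = Regex.Match(s, @"^(\d+)(st|nd|rd|th)$", RegexOptions.IgnoreCase);
        if (m.Success && int.TryParse(m.Groups[1].Value, NumberStyles.None, inv, out var ord))
            yield return new TokenResult(new TokenIntOrdinal(ord));

        // "50%" -> percentage, stored here as a fraction (assumption)
        if (s.EndsWith("%") && double.TryParse(s.TrimEnd('%'), NumberStyles.Float, inv, out var pct))
            yield return new TokenResult(new TokenPercentage(pct / 100.0));

        // A plain number may fit several types at once.
        if (int.TryParse(s, NumberStyles.Integer, inv, out var i))
            yield return new TokenResult(new TokenInt(i));
        if (long.TryParse(s, NumberStyles.Integer, inv, out var l))
            yield return new TokenResult(new TokenLong(l));
        if (double.TryParse(s, NumberStyles.Float, inv, out var d))
            yield return new TokenResult(new TokenDouble(d));
    }
}

public sealed class TokenInt : TokenNumber { public int Value { get; } public TokenInt(int v) => Value = v; }
public sealed class TokenLong : TokenNumber { public long Value { get; } public TokenLong(long v) => Value = v; }
public sealed class TokenIntOrdinal : TokenNumber { public int Value { get; } public TokenIntOrdinal(int v) => Value = v; }
public sealed class TokenDouble : TokenNumber { public double Value { get; } public TokenDouble(double v) => Value = v; }
public sealed class TokenPercentage : TokenNumber { public double Value { get; } public TokenPercentage(double v) => Value = v; }
```

In this sketch `"42"` yields TokenInt, TokenLong and TokenDouble candidates, `"3000000000"` only TokenLong and TokenDouble, `"3rd"` only a TokenIntOrdinal, and `"50%"` only a TokenPercentage.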
    • The catch-all TokenPhrase
      public class TokenPhrase : Token
      TokenPhrase matches anything, especially anything in quote marks
      add a reminder call Bruno at 4pm
      The sentence signature could be
      (…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
      This would match the rule too …
      add a reminder discuss 6pm conference call with Bruno at 4pm
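The engine matches typed tokens rather than regular expressions, but a regex analogy shows why the second sentence still fits the rule: a greedy catch-all slot absorbs "6pm", and only the final "at 4pm" binds to the exact-time slot. `ReminderSketch` and its pattern are hypothetical and stand in for the TokenPhrase/TokenExactTime pair only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReminderSketch
{
    // Greedy (?<phrase>.+) plays the role of the catch-all TokenPhrase;
    // backtracking leaves only the last "at <time>" for the exact-time slot.
    private static readonly Regex Pattern = new Regex(
        @"^add a reminder (?<phrase>.+) at (?<time>\d{1,2}(:\d{2})?\s*(am|pm))$",
        RegexOptions.IgnoreCase);

    public static (string Phrase, string Time)? Parse(string input)
    {
        var m = Pattern.Match(input.Trim());
        if (!m.Success) return null;
        return (m.Groups["phrase"].Value, m.Groups["time"].Value);
    }
}
```

For the second sentence this returns the phrase `discuss 6pm conference call with Bruno` and the time `4pm`.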
    • Temporal Tokens
      A complete set of tokens and related classes for representing time
      Point in time, e.g. today at 5pm
      Approximate time, e.g. who called at 5pm today
      Finite sequence, e.g. every Thursday in May 2009
      Infinite sequence, e.g. every Thursday
      Ambiguous time with context, e.g. remind me on Tuesday (context means it is next Tuesday)
      Null time
      Unknowable/incomprehensible time
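A sketch of resolving an ambiguous weekday from context. The rule used here (a future-facing request such as a reminder means the next such day strictly after today, a past-facing query means the previous one) is an assumption for illustration, as is the `AmbiguousTime.ResolveWeekday` name.

```csharp
using System;

public static class AmbiguousTime
{
    // Resolve a bare weekday ("on Tuesday") relative to 'today'.
    // Assumption: future context -> next such day after today;
    // past context -> the most recent such day before today.
    public static DateTime ResolveWeekday(DateTime today, DayOfWeek day, bool futureContext)
    {
        if (futureContext)
        {
            int ahead = ((int)day - (int)today.DayOfWeek + 7) % 7;
            return today.Date.AddDays(ahead == 0 ? 7 : ahead);
        }
        int back = ((int)today.DayOfWeek - (int)day + 7) % 7;
        return today.Date.AddDays(-(back == 0 ? 7 : back));
    }
}
```

On Friday 1 May 2009, a reminder "on Tuesday" resolves to 5 May 2009, while a past-facing query resolves to 28 April 2009.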
    • Temporal Tokens (Cont.)
      Code to merge any sequence of temporal tokens to the smallest canonical representation, e.g.
      the first thursday in may 2009
      {TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
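The slide does not show the merge code or its output. As a toy model, assume each temporal token contributes optional fields (year, month, weekday, ordinal) and merging combines them, rejecting conflicts; the hypothetical `TemporalSpec` record below then reduces the example to a single date, Thursday 7 May 2009. The real engine handles many more shapes (sequences, ranges, approximate times) than this sketch.

```csharp
using System;

// Hypothetical partial date description; each parsed token fills in some fields.
public sealed record TemporalSpec(int? Year = null, int? Month = null,
                                  DayOfWeek? Day = null, int? Ordinal = null)
{
    // Combine two partial specs; a field present in both must agree.
    public TemporalSpec Merge(TemporalSpec other) => new TemporalSpec(
        Pick(Year, other.Year), Pick(Month, other.Month),
        Pick(Day, other.Day), Pick(Ordinal, other.Ordinal));

    private static T? Pick<T>(T? a, T? b) where T : struct
    {
        if (a.HasValue && b.HasValue && !a.Value.Equals(b.Value))
            throw new InvalidOperationException("Conflicting temporal constraints");
        return a ?? b;
    }

    // Reduce "Nth <weekday> in <month> <year>" to one date, or null if the
    // spec is incomplete or the month has no such day (e.g. a 5th Thursday).
    public DateTime? Resolve()
    {
        if (Year is null || Month is null || Day is null || Ordinal is null) return null;
        var first = new DateTime(Year.Value, Month.Value, 1);
        int offset = ((int)Day.Value - (int)first.DayOfWeek + 7) % 7;
        var date = first.AddDays(offset + 7 * (Ordinal.Value - 1));
        return date.Month == Month.Value ? date : (DateTime?)null;
    }
}
```

Merging `new TemporalSpec(Ordinal: 1)`, `new TemporalSpec(Day: DayOfWeek.Thursday)`, `new TemporalSpec(Month: 5)` and `new TemporalSpec(Year: 2009)` in any order and calling `Resolve()` gives 2009-05-07.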
    • Temporal Tokens (Cont.)
      Finite temporal classes provide
      A way to enumerate the DateTimeRanges they cover
      All temporal classes provide
      A LINQ expression generator and an Entity SQL expression generator, allowing them to be used to query a database
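A sketch of both capabilities using `System.Linq.Expressions`. The `DateTimeRange` record is a hypothetical stand-in (the slide names DateTimeRanges but not their members), `CallLogEntry` is an invented entity, and Entity SQL generation is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical half-open interval [Start, End).
public sealed record DateTimeRange(DateTime Start, DateTime End)
{
    // A finite sequence enumerates the ranges it covers,
    // e.g. every Thursday in May 2009 -> one whole-day range per Thursday.
    public static IEnumerable<DateTimeRange> EveryWeekdayIn(int year, int month, DayOfWeek day)
    {
        for (var d = new DateTime(year, month, 1); d.Month == month; d = d.AddDays(1))
            if (d.DayOfWeek == day)
                yield return new DateTimeRange(d, d.AddDays(1));
    }

    // Build "x => selector(x) >= Start && selector(x) < End" as an expression tree,
    // usable with an IQueryable<T> provider or compiled for in-memory filtering.
    public Expression<Func<T, bool>> ToPredicate<T>(Expression<Func<T, DateTime>> selector)
    {
        var value = selector.Body;
        var body = Expression.AndAlso(
            Expression.GreaterThanOrEqual(value, Expression.Constant(Start)),
            Expression.LessThan(value, Expression.Constant(End)));
        return Expression.Lambda<Func<T, bool>>(body, selector.Parameters);
    }
}

// Hypothetical entity used only for illustration.
public sealed record CallLogEntry(string Caller, DateTime Timestamp);
```

For every Thursday in May 2009 this yields four ranges, the first starting on the 7th; `ranges[0].ToPredicate<CallLogEntry>(c => c.Timestamp)` can be passed to `Where` on an `IQueryable<CallLogEntry>` or compiled with `.Compile()` for an in-memory list.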
    • Existing Token Types
      Numbers (double, long, int, percentage, phone, temperature)
      File names, Directories
      URLs, Domain names
      Names, Companies, Addresses
      Rooms, Lights, Sensors, Sprinklers, …
      States (On, Off, Dim, Bright, Loud, Quiet, …)
      Units of Time, Weight, Distance
      Songs, albums, artists, genres, tags
      Temporal expressions
      Commands, verbs, nouns, pronouns, …
    • Rules - A simple rule
      /// Set a light to a given state
      private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
      {
          if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
          if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
          st.Say("I turned it " + ts.LowerCased);
      }
      Any method matching this signature is a sentence rule: NLPState, Token*
      Rule matching respects inheritance and variable repeats …
      … (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)
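A simplified sketch of how a rule's signature could be matched against a token sequence with reflection: a single parameter accepts a token of that type or any subclass (`Type.IsInstanceOfType`), and an array parameter greedily takes a run of matching tokens. The type stubs, `RuleMatcher.Bind` and the example rule `ThingState` are hypothetical; a real matcher would also need backtracking and ambiguity handling, which this omits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stubs standing in for the engine's classes.
public class Token { }
public class TokenThing : Token { }
public class TokenLight : TokenThing { }
public class TokenState : Token { }
public class TokenOn : TokenState { }
public class TokenTimeConstraint : Token { }
public class NLPState { }

public static class RuleMatcher
{
    // Bind tokens to the rule's parameters after the leading NLPState.
    // Returns the argument array on success, or null if the tokens don't fit.
    public static object[]? Bind(MethodInfo rule, NLPState st, IReadOnlyList<Token> tokens)
    {
        var ps = rule.GetParameters();
        if (ps.Length == 0 || ps[0].ParameterType != typeof(NLPState)) return null;
        var args = new List<object> { st };
        int i = 0;
        foreach (var p in ps.Skip(1))
        {
            if (p.ParameterType.IsArray)
            {
                // Array parameter: greedily take zero or more tokens of the element type.
                var elem = p.ParameterType.GetElementType()!;
                var run = new List<Token>();
                while (i < tokens.Count && elem.IsInstanceOfType(tokens[i])) run.Add(tokens[i++]);
                var arr = Array.CreateInstance(elem, run.Count);
                for (int k = 0; k < run.Count; k++) arr.SetValue(run[k], k);
                args.Add(arr);
            }
            else
            {
                // Single parameter: next token must be this type or a subclass.
                if (i >= tokens.Count || !p.ParameterType.IsInstanceOfType(tokens[i])) return null;
                args.Add(tokens[i++]);
            }
        }
        return i == tokens.Count ? args.ToArray() : null; // every token must be consumed
    }
}

public static class Rules
{
    public static string? LastCall;

    // Example rule with the signature shape shown on the slide.
    public static void ThingState(NLPState st, TokenThing tt, TokenState tokenState,
                                  TokenTimeConstraint[] constraints)
        => LastCall = $"{tt.GetType().Name} {tokenState.GetType().Name} x{constraints.Length}";
}
```

Binding `[TokenLight, TokenOn]` succeeds because `TokenLight` is a `TokenThing` and `TokenOn` is a `TokenState`, leaving an empty `constraints` array; the reversed order does not match.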
    • State - NLPState
      Every sentence method takes an NLPState first parameter
      State includes RememberedObject(s) allowing sentences to react to anything that happened earlier in a conversation
      Non-interactive uses can pass a dummy state
      State can be per-user or per-conversation for non-realtime conversations like email
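A minimal sketch of conversation state that remembers recently mentioned objects, so a follow-up such as "turn it off" can find the last light. `Remember`, `Recall<T>`, `Replies` and the `TurnItOff` rule are assumed names; the slide only says the state holds RememberedObject(s). A freshly constructed instance doubles as the dummy state for non-interactive use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical token stubs.
public class Token { }
public class TokenLight : Token { public string Name { get; set; } = ""; }
public class TokenSong : Token { public string Title { get; set; } = ""; }

// Hypothetical conversation state.
public class NLPState
{
    private readonly List<object> remembered = new List<object>();

    public List<string> Replies { get; } = new List<string>();

    // Record an object as the most recently mentioned one.
    public void Remember(object o)
    {
        remembered.Remove(o);
        remembered.Add(o);
    }

    // Most recently remembered object of type T (or a subclass), or null.
    public T? Recall<T>() where T : class => remembered.OfType<T>().LastOrDefault();

    public void Say(string text) => Replies.Add(text);
}

public static class PronounRules
{
    // "turn it off": resolve "it" to the last light mentioned, if any.
    public static void TurnItOff(NLPState st)
    {
        var light = st.Recall<TokenLight>();
        st.Say(light is null ? "Which light?" : $"I turned off the {light.Name} light");
    }
}
```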
    • User Interface
      Works with a variety of user interfaces
      Chat (e.g. Jabber/GTalk)
      Web chat
      Calendar (do X at time Y)
      Rich client application
    • Token and Rule Discovery
      No configuration needed: all Tokens and Rules are discovered using reflection
      Builds a recursive descent parser tree on startup to efficiently parse any token stream
      Dependency-injection-like code calls rule methods based on matching token sequences
      Parser can handle array parameters as well as single parameters for more flexibility
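A sketch of reflection-based discovery, assuming (as the rule slide states) that any static method shaped like `(NLPState, Token…)` is a sentence rule. `Discovery`, `LightRules` and the stub types are hypothetical, and building the recursive descent parse tree from the discovered signatures is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stubs standing in for the engine's types.
public class Token { }
public class TokenLight : Token { }
public class TokenOnOff : Token { }
public class NLPState { }

// Hypothetical rule container.
public static class LightRules
{
    private static void LightState(NLPState st, TokenLight light, TokenOnOff state) { }
    public static int Helper(int x) => x; // not a rule: wrong signature
}

public static class Discovery
{
    // Every concrete Token subclass in the assembly.
    public static List<Type> FindTokenTypes(Assembly asm) =>
        asm.GetTypes()
           .Where(t => typeof(Token).IsAssignableFrom(t) && t != typeof(Token) && !t.IsAbstract)
           .ToList();

    // A parameter is a token slot if it is a Token subclass or an array of one.
    private static bool IsTokenParameter(Type t) =>
        typeof(Token).IsAssignableFrom(t.IsArray ? t.GetElementType()! : t);

    // Static methods shaped like (NLPState, Token, ...) are treated as sentence rules.
    public static List<MethodInfo> FindRules(Assembly asm) =>
        asm.GetTypes()
           .SelectMany(t => t.GetMethods(BindingFlags.Static | BindingFlags.Public |
                                         BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
           .Where(m =>
           {
               var ps = m.GetParameters();
               return ps.Length >= 2
                   && ps[0].ParameterType == typeof(NLPState)
                   && ps.Skip(1).All(p => IsTokenParameter(p.ParameterType));
           })
           .ToList();
}
```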
    • Summary
      Compile time checking, inheritance, …
      Define tokens and sentences (rules) in C#
      Strongly-typed tokens: numbers, percentages, times, dates, file names, urls, people, business objects, …
      Builds an efficient parse graph
      Tracks conversation history
    • Future plans
      Expanded corpus of knowledge
      Company names, locations, documents, …
      Performance improvements
      Only try parsing tokens valid for the current parse tree state
      .NET 4 Optional Arguments
      Account for these in the reflection code during parse tree creation
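One possible way to handle .NET 4 optional arguments during parse-tree creation, sketched with the real `ParameterInfo.IsOptional` and `ParameterInfo.DefaultValue` members: a rule with trailing optional parameters is registered under several token signatures, and omitted arguments are filled from their defaults before `MethodInfo.Invoke`. `SignatureExpander`, `OptionalRules` and the stub types are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stubs.
public class Token { }
public class TokenLight : Token { }
public class TokenOnOff : Token { }
public class NLPState { }

public static class OptionalRules
{
    public static string? Last;

    // Hypothetical rule whose final token is an optional argument.
    public static void LightState(NLPState st, TokenLight light, TokenOnOff? state = null)
        => Last = state is null ? "no state" : "with state";
}

public static class SignatureExpander
{
    // One token signature per allowed length, longest first:
    // (NLPState, TokenLight, TokenOnOff = null) -> [TokenLight, TokenOnOff], [TokenLight]
    public static List<Type[]> Expand(MethodInfo rule)
    {
        var ps = rule.GetParameters().Skip(1).ToArray(); // drop the NLPState
        int required = ps.TakeWhile(p => !p.IsOptional).Count();
        var result = new List<Type[]>();
        for (int n = ps.Length; n >= required; n--)
            result.Add(ps.Take(n).Select(p => p.ParameterType).ToArray());
        return result;
    }

    // Fill omitted trailing arguments from their declared default values.
    public static object?[] PadWithDefaults(MethodInfo rule, object?[] supplied)
    {
        var ps = rule.GetParameters();
        var args = new object?[ps.Length];
        for (int i = 0; i < ps.Length; i++)
            args[i] = i < supplied.Length ? supplied[i] : ps[i].DefaultValue;
        return args;
    }
}
```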
    • For more information
      Visit http://blog.abodit.com
      Contact ian@abodit.com