C# Natural Language Engine
Ian Mercer, ian@abodit.com
http://blog.abodit.com

Introduction
This presentation outlines the C# Natural Language Engine used by the ‘Smartest House in the World’, a home automation system developed by the author.
If you are interested in using the C# Natural Language Engine presented here in a commercial product, please contact the author.

C# Natural Language Engine
Existing Natural Language Engines
- Have a large, STATIC dictionary data file
- Can parse complex sentence structure
- Hand back a tree of tokens (strings)
- Don’t handle conversations
C# NLP Engine
- Defines strongly-typed tokens in code
- Uses type inheritance to model ‘is a’
- Defines sentences in code
- Rules engine executes sentences
- Understands context (conversation history)

Sample conversation …
- Complex temporal expressions …
- Ask it to play music
- … become database queries
- Handles async conversations
- Understands names …

Goals
- Make it easy to define tokens and sentences (not XML)
- Safe, compile-time checked definition of the syntax and grammar (not XML)
- Model real-world inheritance with C# class inheritance: ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
- Handle ambiguity, e.g.
    play something in the air tonight in the kitchen
    remind me at 4pm to call john at 5pm

C# NLP Engine Structure
[Diagram with numbered steps 1 to 6. Labels: Token Definitions, Sentence Definitions, Input, Token Parser, State, Rules Engine, Sentence ‘Executed’.]

Tokens - Token Definition
- A hierarchy of Token-derived classes
- Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token
- This allows a single sentence rule to handle multiple cases, e.g. On and Off
- Derived from base Token class
- Simple tokens are a set of words, e.g. « is | are »
- Complex tokens have a parser, e.g. TokenDouble

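The ‘is a’ chain on this slide can be sketched with ordinary C# inheritance. This is a minimal illustration, not the engine's source: only the names Token, TokenState, TokenOnOff and TokenOn appear in the slides, while TokenOff, TokenDim and the HierarchyDemo helper are assumptions added for the example.

```csharp
using System;

// Stand-ins for the engine's token classes. Only Token, TokenState,
// TokenOnOff and TokenOn are named in the slides.
public abstract class Token { }
public abstract class TokenState : Token { }
public abstract class TokenOnOff : TokenState { }
public sealed class TokenOn : TokenOnOff { }
public sealed class TokenOff : TokenOnOff { }   // assumed sibling of TokenOn
public sealed class TokenDim : TokenState { }   // assumed state that is not on/off

// Hypothetical helper, for illustration only.
public static class HierarchyDemo
{
    // One handler typed on the base class covers both On and Off.
    public static string Describe(TokenOnOff state)
    {
        return state is TokenOn ? "on" : "off";
    }

    // True when a parsed token may be passed to a parameter of the given type.
    public static bool Accepts(Type parameterType, Token token)
    {
        return parameterType.IsAssignableFrom(token.GetType());
    }
}
```

Because TokenOn and TokenOff both derive from TokenOnOff, a rule written against TokenOnOff accepts either, while a TokenDim is rejected by the same check.
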
A Simple Token Definition
    public class TokenPersonalPronoun : TokenGenericNoun
    {
        internal static string wordz { get { return "he,him,she,her,them"; } }
    }
- Recognizes any of the words specified
- Can use inheritance (as in this example, where a PersonalPronoun is modelled as a subclass of GenericNoun)

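One way a word list like this could be consumed is to read the static wordz property by reflection and compare the input against its comma-separated entries. The slides do not show how the engine actually does this, so the WordMatcher class below is purely an assumption for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

public abstract class Token { }
public class TokenGenericNoun : Token { }

public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}

// Hypothetical helper: not part of the engine as presented.
public static class WordMatcher
{
    // Returns true if 'word' is listed in the token type's own static wordz property.
    public static bool Matches(Type tokenType, string word)
    {
        PropertyInfo prop = tokenType.GetProperty(
            "wordz", BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
        if (prop == null) return false;   // complex tokens would use a parser instead

        string[] words = ((string)prop.GetValue(null, null)).Split(',');
        return words.Contains(word.Trim(), StringComparer.OrdinalIgnoreCase);
    }
}
```
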
A Complex Token
    public abstract class TokenNumber : Token
    {
        public static IEnumerable<TokenResult> Initialize(string input)
        {
            …
- Initialize method parses input and returns one or more possible parses.
- TokenNumber is a good example: it parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.

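The idea of returning several candidate parses for one input can be sketched with an iterator. The body of the real Initialize method is elided in the slide and the shape of TokenResult is not shown, so the result types and the NumberParser class below are hypothetical stand-ins; ordinals are left out to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical result types standing in for TokenInt, TokenLong, etc.
public abstract class NumberParse { }
public sealed class IntParse : NumberParse { public int Value; }
public sealed class LongParse : NumberParse { public long Value; }
public sealed class DoubleParse : NumberParse { public double Value; }
public sealed class PercentageParse : NumberParse { public double Fraction; }

public static class NumberParser
{
    // Yields every interpretation of 'input' that parses successfully.
    public static IEnumerable<NumberParse> Parse(string input)
    {
        string text = input.Trim();
        CultureInfo inv = CultureInfo.InvariantCulture;

        if (text.EndsWith("%", StringComparison.Ordinal))
        {
            double pct;
            string digits = text.Substring(0, text.Length - 1);
            if (double.TryParse(digits, NumberStyles.Float, inv, out pct))
                yield return new PercentageParse { Fraction = pct / 100.0 };
            yield break;
        }

        int i;
        if (int.TryParse(text, NumberStyles.Integer, inv, out i))
            yield return new IntParse { Value = i };

        long l;
        if (long.TryParse(text, NumberStyles.Integer, inv, out l))
            yield return new LongParse { Value = l };

        double d;
        if (double.TryParse(text, NumberStyles.Float, inv, out d))
            yield return new DoubleParse { Value = d };
    }
}
```

With this sketch, "42" produces int, long and double candidates, and a rule's parameter types decide which one is used.
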
The catch-all TokenPhrase
    public class TokenPhrase : Token
- TokenPhrase matches anything, especially anything in quote marks
    add a reminder call Bruno at 4pm
- Sentence signature could be (…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
- This would match the rule too …
    add a reminder discuss 6pm conference call with Bruno at 4pm

Temporal Tokens
- A complete set of tokens and related classes for representing time
- Point in time, e.g. today at 5pm
- Approximate time, e.g. who called at 5pm today
- Finite sequence, e.g. every Thursday in May 2009
- Infinite sequence, e.g. every Thursday
- Ambiguous time with context, e.g. remind me on Tuesday (context means it is next Tuesday)
- Null time
- Unknowable/incomprehensible time

Temporal Tokens (Cont.)
Code to merge any sequence of temporal tokens to the smallest canonical representation, e.g.
    the first thursday in may 2009
    {TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
    [TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]

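The arithmetic behind resolving ‘the first thursday in may 2009’ to a single date can be sketched as follows. This covers only the final date calculation, not the engine's token-merging code, and the TemporalSketch class and its method name are assumptions made for the example.

```csharp
using System;

// Hypothetical helper: illustrates the date arithmetic only.
public static class TemporalSketch
{
    // Returns the date of the n-th given weekday in a month (n is 1-based).
    public static DateTime NthWeekdayOfMonth(int year, int month, DayOfWeek day, int n)
    {
        DateTime first = new DateTime(year, month, 1);

        // Days from the 1st of the month to the first occurrence of 'day'.
        int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;

        DateTime result = first.AddDays(offset + 7 * (n - 1));
        if (result.Month != month)
            throw new ArgumentOutOfRangeException("n", "The month has no such weekday.");
        return result;
    }
}
```

Calling NthWeekdayOfMonth(2009, 5, DayOfWeek.Thursday, 1) gives 7 May 2009, which matches the interval shown on the slide.
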
Temporal Tokens (Cont.)
Finite TemporalClasses provide
- A way to enumerate the DateTimeRanges they cover
All TemporalClasses provide
- A LINQ expression generator and Entity-SQL expression generator allowing them to be used to query a database

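A LINQ expression generator of the kind mentioned here might look roughly like the sketch below, which turns a time range into a predicate expression tree that a query provider could translate. TimeRange, ToPredicate and CallRecord are hypothetical names; the engine's own classes and generator are not shown in the slides.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical half-open time range [Start, End); not the engine's own type.
public sealed class TimeRange
{
    public DateTime Start { get; private set; }
    public DateTime End { get; private set; }

    public TimeRange(DateTime start, DateTime end)
    {
        Start = start;
        End = end;
    }

    // Builds the tree for: e => selector(e) >= Start && selector(e) < End
    public Expression<Func<T, bool>> ToPredicate<T>(Expression<Func<T, DateTime>> selector)
    {
        Expression lower = Expression.GreaterThanOrEqual(selector.Body, Expression.Constant(Start));
        Expression upper = Expression.LessThan(selector.Body, Expression.Constant(End));
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(lower, upper), selector.Parameters);
    }
}

// Hypothetical record type used only to exercise the predicate.
public sealed class CallRecord
{
    public string Caller { get; set; }
    public DateTime When { get; set; }
}
```

A predicate built with `range.ToPredicate<CallRecord>(c => c.When)` can be passed to `Queryable.Where`, so the filtering is expressed as data that a database provider can translate rather than as compiled code.
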
Existing Token Types
- Numbers (double, long, int, percentage, phone, temperature)
- File names, Directories
- URLs, Domain names
- Names, Companies, Addresses
- Rooms, Lights, Sensors, Sprinklers, …
- States (On, Off, Dim, Bright, Loud, Quiet, …)
- Units of Time, Weight, Distance
- Songs, albums, artists, genres, tags
- Temporal expressions
- Commands, verbs, nouns, pronouns, …

Rules - A simple rule
    /// <summary>
    /// Set a light to a given state
    /// </summary>
    private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
    {
        if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
        if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
        st.Say("I turned it " + ts.LowerCased);
    }
- Any method matching this signature is a sentence rule: NLPState, Token*
- Rule matching respects inheritance, and variable repeats …
    … (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)

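Matching a token sequence against a rule signature, with inheritance and an array parameter for repeats, could be sketched along these lines. This is an assumption-laden illustration: RuleMatcher, TryBind and SampleRules are hypothetical, the token classes are empty stand-ins, and array parameters are filled greedily with no backtracking, which the real parser may well handle differently.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Empty stand-ins for the engine's classes (names taken from the slide).
public class NLPState { }
public abstract class Token { }
public class TokenThing : Token { }
public class TokenLight : TokenThing { }
public class TokenState : Token { }
public class TokenStateOnOff : TokenState { }
public class TokenTimeConstraint : Token { }

// Hypothetical matcher, for illustration only.
public static class RuleMatcher
{
    // Returns the argument list for 'rule' if 'tokens' fit its signature, else null.
    public static object[] TryBind(MethodInfo rule, NLPState state, IList<Token> tokens)
    {
        ParameterInfo[] ps = rule.GetParameters();
        if (ps.Length == 0 || ps[0].ParameterType != typeof(NLPState)) return null;

        object[] args = new object[ps.Length];
        args[0] = state;
        int t = 0;
        for (int p = 1; p < ps.Length; p++)
        {
            Type pt = ps[p].ParameterType;
            if (pt.IsArray)
            {
                // Greedy: take every consecutive token assignable to the element type.
                Type et = pt.GetElementType();
                List<Token> items = new List<Token>();
                while (t < tokens.Count && et.IsInstanceOfType(tokens[t])) items.Add(tokens[t++]);
                Array arr = Array.CreateInstance(et, items.Count);
                for (int k = 0; k < items.Count; k++) arr.SetValue(items[k], k);
                args[p] = arr;
            }
            else
            {
                // Inheritance-aware check: a TokenLight satisfies a TokenThing parameter.
                if (t >= tokens.Count || !pt.IsInstanceOfType(tokens[t])) return null;
                args[p] = tokens[t++];
            }
        }
        return t == tokens.Count ? args : null;   // every token must be consumed
    }
}

public static class SampleRules
{
    public static string Last;

    public static void ThingState(NLPState st, TokenThing thing, TokenState state,
                                  TokenTimeConstraint[] constraints)
    {
        Last = thing.GetType().Name + "/" + state.GetType().Name + "/" + constraints.Length;
    }
}
```

Once TryBind returns a non-null array, the rule can be run with `rule.Invoke(null, args)`.
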
State - NLPState
- Every sentence method takes an NLPState first parameter
- State includes RememberedObject(s) allowing sentences to react to anything that happened earlier in a conversation
- Non-interactive uses can pass a dummy state
- State can be per-user or per-conversation for non-realtime conversations like email

User Interface
Works with a variety of user interfaces
- Chat (e.g. Jabber/Gtalk)
- Web chat
- Email
- Calendar (do X at time Y)
- Rich client application

Token and Rule Discovery
- No configuration needed: all Tokens and Rules are discovered using reflection
- Builds a recursive descent parser tree on startup to efficiently parse any token stream
- Dependency-injection-like code calls rule methods based on matching token sequences
- Parser can handle array parameters as well as single parameters for more flexibility

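Reflection-based rule discovery can be sketched as a scan for static methods whose first parameter is NLPState and whose remaining parameters are Token-derived types or arrays of them. The sketch covers discovery only, not the parser tree built from the results; RuleDiscovery, FindRules and HouseRules are hypothetical names and the token classes are empty stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Empty stand-ins for the engine's classes.
public class NLPState { }
public abstract class Token { }
public class TokenLight : Token { }
public class TokenStateOnOff : Token { }

// Hypothetical discovery helper, for illustration only.
public static class RuleDiscovery
{
    // A rule parameter is a Token-derived type or an array of one.
    static bool IsTokenParameter(Type t)
    {
        if (t.IsArray) t = t.GetElementType();
        return typeof(Token).IsAssignableFrom(t);
    }

    // Finds static methods shaped like (NLPState, Token-derived ...).
    public static List<MethodInfo> FindRules(IEnumerable<Type> types)
    {
        const BindingFlags flags = BindingFlags.Static | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        return types
            .SelectMany(t => t.GetMethods(flags))
            .Where(m =>
            {
                ParameterInfo[] ps = m.GetParameters();
                return ps.Length >= 2
                    && ps[0].ParameterType == typeof(NLPState)
                    && ps.Skip(1).All(p => IsTokenParameter(p.ParameterType));
            })
            .ToList();
    }
}

public static class HouseRules
{
    // Picked up: first parameter is NLPState, the rest derive from Token.
    private static void LightState(NLPState st, TokenLight light, TokenStateOnOff state) { }

    // Ignored: does not start with NLPState.
    private static void NotARule(string text) { }
}
```

To cover a whole assembly, the same method could be given `Assembly.GetExecutingAssembly().GetTypes()` instead of a single type.
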
Summary
- Strongly-typed natural language engine
    - Compile time checking, inheritance, …
- Define tokens and sentences (rules) in C#
- Strongly-typed tokens: numbers, percentages, times, dates, file names, urls, people, business objects, …
- Builds an efficient parse graph
- Tracks conversation history

Future plans
- Expanded corpus of knowledge
    - Company names, locations, documents, …
- Performance improvements
    - Only try parsing tokens valid for current parse tree state
- .NET 4 Optional Arguments
    - Account for these in reflection code during parse tree creation
- Generate iCal/Gdata Recurrence From Time Expressions

For more information
- Visit http://blog.abodit.com
- Contact ian@abodit.com
