
C# Natural Language Engine


Describes a natural language engine built in C# that provides a strongly-typed parser and rules engine. For more information please visit http://blog.abodit.com

Published in: Technology

  1. C# Natural Language Engine
     - Ian Mercer, ian@abodit.com
     - http://blog.abodit.com
  2. Introduction
     - This presentation outlines the C# Natural Language Engine used by the ‘Smartest House in the World’, a home automation system developed by the author.
     - If you are interested in using the C# Natural Language Engine presented here in a commercial product, please contact the author.
  3. C# Natural Language Engine
     - Existing Natural Language Engines
       - Have a large, STATIC dictionary data file
       - Can parse complex sentence structure
       - Hand back a tree of tokens (strings)
       - Don’t handle conversations
     - C# NLP Engine
       - Defines strongly-typed tokens in code
       - Uses type inheritance to model ‘is a’
       - Defines sentences in code
       - Rules engine executes sentences
       - Understands context (conversation history)
  4. Sample conversation …
     - Complex temporal expressions …
     - Ask it to play music
     - … become database queries
     - Handles async conversations
     - Understands names …
  5. Goals
     - Make it easy to define tokens and sentences (not XML)
     - Safe, compile-time checked definition of the syntax and grammar (not XML)
     - Model real-world inheritance with C# class inheritance: ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
     - Handle ambiguity, e.g. ‘play something in the air tonight in the kitchen’, ‘remind me at 4pm to call john at 5pm’
  6. C# NLP Engine Structure
     - [Diagram: Token Definitions, Sentence Definitions, Input, Token Parser, State, Rules Engine and Sentence ‘Executed’, connected by steps numbered 1 to 6]
  7. Tokens - TokenDefinition
     - A hierarchy of Token-derived classes
     - Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token
     - This allows a single sentence rule to handle multiple cases, e.g. On and Off
     - Derived from base Token class
       - Simple tokens are a set of words, e.g. « is | are »
       - Complex tokens have a parser, e.g. TokenDouble
  8. A Simple TokenDefinition

         public class TokenPersonalPronoun : TokenGenericNoun
         {
             internal static string wordz
             {
                 get { return "he,him,she,her,them"; }
             }
         }

     - Recognizes any of the words specified
     - Can use inheritance (as in this example, where a PersonalPronoun is modelled as a subclass of GenericNoun)
  9. A Complex Token

         public abstract class TokenNumber : Token
         {
             public static IEnumerable<TokenResult> Initialize(string input)
             {
                 …
             }
         }

     - The Initialize method parses the input and returns one or more possible parses.
     - TokenNumber is a good example: it parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.
  10. The catch-all TokenPhrase

          public class TokenPhrase : Token

      - TokenPhrase matches anything, especially anything in quote marks, e.g. ‘add a reminder call Bruno at 4pm’
      - The sentence signature could be (…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
      - This would match the rule too … ‘add a reminder discuss 6pm conference call with Bruno at 4pm’
  11. Temporal Tokens
      - A complete set of tokens and related classes for representing time
        - Point in time, e.g. ‘today at 5pm’
        - Approximate time, e.g. ‘who called at 5pm today’
        - Finite sequence, e.g. ‘every Thursday in May 2009’
        - Infinite sequence, e.g. ‘every Thursday’
        - Ambiguous time with context, e.g. ‘remind me on Tuesday’ (context means it is next Tuesday)
        - Null time
        - Unknowable/incomprehensible time
  12. Temporal Tokens (Cont.)
      - Code to merge any sequence of temporal tokens to the smallest canonical representation, e.g. ‘the first thursday in may 2009’:

            {TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
            [TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]
  13. Temporal Tokens (Cont.)
      - Finite temporal classes provide
        - A way to enumerate the DateTimeRanges they cover
      - All temporal classes provide
        - A LINQ expression generator and an Entity SQL expression generator, allowing them to be used to query a database
  14. Existing Token Types
      - Numbers (double, long, int, percentage, phone, temperature)
      - File names, Directories
      - URLs, Domain names
      - Names, Companies, Addresses
      - Rooms, Lights, Sensors, Sprinklers, …
      - States (On, Off, Dim, Bright, Loud, Quiet, …)
      - Units of Time, Weight, Distance
      - Songs, albums, artists, genres, tags
      - Temporal expressions
      - Commands, verbs, nouns, pronouns, …
  15. Rules - A simple rule

          /// <summary>
          /// Set a light to a given state
          /// </summary>
          private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
          {
              if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
              if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
              st.Say("I turned it " + ts.LowerCased);
          }

      - Any method matching this signature is a sentence rule: NLPState, Token*
      - Rule matching respects inheritance, and variable repeats …
        … (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)
  16. State - NLPState
      - Every sentence method takes an NLPState as its first parameter
      - State includes RememberedObject(s), allowing sentences to react to anything that happened earlier in a conversation
      - Non-interactive uses can pass a dummy state
      - State can be per-user, or per-conversation for non-realtime conversations like email
  17. User Interface
      - Works with a variety of user interfaces
        - Chat (e.g. Jabber/Gtalk)
        - Web chat
        - Email
        - Calendar (do X at time Y)
        - Rich client application
  18. Token and Rule Discovery
      - No configuration needed: all Tokens and Rules are discovered using reflection
      - Builds a recursive descent parser tree on startup to efficiently parse any token stream
      - Dependency-injection-like code calls rule methods based on matching token sequences
      - The parser can handle array parameters as well as single parameters for more flexibility
  19. Summary
      - Strongly-typed natural language engine
        - Compile-time checking, inheritance, …
      - Define tokens and sentences (rules) in C#
      - Strongly-typed tokens: numbers, percentages, times, dates, file names, URLs, people, business objects, …
      - Builds an efficient parse graph
      - Tracks conversation history
  20. Future plans
      - Expanded corpus of knowledge
        - Company names, locations, documents, …
      - Performance improvements
        - Only try parsing tokens valid for the current parse tree state
      - .NET 4 optional arguments
        - Account for these in the reflection code during parse tree creation
      - Generate iCal/GData recurrence from time expressions
  21. For more information
      - Visit http://blog.abodit.com
      - Contact ian@abodit.com
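Slide 7 says a single sentence rule can handle both On and Off because TokenOn is a TokenOnOff is a TokenState is a Token. A minimal sketch of that idea, assuming rule parameters are matched with a plain runtime type test; the `IsOn` property and the `RuleMatching.Accepts` helper are hypothetical and not part of the engine:

```csharp
using System;

// Hypothetical hierarchy modelled on slide 7:
// TokenOn is a TokenOnOff is a TokenState is a Token.
public abstract class Token { }
public abstract class TokenState : Token { }

public abstract class TokenOnOff : TokenState
{
    // Illustrative property (assumption): true for On, false for Off.
    public abstract bool IsOn { get; }
}

public sealed class TokenOn : TokenOnOff
{
    public override bool IsOn => true;
}

public sealed class TokenOff : TokenOnOff
{
    public override bool IsOn => false;
}

public static class RuleMatching
{
    // A rule parameter declared as `parameterType` accepts any token whose
    // runtime type is that type or a subclass of it.
    public static bool Accepts(Type parameterType, Token token) =>
        parameterType.IsInstanceOfType(token);
}
```

A rule whose parameter is declared as TokenOnOff therefore fires for both ‘on’ and ‘off’, while a rule declared with TokenOn fires only for ‘on’.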
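Slide 8 shows a simple token that lists its words in a static `wordz` property, but the slides do not show how that list is consumed. A minimal sketch, assuming the engine reads the property by reflection and splits it on commas; the `WordTokenRecognizer` class is hypothetical:

```csharp
using System;
using System.Linq;
using System.Reflection;

public abstract class Token { }
public class TokenGenericNoun : Token { }

// Token definition as shown on slide 8.
public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}

public static class WordTokenRecognizer
{
    // Hypothetical: find a static `wordz` property declared on the token type
    // itself (public or internal) and test whether `word` is one of its entries.
    public static bool Recognizes(Type tokenType, string word)
    {
        var prop = tokenType.GetProperty(
            "wordz",
            BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
        var list = prop?.GetValue(null) as string;
        if (list == null) return false;

        return list.Split(',')
                   .Select(w => w.Trim())
                   .Any(w => string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    }
}
```

Because the lookup uses `DeclaredOnly`, TokenGenericNoun does not pick up its subclass's words; how the real engine combines word lists across a hierarchy is not described on the slides.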
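Slide 9 says `TokenNumber.Initialize` can return several possible parses for one input. A minimal sketch of that behaviour, assuming each interpretation is returned directly as a token object rather than wrapped in the slide's `TokenResult` type (whose definition is not shown); TokenIntOrdinal is omitted and the `Value` properties are hypothetical:

```csharp
using System.Collections.Generic;
using System.Globalization;

public abstract class Token { }

public abstract class TokenNumber : Token
{
    // Every numeric reading of `input` becomes a separate candidate,
    // so "42" yields an int, a long and a double.
    public static IEnumerable<TokenNumber> Initialize(string input)
    {
        var results = new List<TokenNumber>();
        var inv = CultureInfo.InvariantCulture;

        if (input.EndsWith("%"))
        {
            if (double.TryParse(input.Substring(0, input.Length - 1),
                                NumberStyles.Float, inv, out double pct))
                results.Add(new TokenPercentage(pct));
            return results;
        }

        if (int.TryParse(input, NumberStyles.Integer, inv, out int i))
            results.Add(new TokenInt(i));
        if (long.TryParse(input, NumberStyles.Integer, inv, out long l))
            results.Add(new TokenLong(l));
        if (double.TryParse(input, NumberStyles.Float, inv, out double d))
            results.Add(new TokenDouble(d));
        return results;
    }
}

public sealed class TokenInt : TokenNumber
{
    public int Value { get; }
    public TokenInt(int value) { Value = value; }
}

public sealed class TokenLong : TokenNumber
{
    public long Value { get; }
    public TokenLong(long value) { Value = value; }
}

public sealed class TokenDouble : TokenNumber
{
    public double Value { get; }
    public TokenDouble(double value) { Value = value; }
}

public sealed class TokenPercentage : TokenNumber
{
    public double Value { get; }
    public TokenPercentage(double value) { Value = value; }
}
```

Returning every reading and letting the sentence rules decide which one fits is what lets a rule asking for a TokenInt and another asking for a TokenDouble both see ‘42’.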
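Slide 10 notes that the catch-all TokenPhrase also matches ‘discuss 6pm conference call with Bruno at 4pm’. A minimal sketch of why, covering only the (TokenPhrase, TokenExactTime) tail of the signature and assuming an exact time is just ‘4pm’ or ‘at 4pm’; `PhraseMatcher` is hypothetical and the engine's own tie-breaking is not shown:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical matcher for (..., TokenPhrase, TokenExactTime). TokenPhrase is a
// catch-all, so it may absorb any words, including ones that look like times.
public static class PhraseMatcher
{
    // Assumption: an exact time is an hour such as "4pm", optionally after "at".
    private static readonly Regex Hour =
        new Regex(@"^\d{1,2}(am|pm)$", RegexOptions.IgnoreCase);

    private static bool IsExactTime(string[] words)
    {
        if (words.Length == 1) return Hour.IsMatch(words[0]);
        if (words.Length == 2)
            return words[0].Equals("at", StringComparison.OrdinalIgnoreCase)
                   && Hour.IsMatch(words[1]);
        return false;
    }

    // Every (phrase, time) split with a non-empty phrase whose remaining words
    // form an exact time. More than one result means the input is ambiguous.
    public static List<(string Phrase, string Time)> Splits(string text)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var results = new List<(string Phrase, string Time)>();
        for (int k = 1; k < words.Length; k++)
        {
            string[] tail = words.Skip(k).ToArray();
            if (IsExactTime(tail))
                results.Add((string.Join(" ", words.Take(k)), string.Join(" ", tail)));
        }
        return results;
    }
}
```

Even this tiny grammar yields two readings of ‘call Bruno at 4pm’ (time ‘at 4pm’ or time ‘4pm’ with ‘at’ left in the phrase), which is the kind of ambiguity slide 5 lists as a goal to handle.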
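Slide 12 shows ‘the first thursday in may 2009’ collapsing to [Thursday 5/7/2009]. The merge code itself is not shown; a minimal sketch of only the final resolution step, assuming the tokens have already been reduced to an ordinal, a weekday, a month and a year. `TemporalMerge` and `NthWeekdayOfMonth` are hypothetical names:

```csharp
using System;

public static class TemporalMerge
{
    // Resolve "the nth <weekday> in <month> <year>" to a single date.
    public static DateTime NthWeekdayOfMonth(int year, int month, DayOfWeek day, int n)
    {
        if (n < 1 || n > 5) throw new ArgumentOutOfRangeException(nameof(n));

        var first = new DateTime(year, month, 1);
        // Days from the 1st to the first occurrence of `day` (0..6).
        int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;
        DateTime result = first.AddDays(offset + 7 * (n - 1));

        if (result.Month != month)
            throw new ArgumentOutOfRangeException(nameof(n), "The month has fewer occurrences of that weekday.");
        return result;
    }
}
```

May 1, 2009 fell on a Friday, so the offset to the first Thursday is 6 days, giving May 7, 2009, the same date the slide shows.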
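Slide 13 says finite temporal classes can enumerate the DateTimeRanges they cover and that temporal classes can generate LINQ expressions for querying. A minimal sketch of both for ‘every Thursday in May 2009’, assuming whole-day ranges with an exclusive end; `DateTimeRange` here is a hypothetical struct and `EveryWeekdayInMonth` a hypothetical class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public readonly struct DateTimeRange
{
    public DateTime Start { get; }
    public DateTime End { get; }   // exclusive
    public DateTimeRange(DateTime start, DateTime end) { Start = start; End = end; }
}

// Hypothetical finite temporal set: every <weekday> in a given month.
public sealed class EveryWeekdayInMonth
{
    private readonly int _year;
    private readonly int _month;
    private readonly DayOfWeek _day;

    public EveryWeekdayInMonth(int year, int month, DayOfWeek day)
    {
        _year = year; _month = month; _day = day;
    }

    // Enumerate the whole-day ranges covered by this set.
    public IEnumerable<DateTimeRange> Ranges()
    {
        var d = new DateTime(_year, _month, 1);
        while (d.Month == _month)
        {
            if (d.DayOfWeek == _day) yield return new DateTimeRange(d, d.AddDays(1));
            d = d.AddDays(1);
        }
    }

    // Build t => (t >= s1 && t < e1) || (t >= s2 && t < e2) || ...
    public Expression<Func<DateTime, bool>> ToExpression()
    {
        ParameterExpression t = Expression.Parameter(typeof(DateTime), "t");
        Expression body = Expression.Constant(false);
        foreach (DateTimeRange r in Ranges())
        {
            Expression inRange = Expression.AndAlso(
                Expression.GreaterThanOrEqual(t, Expression.Constant(r.Start)),
                Expression.LessThan(t, Expression.Constant(r.End)));
            body = Expression.OrElse(body, inRange);
        }
        return Expression.Lambda<Func<DateTime, bool>>(body, t);
    }
}
```

The compiled expression can filter in-memory data; handing the uncompiled expression tree to a LINQ provider (after selecting the timestamp column) is how it could become a database query, though which providers translate it is an assumption to check.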
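Slide 15 says any method with the signature (NLPState, Token*) is a sentence rule and that matching respects inheritance. A minimal sketch of discovering and invoking such rules by reflection, adapted from the slide's LightState rule; the `NLPState`, token classes and `RuleEngine` here are simplified stand-ins (assumptions), `IsTrueState` is assumed to be `bool?`, and `st.Actor` is omitted. The real engine builds a parse tree (slide 18) rather than scanning rules linearly:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class NLPState
{
    public List<string> Said { get; } = new List<string>();
    public void Say(string text) => Said.Add(text);
}

public abstract class Token { }

public class TokenLight : Token
{
    public bool IsOn { get; private set; }
    // The slide passes st.Actor to ForceOn/ForceOff; the actor is omitted here.
    public void ForceOn() => IsOn = true;
    public void ForceOff() => IsOn = false;
}

public abstract class TokenStateOnOff : Token
{
    public abstract bool? IsTrueState { get; }
    public abstract string LowerCased { get; }
}

public sealed class TokenOn : TokenStateOnOff
{
    public override bool? IsTrueState => true;
    public override string LowerCased => "on";
}

public sealed class TokenOff : TokenStateOnOff
{
    public override bool? IsTrueState => false;
    public override string LowerCased => "off";
}

public static class Rules
{
    /// <summary>Set a light to a given state (adapted from slide 15).</summary>
    private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
    {
        if (ts.IsTrueState == true) tlight.ForceOn();
        if (ts.IsTrueState == false) tlight.ForceOff();
        st.Say("I turned it " + ts.LowerCased);
    }
}

public static class RuleEngine
{
    // A rule is any static method whose first parameter is NLPState and
    // whose remaining parameters are all Token types.
    public static IEnumerable<MethodInfo> DiscoverRules(Type container) =>
        container.GetMethods(BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic)
                 .Where(m =>
                 {
                     ParameterInfo[] ps = m.GetParameters();
                     return ps.Length > 0
                            && ps[0].ParameterType == typeof(NLPState)
                            && ps.Skip(1).All(p => typeof(Token).IsAssignableFrom(p.ParameterType));
                 });

    // Invoke the first rule whose parameter types accept the tokens in order;
    // IsInstanceOfType makes the match respect inheritance.
    public static bool Execute(Type container, NLPState state, params Token[] tokens)
    {
        foreach (MethodInfo rule in DiscoverRules(container))
        {
            ParameterInfo[] ps = rule.GetParameters();
            if (ps.Length != tokens.Length + 1) continue;
            bool fits = true;
            for (int i = 0; i < tokens.Length && fits; i++)
                fits = ps[i + 1].ParameterType.IsInstanceOfType(tokens[i]);
            if (!fits) continue;
            rule.Invoke(null, new object[] { state }.Concat(tokens).ToArray());
            return true;
        }
        return false;
    }
}
```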
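Slide 16 says NLPState keeps RememberedObject(s) so later sentences can refer back, and that state can be per-user or per-conversation. A minimal sketch, assuming ‘remembering’ means keeping a most-recent-last list and that conversations are keyed by a string; `NLPState`, `StateStore` and `Light` here are hypothetical stand-ins:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical NLPState: remembers objects mentioned earlier so a later
// sentence ("turn it off") can refer back to them.
public class NLPState
{
    private readonly List<object> _remembered = new List<object>();

    public string ConversationId { get; }

    public NLPState(string conversationId) { ConversationId = conversationId; }

    // Re-remembering an object moves it to the most recent position.
    public void Remember(object item)
    {
        _remembered.Remove(item);
        _remembered.Add(item);
    }

    // Most recently remembered object of type T, or null if there is none.
    public T RecallLatest<T>() where T : class =>
        _remembered.OfType<T>().LastOrDefault();
}

// One state per conversation key (e.g. a chat session or an email thread);
// non-interactive callers can simply create a throwaway NLPState.
public class StateStore
{
    private readonly Dictionary<string, NLPState> _states = new Dictionary<string, NLPState>();

    public NLPState For(string conversationId)
    {
        if (!_states.TryGetValue(conversationId, out NLPState state))
        {
            state = new NLPState(conversationId);
            _states[conversationId] = state;
        }
        return state;
    }
}

// Example domain object (hypothetical).
public sealed class Light
{
    public string Room { get; }
    public Light(string room) { Room = room; }
}
```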
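Slide 18 says tokens are discovered by reflection and that the parser handles array parameters, such as the TokenTimeConstraint[] on slide 15. A minimal sketch of both, assuming a trailing array parameter greedily absorbs zero or more matching tokens; `Discovery` and the token classes below are hypothetical, and the greedy rule would need backtracking if an array were followed by a parameter of the same type:

```csharp
using System;
using System.Linq;
using System.Reflection;

public abstract class Token { }
public class TokenThing : Token { }
public class TokenLight : TokenThing { }
public class TokenTimeConstraint : Token { }
public class TokenAtTime : TokenTimeConstraint { }
public class TokenOnDay : TokenTimeConstraint { }

public static class Discovery
{
    // All concrete Token subclasses in an assembly, found by reflection.
    public static Type[] TokenTypes(Assembly asm) =>
        asm.GetTypes()
           .Where(t => typeof(Token).IsAssignableFrom(t) && !t.IsAbstract)
           .ToArray();

    // Does a rule's parameter list (after NLPState) accept this token sequence?
    // An array parameter greedily consumes zero or more tokens of its element type.
    public static bool Matches(Type[] paramTypes, Token[] tokens)
    {
        int i = 0;
        foreach (Type pt in paramTypes)
        {
            if (pt.IsArray)
            {
                Type elem = pt.GetElementType();
                while (i < tokens.Length && elem.IsInstanceOfType(tokens[i])) i++;
            }
            else
            {
                if (i >= tokens.Length || !pt.IsInstanceOfType(tokens[i])) return false;
                i++;
            }
        }
        return i == tokens.Length;   // every token must be consumed
    }
}
```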
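Slide 20 lists generating iCal recurrences from time expressions as a future plan. A minimal sketch of what that output could look like for two simple cases, using the RRULE syntax from the iCalendar specification (RFC 5545); `RecurrenceRules` and its mapping are hypothetical, and GData recurrence is not covered:

```csharp
using System;

public static class RecurrenceRules
{
    // RFC 5545 weekday codes, indexed by System.DayOfWeek (Sunday = 0).
    private static readonly string[] Codes = { "SU", "MO", "TU", "WE", "TH", "FR", "SA" };

    // "every <weekday>", e.g. "every Thursday"
    public static string Weekly(DayOfWeek day) =>
        "RRULE:FREQ=WEEKLY;BYDAY=" + Codes[(int)day];

    // "the nth <weekday> of every month"; n = -1 means "the last".
    public static string MonthlyNth(int n, DayOfWeek day)
    {
        if (n == 0 || n < -5 || n > 5) throw new ArgumentOutOfRangeException(nameof(n));
        return "RRULE:FREQ=MONTHLY;BYDAY=" + n + Codes[(int)day];
    }
}
```

An infinite sequence such as ‘every Thursday’ maps naturally onto a single RRULE; finite sequences would additionally need DTSTART and UNTIL or COUNT.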
