C# Natural Language Engine
Ian Mercer, ian@abodit.com
http://blog.abodit.com

Introduction
This presentation outlines the C# Natural Language Engine used by the ‘Smartest House in the World’, a home automation system developed by the author.
If you are interested in using the C# Natural Language Engine presented here in a commercial product, please contact the author.

C# Natural Language Engine
Existing natural language engines:
- Have a large, static dictionary data file
- Can parse complex sentence structure
- Hand back a tree of tokens (strings)
- Don’t handle conversations
C# NLP Engine:
- Defines strongly-typed tokens in code
- Uses type inheritance to model ‘is a’
- Defines sentences in code
- Rules engine executes sentences
- Understands context (conversation history)

Sample conversation
[Screenshot of a chat session, with callouts:]
- Complex temporal expressions … become database queries
- Ask it to play music
- Handles async conversations
- Understands names

Goals
- Make it easy to define tokens and sentences (not XML)
- Safe, compile-time checked definition of the syntax and grammar (not XML)
- Model real-world inheritance with C# class inheritance: ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
- Handle ambiguity, e.g.
  - play something in the air tonight in the kitchen
  - remind me at 4pm to call john at 5pm

C# NLP Engine Structure
[Diagram with numbered steps 1-6. Components: Token Definitions, Sentence Definitions, Input, Token Parser, State, Rules Engine, Sentence ‘Executed’.]

Tokens - Token Definition
- A hierarchy of Token-derived classes
- Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token
- This allows a single sentence rule to handle multiple cases, e.g. On and Off
- Derived from base Token class
- Simple tokens are a set of words, e.g. « is | are »
- Complex tokens have a parser, e.g. TokenDouble

A Simple Token Definition

    public class TokenPersonalPronoun : TokenGenericNoun
    {
        // Comma-separated list of the words this token recognizes
        internal static string wordz
        {
            get { return "he,him,she,her,them"; }
        }
    }

- Recognizes any of the words specified
- Can use inheritance (as in this example, where a PersonalPronoun is modelled as a subclass of GenericNoun)

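The slides do not show how the engine reads the word list. As a minimal sketch, assuming it looks up the static `wordz` property through reflection, recognition could work roughly as follows. `WordListMatcher` and `Matches` are hypothetical names introduced for illustration, and the empty `Token` and `TokenGenericNoun` classes are stand-ins for the real ones.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-ins for the engine's base classes (assumed, not shown in the slides).
public abstract class Token { }
public class TokenGenericNoun : Token { }

public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz
    {
        get { return "he,him,she,her,them"; }
    }
}

// Hypothetical helper: not part of the engine described in the slides.
public static class WordListMatcher
{
    // Returns true if 'word' appears (ignoring case) in the comma-separated
    // 'wordz' list declared directly on 'tokenType'.
    public static bool Matches(Type tokenType, string word)
    {
        PropertyInfo prop = tokenType.GetProperty(
            "wordz",
            BindingFlags.Static | BindingFlags.Public |
            BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
        if (prop == null) return false; // this token type has no word list

        string words = (string)prop.GetValue(null, null);
        return words.Split(',')
                    .Any(w => string.Equals(w.Trim(), word,
                                            StringComparison.OrdinalIgnoreCase));
    }
}
```

With these definitions, `WordListMatcher.Matches(typeof(TokenPersonalPronoun), "She")` returns true, while the same call with `typeof(TokenGenericNoun)` returns false because that class declares no word list of its own.
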
A Complex Token

    public abstract class TokenNumber : Token
    {
        public static IEnumerable<TokenResult> Initialize(string input)
        {
            …

- The Initialize method parses the input and returns one or more possible parses.
- TokenNumber is a good example: it parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.

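The body of `Initialize` is elided in the slides, and `TokenResult` is not defined there. The sketch below only illustrates the idea of returning every plausible parse of a numeric string. It uses a hypothetical `NumberParser.Parse` method that returns tokens directly, and the `Value` fields are assumptions. Ordinals (TokenIntOrdinal) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Simplified stand-ins; the 'Value' fields are assumptions for illustration.
public abstract class Token { }
public abstract class TokenNumber : Token { }
public class TokenInt : TokenNumber { public int Value; }
public class TokenLong : TokenNumber { public long Value; }
public class TokenDouble : TokenNumber { public double Value; }
public class TokenPercentage : TokenNumber { public double Value; }

// Hypothetical parser: yields every numeric reading of the input.
public static class NumberParser
{
    public static IEnumerable<TokenNumber> Parse(string input)
    {
        string text = input.Trim();
        CultureInfo inv = CultureInfo.InvariantCulture;

        // "50%" is read only as a percentage.
        if (text.EndsWith("%", StringComparison.Ordinal))
        {
            double pct;
            string number = text.Substring(0, text.Length - 1);
            if (double.TryParse(number, NumberStyles.Float, inv, out pct))
                yield return new TokenPercentage { Value = pct };
            yield break;
        }

        // A small integer such as "42" is ambiguous: offer all three readings
        // and let the sentence rules decide which one they accept.
        int i;
        if (int.TryParse(text, NumberStyles.Integer, inv, out i))
            yield return new TokenInt { Value = i };

        long l;
        if (long.TryParse(text, NumberStyles.Integer, inv, out l))
            yield return new TokenLong { Value = l };

        double d;
        if (double.TryParse(text, NumberStyles.Float, inv, out d))
            yield return new TokenDouble { Value = d };
    }
}
```

Here `NumberParser.Parse("42")` yields a TokenInt, a TokenLong and a TokenDouble, whereas `NumberParser.Parse("3.5")` yields only a TokenDouble.
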
The catch-all TokenPhrase

    public class TokenPhrase : Token

- TokenPhrase matches anything, especially anything in quote marks
- Example: add a reminder call Bruno at 4pm
- Sentence signature could be (…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
- This would match the rule too: add a reminder discuss 6pm conference call with Bruno at 4pm

Temporal Tokens
A complete set of tokens and related classes for representing time:
- Point in time, e.g. today at 5pm
- Approximate time, e.g. who called at 5pm today
- Finite sequence, e.g. every Thursday in May 2009
- Infinite sequence, e.g. every Thursday
- Ambiguous time with context, e.g. remind me on Tuesday (context means it is next Tuesday)
- Null time
- Unknowable/incomprehensible time

Temporal Tokens (Cont.)
Code to merge any sequence of temporal tokens to the smallest canonical representation, e.g.

    the first thursday in may 2009
    {TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
    [TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]

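The merge code itself is not shown in the slides. The date arithmetic behind the example can be sketched as below, using only `System.DateTime`. `TemporalSketch` and `NthWeekdayOfMonth` are hypothetical names and do not reproduce the engine's token-merging logic.

```csharp
using System;

// Hypothetical helper illustrating the arithmetic behind
// "the first thursday in may 2009"; not the engine's merge code.
public static class TemporalSketch
{
    // Returns the n-th (1-based) occurrence of 'day' in the given month.
    public static DateTime NthWeekdayOfMonth(int year, int month, DayOfWeek day, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException("n");

        DateTime first = new DateTime(year, month, 1);

        // Days from the 1st of the month to the first requested weekday (0..6).
        int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;

        DateTime result = first.AddDays(offset + 7 * (n - 1));
        if (result.Month != month)
            throw new ArgumentOutOfRangeException("n", "The month has no such occurrence.");
        return result;
    }
}
```

`TemporalSketch.NthWeekdayOfMonth(2009, 5, DayOfWeek.Thursday, 1)` returns 7 May 2009, matching the `[Thursday 5/7/2009]` interval in the slide.
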
Temporal Tokens (Cont.)
- Finite temporal classes provide a way to enumerate the DateTimeRanges they cover
- All temporal classes provide a LINQ expression generator and an Entity-SQL expression generator, allowing them to be used to query a database

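The slides do not show the expression generators. One way a date range could be turned into a LINQ predicate is sketched below with `System.Linq.Expressions`. The `DateTimeRange` struct here is a simplified stand-in for the engine's type, and `ToFilter` and `CallRecord` are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

// Simplified stand-in for the engine's DateTimeRange (assumed shape).
public struct DateTimeRange
{
    public readonly DateTime Start; // inclusive
    public readonly DateTime End;   // exclusive

    public DateTimeRange(DateTime start, DateTime end)
    {
        Start = start;
        End = end;
    }

    // Builds the predicate: Start <= selector(x) && selector(x) < End.
    // The selector's own parameter is reused so the lambda stays well-formed.
    public Expression<Func<T, bool>> ToFilter<T>(Expression<Func<T, DateTime>> selector)
    {
        Expression body = Expression.AndAlso(
            Expression.GreaterThanOrEqual(selector.Body, Expression.Constant(Start)),
            Expression.LessThan(selector.Body, Expression.Constant(End)));
        return Expression.Lambda<Func<T, bool>>(body, selector.Parameters[0]);
    }
}

// Hypothetical entity used only to demonstrate the filter.
public class CallRecord
{
    public DateTime When { get; set; }
}
```

A caller could pass `range.ToFilter<CallRecord>(c => c.When)` to `Queryable.Where` so that a LINQ provider translates it to SQL, or call `.Compile()` on it to filter in-memory objects.
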
Existing Token Types
- Numbers (double, long, int, percentage, phone, temperature)
- File names, directories
- URLs, domain names
- Names, companies, addresses
- Rooms, lights, sensors, sprinklers, …
- States (On, Off, Dim, Bright, Loud, Quiet, …)
- Units of time, weight, distance
- Songs, albums, artists, genres, tags
- Temporal expressions
- Commands, verbs, nouns, pronouns, …

Rules - A simple rule

    /// <summary>
    /// Set a light to a given state
    /// </summary>
    private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
    {
        if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
        if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
        st.Say("I turned it " + ts.LowerCased);
    }

- Any method matching this signature is a sentence rule: NLPState, Token*
- Rule matching respects inheritance and variable repeats, e.g.

    … (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)

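As a rough illustration of what "rule matching respects inheritance" could mean in practice, the sketch below checks a parsed token sequence against a rule method's parameter list using reflection. `RuleMatcher`, `Accepts` and `SampleRules` are hypothetical, the token classes are empty stand-ins, and array parameters (variable repeats) are not handled.

```csharp
using System;
using System.Reflection;

// Empty stand-ins for the engine's classes (assumed for illustration).
public class NLPState { }
public abstract class Token { }
public class TokenState : Token { }
public class TokenOnOff : TokenState { }
public class TokenOn : TokenOnOff { }
public class TokenLight : Token { }

// Hypothetical matcher: not the engine's actual implementation.
public static class RuleMatcher
{
    // True if 'rule' has the shape (NLPState, Token...) and each parsed token
    // is an instance of the corresponding parameter type or a subclass of it.
    public static bool Accepts(MethodInfo rule, Token[] tokens)
    {
        ParameterInfo[] ps = rule.GetParameters();
        if (ps.Length != tokens.Length + 1) return false;
        if (ps[0].ParameterType != typeof(NLPState)) return false;

        for (int i = 0; i < tokens.Length; i++)
        {
            if (!ps[i + 1].ParameterType.IsInstanceOfType(tokens[i]))
                return false;
        }
        return true;
    }
}

public static class SampleRules
{
    // Declared against TokenOnOff, so both On and Off tokens can satisfy it.
    public static string LightState(NLPState st, TokenLight light, TokenOnOff state)
    {
        return "I turned it " + (state is TokenOn ? "on" : "off");
    }
}
```

Because the check uses `Type.IsInstanceOfType`, a `TokenOn` instance satisfies a parameter declared as `TokenOnOff`, which is how one rule can cover both the On and the Off case.
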
State - NLPState
- Every sentence method takes an NLPState as its first parameter
- State includes RememberedObject(s), allowing sentences to react to anything that happened earlier in a conversation
- Non-interactive uses can pass a dummy state
- State can be per-user or per-conversation for non-realtime conversations like email

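The slides name RememberedObject(s) but do not show their API. The sketch below is one assumed shape for such a conversation memory: a state object that records items as sentences execute and lets a later sentence look up the most recent item of a given type, for example to resolve "it" in "turn it off". The `Remember` and `MostRecent` methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified state object; the real NLPState is not shown
// in the slides and certainly carries more than this.
public class NLPState
{
    private readonly List<object> remembered = new List<object>();

    // Record something mentioned or acted on during the conversation.
    public void Remember(object item)
    {
        remembered.Add(item);
    }

    // Most recently remembered object of type T, or null if there is none.
    public T MostRecent<T>() where T : class
    {
        for (int i = remembered.Count - 1; i >= 0; i--)
        {
            T match = remembered[i] as T;
            if (match != null) return match;
        }
        return null;
    }
}
```

Keeping one such object per user or per conversation, as the slide describes, lets an email thread or chat session resume with its earlier context intact.
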
User Interface
Works with a variety of user interfaces:
- Chat (e.g. Jabber/Gtalk)
- Web chat
- Email
- Calendar (do X at time Y)
- Rich client application

Token and Rule Discovery
- No configuration needed: all Tokens and Rules are discovered using reflection
- Builds a recursive descent parser tree on startup to efficiently parse any token stream
- Dependency-injection-like code calls rule methods based on matching token sequences
- Parser can handle array parameters as well as single parameters for more flexibility

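A minimal sketch of reflection-based rule discovery is given below, assuming a rule is any static method whose first parameter is an NLPState and whose remaining parameters are Token types or arrays of them. `RuleDiscovery`, `FindRules` and `IsRule` are hypothetical names, and building the parse tree from the discovered rules is not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Empty stand-ins for the engine's classes (assumed for illustration).
public class NLPState { }
public abstract class Token { }
public class TokenLight : Token { }

// Hypothetical discovery helper: not the engine's actual code.
public static class RuleDiscovery
{
    public const BindingFlags RuleFlags =
        BindingFlags.Static | BindingFlags.Public |
        BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Scans every type in the assembly for static methods shaped like rules.
    public static List<MethodInfo> FindRules(Assembly assembly)
    {
        var rules = new List<MethodInfo>();
        foreach (Type type in assembly.GetTypes())
        {
            foreach (MethodInfo method in type.GetMethods(RuleFlags))
            {
                if (IsRule(method)) rules.Add(method);
            }
        }
        return rules;
    }

    // A rule takes NLPState first, then one or more Token (or Token[]) parameters.
    public static bool IsRule(MethodInfo method)
    {
        ParameterInfo[] ps = method.GetParameters();
        if (ps.Length < 2 || ps[0].ParameterType != typeof(NLPState)) return false;

        for (int i = 1; i < ps.Length; i++)
        {
            Type t = ps[i].ParameterType;
            if (t.IsArray) t = t.GetElementType(); // e.g. TokenTimeConstraint[]
            if (!typeof(Token).IsAssignableFrom(t)) return false;
        }
        return true;
    }
}

public static class SampleRules
{
    private static void LightOn(NLPState st, TokenLight light) { }
    private static void ManyLights(NLPState st, TokenLight[] lights) { }
    private static void NotARule(string text) { }
}
```

Calling `RuleDiscovery.FindRules(typeof(SampleRules).Assembly)` would pick up `LightOn` and `ManyLights` but skip `NotARule`, with no registration or configuration file involved.
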
Summary
- Strongly-typed natural language engine
- Compile-time checking, inheritance, …
- Define tokens and sentences (rules) in C#
- Strongly-typed tokens: numbers, percentages, times, dates, file names, URLs, people, business objects, …
- Builds an efficient parse graph
- Tracks conversation history

Future plans
- Expanded corpus of knowledge: company names, locations, documents, …
- Performance improvements: only try parsing tokens valid for the current parse tree state
- .NET 4 optional arguments: account for these in reflection code during parse tree creation
- Generate iCal/GData recurrence from time expressions

For more information
Visit http://blog.abodit.com
Contact ian@abodit.com
