x.ai: a personal assistant who schedules meetings for you
Data Engineering, April 2015, New York City
Visit x.ai to join the waitlist
Optimizing data architecture design for natural language processing
@alexpoon06
@xdotai
What’s x.ai?
Magically schedule meetings
Pain vs. solution (diagram: Jane and Alex; then Jane, Amy@x.ai, and Alex)
CC: Amy@x.ai
“Amy, please set something up for John and I next week.”
Product Characteristics
● Needs quick responses
● Supervised learning requires a large training data set
● The number of meetings scales linearly with the number of users
● 1 user meets with N people
● People share meeting places and companies
Technical challenges
● Natural language understanding with extremely high accuracy
● Natural conversation with people over email
● Complex data relationships
● Optimizing for sparse data
● Speed of development and change
Stack
Database (more on that in a couple of slides)
Queue-based architecture
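The slide names a queue-based architecture but gives no detail. Below is a minimal sketch of the idea. It assumes two hypothetical stages (parse an incoming email, then extract entities) connected by in-process queues from Python's standard library. A production system would typically use an external message broker; the talk does not say which one, if any, x.ai used.

```python
import queue
import threading

# Hypothetical stages; the talk does not describe the real ones.
def parse_email(raw: str) -> dict:
    # Treat the first line as the subject and the rest as the body (illustrative only).
    subject, _, body = raw.partition("\n")
    return {"subject": subject, "body": body}

def extract_entities(msg: dict) -> dict:
    # Stand-in for NLP: flag whether the body mentions "next week".
    msg["mentions_next_week"] = "next week" in msg["body"].lower()
    return msg

def worker(fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Each stage consumes from its inbox and publishes to its outbox,
    # so stages can be scaled or restarted independently of each other.
    while True:
        item = inbox.get()
        if item is None:      # sentinel: pass the shutdown signal downstream
            outbox.put(None)
            break
        outbox.put(fn(item))

def run_pipeline(raw_emails: list[str]) -> list[dict]:
    incoming, parsed, done = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(parse_email, incoming, parsed)),
        threading.Thread(target=worker, args=(extract_entities, parsed, done)),
    ]
    for t in threads:
        t.start()
    for raw in raw_emails:
        incoming.put(raw)
    incoming.put(None)
    for t in threads:
        t.join()
    results = []
    while (item := done.get()) is not None:
        results.append(item)
    return results
```

Each stage has a single worker here, so output order matches input order. Adding workers per stage raises throughput but gives up that ordering.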
Picking a database
● Familiar technology
● Low initial maintenance
● Flexible schema
● Easy early scaling
● Reasonable production quality
Pros
● Schema-less
● Mongoose (schema control)
● Works out of the box
● Replica sets scale reasonably well
● MMS provides good monitoring
Cons
● No joins
● Painful to do backups yourself
● Database-level locking (MongoDB v2.6)
● Cross-datacenter deployment is not great
● I don’t want to shard this
Modeling Meetings
{
  host : Participant,
  guests : [Participant],
  time : {
    start : Date,
    end : Date,
    recurring : String
  },
  timezone : String,
  duration : Number,
  locations : [Location],
  timeInitiated : Date,
  timeRescheduled : [Date],
  timeCompleted : Date,
  status : String,
  …...
}
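Mongoose enforces a schema like this one in the application layer. The sketch below is a plain-Python stand-in, not Mongoose itself. It covers only a subset of the fields above, and the helper name `validate_meeting` is hypothetical. It reports fields that are missing or have the wrong type, plus an end time that falls before the start time.

```python
from datetime import datetime

# Declared types for a subset of the meeting fields on the slide.
MEETING_SCHEMA = {
    "timezone": str,
    "duration": (int, float),
    "timeInitiated": datetime,
    "timeRescheduled": list,
    "status": str,
}

def validate_meeting(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected in MEETING_SCHEMA.items():
        if field not in doc:
            problems.append(f"missing {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"bad type for {field}")
    # Element-level check: every reschedule entry must be a datetime.
    resched = doc.get("timeRescheduled")
    if isinstance(resched, list) and not all(isinstance(t, datetime) for t in resched):
        problems.append("timeRescheduled must contain only datetimes")
    # Cross-field check on the nested time block.
    time = doc.get("time", {})
    start, end = time.get("start"), time.get("end")
    if isinstance(start, datetime) and isinstance(end, datetime) and end <= start:
        problems.append("time.end must be after time.start")
    return problems
```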
Modeling Meetings
Meetings, People, Places, Companies (diagram)
1:N and N:N relationships across various collections
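One way to represent these relationships is to store ids instead of copies. Below is a minimal sketch in plain Python, with in-memory dicts standing in for the collections. The field names `host_id`, `guest_ids`, and `company_id` and the sample data are hypothetical. Each meeting has one host, each meeting has many guests and each person attends many meetings (N:N), and each person belongs to one company.

```python
# Hypothetical in-memory stand-ins for the Companies, People, and Meetings collections.
companies = {"c1": {"name": "Company A"}, "c2": {"name": "Company B"}}
people = {
    "p1": {"name": "Jane", "company_id": "c1"},   # many people -> one company
    "p2": {"name": "Alex", "company_id": "c2"},
    "p3": {"name": "John", "company_id": "c1"},
}
meetings = {
    "m1": {"host_id": "p1", "guest_ids": ["p2", "p3"]},  # one host, many guests
    "m2": {"host_id": "p2", "guest_ids": ["p1"]},
}

def meetings_for(person_id: str) -> list[str]:
    """All meetings a person hosts or attends (N:N through guest_ids)."""
    return sorted(
        mid for mid, m in meetings.items()
        if m["host_id"] == person_id or person_id in m["guest_ids"]
    )

def companies_in(meeting_id: str) -> set[str]:
    """Follow two hops of references: meeting -> people -> companies."""
    m = meetings[meeting_id]
    participant_ids = [m["host_id"], *m["guest_ids"]]
    return {companies[people[pid]["company_id"]]["name"] for pid in participant_ids}
```

Without joins, each hop is a separate lookup. In MongoDB, a multikey index on an array field such as `guest_ids` would let the first query avoid a full collection scan.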
Embedding vs. Referencing
Embedding
{
  host :
  {
    name : {.....},
    nicknames : [String],
    phones : [{Type: String}],
    primaryEmail : String,
    secondaryEmails : [String],
    title : String,
    signatures : [String],
    …...
  },
  travelTime : String,
  status : String,
  timezone : String,
  duration : Number,
  …...
}
Referencing
{
  host : Participant,
  travelTime : String,
  status : String,
  timezone : String,
  duration : Number,
  …...
}
Participant
{
  name : {.....},
  nicknames : [String],
  phones : [{Type: String}],
  primaryEmail : String,
  secondaryEmails : [String],
  title : String,
  signatures : [String],
  …...
}
Considerations
● Query patterns
● Access to the embedded doc
● Number of references to a doc
● Application-level joins
● One-way or two-way referencing
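MongoDB (v2.6, per the cons listed earlier) has no joins, so a referenced Participant has to be fetched and merged in the application. Below is a minimal sketch of such an application-level join. Dicts stand in for collections, and the `host_id` field and sample data are hypothetical. The function collects the distinct ids first, so a real implementation could fetch them in one batched query (for example with `$in`) instead of one query per meeting. The sketch also shows the trade-off against embedding.

```python
import copy

participants = {"p1": {"name": "Jane", "primaryEmail": "jane@example.com"}}

# Referencing: meetings store only the participant id (hypothetical field host_id).
referencing = [
    {"_id": "m1", "host_id": "p1", "status": "scheduled"},
    {"_id": "m2", "host_id": "p1", "status": "pending"},
]

# Embedding: each meeting carries its own copy of the host document.
embedding = [{"_id": "m1", "host": copy.deepcopy(participants["p1"]), "status": "scheduled"}]

def join_hosts(meetings: list[dict], people: dict) -> list[dict]:
    """Attach the referenced host document to each meeting (None if missing)."""
    # Collect distinct ids first: one batched lookup instead of one per meeting.
    needed = {m["host_id"] for m in meetings}
    found = {pid: people[pid] for pid in needed if pid in people}
    return [{**m, "host": found.get(m["host_id"])} for m in meetings]
```

After a Participant's email changes, a fresh join sees the new address, while the embedded copy still holds the old one. Referencing pays for freshness with an extra lookup. Embedding pays for single-read access with copies that can go stale.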
Modeling someone’s assistant
1st try: Assistant is a PERSON
2nd try: Assistant is an attribute of PERSON
3rd try: Assistant is a PROFILE, a separate and smaller entity
{
  name : {.....},
  nicknames : [String],
  phones : [{Type: String}],
  primaryEmail : String,
  secondaryEmails : [String],
  title : String,
  signatures : [String],
  …...
}
{
  name :
  {
    first : String,
    last : String
  },
  primaryEmail : String
}
{
  name : {.....},
  nicknames : [String],
  phones : [{Type: String}],
  primaryEmail : String,
  secondaryEmails : [String],
  title : String,
  signatures : [String],
  assistant :
  {
    name : {.....},
    primaryEmail : String
  },
  …...
}
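When the assistant is stored as a small embedded `{name, primaryEmail}` object inside the person document, as in the third document above, an incoming email can be matched to the person the assistant works for. Below is a sketch that assumes that document shape. The helper name `principal_for_sender` and the sample data are hypothetical. A real system would query an index on `assistant.primaryEmail` rather than scan a list.

```python
from typing import Optional

# Hypothetical directory of person documents shaped like the slide's last schema.
directory = [
    {
        "name": {"first": "Jane", "last": "Doe"},
        "primaryEmail": "jane@example.com",
        "assistant": {"name": {"first": "Sam", "last": "Lee"},
                      "primaryEmail": "sam@example.com"},
    },
    {"name": {"first": "John", "last": "Roe"}, "primaryEmail": "john@example.com"},
]

def principal_for_sender(sender: str, people: list[dict]) -> Optional[dict]:
    """Return the person whose assistant sent this email, or None."""
    sender = sender.strip().lower()
    for person in people:
        assistant = person.get("assistant")
        if assistant and assistant["primaryEmail"].lower() == sender:
            return person
    return None
```

Because the profile is small, duplicating it inside the person document is cheap, and the assistant does not need a full Participant record of their own.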
Dealing with schema changes
Issues
● Inconsistent character offsets
● Inconsistent time representations
● Improper sent dates (e.g., year 2026)
● Key information not saved
Fixes
● Recalculate character offsets
● Reconstruct time entities
● Recalculate timezones based on context
● Filter out unsalvageable data
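Two of the fixes above lend themselves to a short sketch. The sketch assumes annotations are stored as (start, end) character offsets into an email body, and that the text was later normalized. As a hypothetical normalization, it collapses CRLF line endings to LF, then recomputes the offsets through an old-to-new position map. It also drops records whose sent date lies in the future, such as the year-2026 dates. The talk does not say what caused the offset drift, and all function names here are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional

def normalize_with_map(text: str) -> tuple[str, list[int]]:
    """Collapse CRLF line endings to LF and map every old index to its new index."""
    out: list[str] = []
    mapping: list[int] = []
    i = 0
    while i < len(text):
        mapping.append(len(out))
        if text.startswith("\r\n", i):
            mapping.append(len(out))  # the '\n' of '\r\n' lands on the same new index
            out.append("\n")
            i += 2
        else:
            out.append(text[i])
            i += 1
    mapping.append(len(out))  # so an exclusive end offset of len(text) still maps
    return "".join(out), mapping

def remap_spans(spans: list[tuple[int, int]], mapping: list[int]) -> list[tuple[int, int]]:
    """Translate (start, end) character offsets from the old text to the new one."""
    return [(mapping[start], mapping[end]) for start, end in spans]

def drop_future_sent(records: list[dict], now: Optional[datetime] = None) -> list[dict]:
    """Keep records whose timezone-aware 'sent' date is not in the future."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if r["sent"] <= now]
```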
Feeding data science
ML training architecture
alex@x.ai
COO and founder
25 Broadway, 9th Floor
New York, NY 10005
E: hello@x.ai
T: @xdotai