NetBase API Presentation

a presentation describing the NetBase and ConsumerBase API


Published in: Business
Transcript

  • 1. The Prospero API
    • The Search and IR Engine of ConsumerBase
    • Mark Bowles - @mark_e_bowles, Chief Technology Officer
  • 2. Who, or what is Prospero?
    • A Shakespearean wizard, lurking invisibly in the background, manipulating everything around him, in order to accomplish a noble goal.
    • A software package that facilitates multiple simultaneous facet-like and data analysis searches in a value-added information retrieval application.
    • All of the above.
    | Confidential | © 2010 NetBase Solutions. All Rights Reserved Worldwide.
  • 3. Prospero Features
    • Very large text index (currently about 20 Billion records)
    • Hierarchically-structured data
    • API provides proxy for familiar information retrieval paradigm
    • Application Index data model is exposed to API users (as relational databases do)
      • Client apps compose queries against it
      • And the infrastructure makes no assumptions about it
      • API provides discovery of the data model
    • API provides processing options for queries
      • Counts, histograms, page-through data retrieval, ad-hoc field retrieval
      • Aligned with GUI “Widget” approach
      • Add value at all layers of the stack
    • Most operations have sub-second response time
    • Multiple operations “pipelined” using asynchronous API
  • 4. Each GUI Widget has its own query needs
    • AND(
    • DocumentObj.text='listerine',
    • ChoiceIterator(
    • DocumentObj.text='oral cancer',
    • DocumentObj.text='kill taste bud',
    • DocumentObj.text='bad breath',
    • DocumentObj.text='dry mouth',
    • DocumentObj.text='burns',
    • DocumentObj.text='expensive'
    • ),
    • DateSeriesIterator(DocumentObj.date, '2009-01-01', 0, 12, 'd-MMM')
    • )
  • 5. Three “Flavors” of API (with the same semantics)
    • Application Index, reached through:
      • Local API: NetBase App, e.g. ConsumerBase
      • TCP/IP API: Remote App, e.g. Customer-Developed Java Application (over TCP/IP)
      • WS API: WebService App, e.g. Web Portal (over HTTP)
  • 6. The Prospero API
  • 7. Prospero API Operation Object
  • 8. Prospero API “Hello World” program (synchronous version – an async version will be more useful in a GUI app)
  • 9. The “Hello World” Pattern and its Result
    • AND(
    • DocumentObj.text='photoshop',
    • DateSeriesIterator(
    • DocumentObj.datetime,
    • '2009-02-01', 7, 52, 'yyyy-MM-dd')
    • )
    • Result: {2009-05-17=64, 2010-03-14=0, 2009-12-06=35848, 2009-04-19=59, 2009-07-26=88, 2010-01-03=18369, 2009-05-10=47, 2009-04-12=43, 2009-08-16=104, 2010-02-28=4984, 2009-02-15=29, 2009-06-07=82, 2009-11-15=15109, 2010-03-07=0, 2009-04-26=58, 2009-08-23=139, 2010-01-10=21233, 2009-05-03=50, 2009-08-30=205, 2009-03-22=45, 2009-02-22=66, 2009-11-01=25388, 2009-11-08=10832, 2009-03-29=45, 2009-12-27=19790, 2009-07-05=59, 2010-01-24=20688, 2009-05-31=68, 2009-09-27=206, 2009-10-25=4052, 2009-09-13=204, 2010-02-07=59584, 2009-12-20=24637, 2010-02-14=131712, 2009-09-20=206, 2010-01-17=15365, 2009-06-21=81, 2009-03-15=59, 2009-06-28=54, 2009-04-05=55, 2009-08-02=73, 2010-01-31=37002, 2010-03-21=0, 2009-05-24=45, 2009-12-13=27385, 2009-07-12=76, 2009-08-09=112, 2009-10-11=329, 2009-09-06=148, 2009-07-19=98, 2009-10-04=279, 2009-10-18=714, 2009-11-29=31779, 2009-03-01=34, 2009-02-08=32, 2010-02-21=130451, 2009-11-22=9085, 2009-06-14=69, 2009-03-08=41, 2009-02-01=25}
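The "Hello World" result is keyed by period start date, and the printed keys appear in no particular order. A client that charts the counts would probably sort them first. The following is a minimal Python sketch of that step, not Prospero API code: the function name `as_time_series` is hypothetical, and the sample holds only a handful of entries copied from the result on this slide.

```python
from datetime import date

# A few entries copied from the slide's result; the full map has the same shape.
raw_counts = {
    "2009-05-17": 64,
    "2009-02-01": 25,
    "2010-02-14": 131712,
    "2009-12-06": 35848,
    "2009-02-08": 32,
}


def as_time_series(counts):
    """Return (date, count) pairs sorted chronologically.

    Assumes the keys use the 'yyyy-MM-dd' format requested in the pattern.
    """
    return sorted((date.fromisoformat(key), value) for key, value in counts.items())


series = as_time_series(raw_counts)
for day, count in series:
    print(day.isoformat(), count)
```

Sorting on parsed dates rather than raw strings keeps the sketch correct if a different key format is requested, provided the parsing line is adjusted to match.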
  • 10. Basic Elements of a Prospero Operation
    • One or more Patterns -- selects records in the index
    • Specification of what to retrieve from the selected records
      • Record counts
      • Histograms (i.e. count distinct values) from fields
      • Whole records or structures
      • Ad-hoc list of fields
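The four retrieval options can be pictured as different reductions over the same set of selected records. This is a rough Python sketch over in-memory dictionaries, not Prospero API code: the function names, the sample records and their values are all invented for illustration.

```python
from collections import Counter

# Hypothetical records already selected by some pattern; values are invented.
selected = [
    {"source": "BLOG", "domain": "example.com", "title": "first"},
    {"source": "FORUM", "domain": "example.org", "title": "second"},
    {"source": "BLOG", "domain": "example.com", "title": "third"},
    {"source": "BLOG", "domain": "example.net", "title": "fourth"},
]


def record_count(records):
    """Record counts: how many records the pattern selected."""
    return len(records)


def histogram(records, field):
    """Histogram: count of each distinct value of one field."""
    return Counter(record[field] for record in records)


def page(records, offset, size):
    """Whole records, retrieved one page at a time."""
    return records[offset:offset + size]


def fetch_fields(records, fields):
    """Ad-hoc list of fields: keep only the requested fields of each record."""
    return [{name: record[name] for name in fields} for record in records]


print(record_count(selected))
print(histogram(selected, "source"))
print(page(selected, 2, 2))
print(fetch_fields(selected, ["title"]))
```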
  • 11. The ConsumerBase Data Model
  • 12. Records of the ConsumerBase Data Model
    • Document (count: about 1 Billion)
      • Information about the original source document: a web page, product review or tweet
    • Sentence (count: about 12 Billion)
      • Every individual sentence from each document.
    • Frame, aka Insight (count: about 2 Billion)
      • Individual relationships discovered in a sentence by NetBase's Natural Language Processing (NLP) technology
    • Role (count: about 6 Billion)
      • Within a single relationship, a single "node" or "end" of the relationship; an individual participant
  • 13. Sentence, Frame and Role
    • Sentence: "Jack prefers the IPod over Blackberry."
    • Columns: Frame Type | Role Name | Role Value
    • Preference | PreferredObject | IPod
    • Preference | AlternativeObject | Blackberry
    • Positives | Agent | Jack
    • Positives | Sentiment | Prefer
    • Positives | Object | IPod
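The Sentence, Frame and Role example can be written out as a small nested structure. This is a Python sketch for illustration only: the class and attribute names are hypothetical, and the grouping of roles under the two frames assumes the slide's table reads as Preference (PreferredObject, AlternativeObject) and Positives (Agent, Sentiment, Object).

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str   # e.g. PreferredObject
    value: str  # text value of the role


@dataclass
class Frame:
    frame_type: str  # e.g. Preference
    roles: list = field(default_factory=list)


@dataclass
class Sentence:
    text: str
    frames: list = field(default_factory=list)


sentence = Sentence(
    text="Jack prefers the IPod over Blackberry.",
    frames=[
        Frame("Preference", [
            Role("PreferredObject", "IPod"),
            Role("AlternativeObject", "Blackberry"),
        ]),
        Frame("Positives", [
            Role("Agent", "Jack"),
            Role("Sentiment", "Prefer"),
            Role("Object", "IPod"),
        ]),
    ],
)


def role_values(sent, role_name):
    """Collect the values of every role with the given name, across all frames."""
    return [
        role.value
        for frame in sent.frames
        for role in frame.roles
        if role.name == role_name
    ]


print(role_values(sentence, "PreferredObject"))
```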
  • 14. Data Model: Document
    • Columns: Name | St | Ix | Type | Meaning ("-" marks a flag that is not set)
    • fingerprint | St | Ix | fingerprint | Identity of the Document
    • datetime | St | Ix | datetime | Publication date
    • url | St | - | URL | Where a browser user can find the document
    • text | - | Ix | stemmed | Searchable text of the document
    • namedEntities | - | Ix | stemmed | Named Entities found in the document
    • title | St | Ix | orig/stemmed | Title of the document
    • source | St | Ix | ENUM | BLOG, FORUM, etc.
    • domain | St | Ix | ? | Domain section of the URL (e.g. twitter.com)
    • insights | - | Ix | Multiterm Enum | A list of all the insights in all the Roles from the document
    • sentiments | - | Ix | Multiterm Enum | A list of all the sentiments in all the Roles from the document
    • insights | - | Ix | Multiterm Enum | A list of all the Insight types from the document
    • roles | - | Ix | Multiterm Enum | A list of all the Roles from the document
  • 15. Data Model: Sentence
    • Columns: Name | St | Ix | Type | Meaning ("-" marks a flag that is not set)
    • url | St | - | URL | Where a browser user can find the document
    • text | St | Ix | orig/stemmed | Searchable text of the document
    • background | - | Ix | stemmed | Searchable text of surrounding sentence
    • namedEntities | - | Ix | Multiterm Orig | Named Entities found in the sentence
    • sentNo | St | - | int | Number of the sentence within its document
    • insights | - | Ix | Multiterm Enum | A list of all the concepts in all the Roles from the sentence
    • sentiments | - | Ix | Multiterm Enum | A list of all the sentiments in all the Roles from the sentence
    • insights | - | Ix | Multiterm Enum | A list of all the Insight types from the sentence
    • roles | - | Ix | Multiterm Enum | A list of all the Roles from the sentence
  • 16. Data Model: Insight and Role
    • Columns: Name | St | Ix | Type | Meaning ("-" marks a flag that is not set)
    • Frame
      • precision | St | Ix | ENUM | Precision of the insight, e.g. HIGH
      • type | St | Ix | ENUM | Insight Type, likely a Frame name, e.g. Preference
      • url | St | - | URL | Where a browser user can find the document
    • Role
      • name | St | Ix | ENUM | Role name, e.g. PreferredObject
      • sentiment | St | Ix | ENUM | Positive, Negative, Neutral
      • insight | St | Ix | ENUM | Person, Place, Product, Emotion, etc.
      • result | St | Ix | orig/stemmed | Text value of the role
      • namedEntities | - | Ix | Multiterm orig | Named Entities found in the result
      • position | St | - | Int:int | Offset/length in the sentence string
  • 17. Prospero Pattern Language
  • 18. Patterns Specify Which Index Records "match"
    • Nested collections of text-matchers, combiners and iterators
    • Text version for human composition
    • Java Objects for programs to use
    • Parser and toString() convert between the two
    • Pattern doesn't specify what to do with the matched records, only chooses some records and not others
    • Like the "where clause" in SQL
    • Mostly, any single pattern matches records of one class
      • I.e. if you search for all records where
      • DocumentObj.text='listerine'
      • the records you get back will all be DocumentObj records
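The slide describes two interchangeable forms of a pattern: objects for programs and text for people, with `toString()` rendering one as the other. The real classes are Java and are not shown in this transcript, so the following Python sketch only illustrates the idea. The names `Term`, `And` and `record_class` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A single text-matcher, e.g. DocumentObj.text='listerine'."""
    field: str
    value: str

    def __str__(self):
        return f"{self.field}='{self.value}'"

    def record_class(self):
        # The class of record this pattern matches is the prefix before the dot.
        return self.field.split(".", 1)[0]


@dataclass(frozen=True)
class And:
    """A combiner whose subordinates must all match."""
    parts: tuple

    def __str__(self):
        return "AND(" + ", ".join(str(part) for part in self.parts) + ")"


pattern = And((
    Term("DocumentObj.text", "iphone"),
    Term("DocumentObj.text", "battery"),
))

print(pattern)
print(Term("DocumentObj.text", "listerine").record_class())
```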
  • 19. Simple Term Patterns
    • Term
      • DocumentObj.text='photoshop'
      • Makes an exact match on a single word.
    • Prefix
      • DocumentObj.text='photosh*'
      • Matches any word that has this prefix. "*" must be the last character.
    • Phrase
      • DocumentObj.text='high fructose corn syrup'
      • Matches a sequence of words (any of the interior words may be wild-carded as "*")
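The three term patterns differ only in how they compare against the words of a field. The sketch below mimics those rules in Python over a lower-cased, whitespace-split word list. It is an illustration under simplifying assumptions, not the engine's implementation: the real index is described elsewhere in this deck as stemmed, which this sketch ignores, and all function names are hypothetical.

```python
def tokenize(text):
    """Split text into lower-cased words (no stemming, no punctuation handling)."""
    return text.lower().split()


def match_term(tokens, word):
    """Term: exact match on a single word."""
    return word.lower() in tokens


def match_prefix(tokens, pattern):
    """Prefix: '*' is allowed only as the last character."""
    if not pattern.endswith("*") or "*" in pattern[:-1]:
        raise ValueError("'*' must be the last character of a prefix pattern")
    prefix = pattern[:-1].lower()
    return any(token.startswith(prefix) for token in tokens)


def match_phrase(tokens, phrase):
    """Phrase: consecutive words; a word written as '*' matches any single word."""
    words = phrase.lower().split()
    if not words:
        return False
    width = len(words)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(word == "*" or word == token for word, token in zip(words, window)):
            return True
    return False


doc = tokenize("soda with high fructose corn syrup and photoshopped labels")
print(match_term(doc, "photoshop"))
print(match_prefix(doc, "photosh*"))
print(match_phrase(doc, "high * corn syrup"))
```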
  • 20. Boolean Combination Patterns
    • AND (all subordinates must match)
      • AND(DocumentObj.text='iphone', DocumentObj.text='battery')
    • AND NOT (one or more subordinates is negative)
      • AND(DocumentObj.text='tide',
      • NOT(DocumentObj.text='crimson'))
      • NOT is only valid as a subordinate of AND
    • OR (at least one of the subordinates must match)
      • OR(DocumentObj.text='iphone', DocumentObj.text='ipad')
      • Subordinates may be complex Patterns
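One way to read the Boolean combinators is as set operations over the IDs of the records each subordinate matches. The Python sketch below follows that reading. It is an illustration, not the engine's evaluation strategy: the names `Not`, `and_` and `or_` are hypothetical, and requiring at least one positive subordinate in an AND is an assumption, not something the slide states.

```python
class Not:
    """Marks a subordinate as negative; only meaningful inside and_()."""

    def __init__(self, ids):
        self.ids = set(ids)


def and_(*parts):
    """Intersect the positive subordinates, then remove every negative one."""
    positives = [set(part) for part in parts if not isinstance(part, Not)]
    negatives = [part.ids for part in parts if isinstance(part, Not)]
    if not positives:
        # Assumption: an AND made only of NOTs has nothing to subtract from.
        raise ValueError("AND needs at least one positive subordinate")
    result = set.intersection(*positives)
    for ids in negatives:
        result -= ids
    return result


def or_(*parts):
    """Union of the subordinates; NOT is rejected here, as on the slide."""
    if any(isinstance(part, Not) for part in parts):
        raise ValueError("NOT is only valid as a subordinate of AND")
    return set().union(*parts)


# Invented record IDs standing in for index matches.
tide = {1, 2, 3}
crimson = {2}
print(and_(tide, Not(crimson)))
print(or_({1}, {2}))
```

Because `and_` and `or_` return plain sets, their results can be passed as subordinates to another call, which mirrors the point that subordinates may be complex patterns.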
  • 21. Date Range Pattern
    • DateRange(DocumentObj.datetime,
      • '2009-05-03', '2009-06-09')
    • Range is computed inclusive of the start date, and exclusive of the end date
      • The example above will match any value of DocumentObj.datetime from 2009-05-03 00:00:00.000 to 2009-06-08 23:59:59.999 GMT
    • Dates in the index, and in these patterns, are in GMT time zone.
    • Dates in the pattern cannot (currently) include times
    • The matching is more efficient if the start and end dates are on the first day of the month
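The inclusive-start, exclusive-end rule is the usual half-open interval. The Python sketch below checks the boundary cases from the example. The helper name `date_range` is hypothetical, and GMT is represented by `timezone.utc`.

```python
from datetime import date, datetime, time, timezone


def date_range(start_iso, end_iso):
    """Build a predicate for start <= value < end, both at 00:00:00 GMT."""
    start = datetime.combine(date.fromisoformat(start_iso), time.min, tzinfo=timezone.utc)
    end = datetime.combine(date.fromisoformat(end_iso), time.min, tzinfo=timezone.utc)

    def matches(value):
        return start <= value < end

    return matches


in_range = date_range("2009-05-03", "2009-06-09")

first_instant = datetime(2009, 5, 3, 0, 0, 0, tzinfo=timezone.utc)
last_instant = datetime(2009, 6, 8, 23, 59, 59, 999000, tzinfo=timezone.utc)
end_boundary = datetime(2009, 6, 9, 0, 0, 0, tzinfo=timezone.utc)

print(in_range(first_instant), in_range(last_instant), in_range(end_boundary))
```

With half-open ranges, adjacent periods can share a boundary date without any record being counted twice.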
  • 22. Iterator Patterns
    • ChoiceIterator(
      • POS: RoleObj.sentiment='Positive',
      • NEG: RoleObj.sentiment='Negative',
      • NEU: RoleObj.sentiment='Neutral'
    • )
    • Represent more than one pattern
    • Prospero expands them at the worker level
    • Produce more than one count result, keyed for recognition by the user
    • May be combined to generate permutations
    • (Probably) usable only in COUNT operations
  • 23. ChoiceIterator Pattern
    • ChoiceIterator(
      • POS: RoleObj.sentiment='Positive',
      • NEG: RoleObj.sentiment='Negative',
      • NEU: RoleObj.sentiment='Neutral'
    • )
    • Matches each subordinate pattern, in succession
    • Subordinate Patterns may be simple or complex
    • Keys (e.g. POS:) come back in the results to aid the application in applying them
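A ChoiceIterator can be thought of as a keyed collection of patterns, each counted separately. The Python sketch below shows that shape with simple predicates over invented Role records. The names `field_equals` and `count_choices` are hypothetical, and none of this is Prospero API code.

```python
def field_equals(field, value):
    """Return a predicate standing in for a pattern such as RoleObj.sentiment='Positive'."""
    return lambda record: record.get(field) == value


# Keys mirror the POS/NEG/NEU labels of the example pattern.
choice_iterator = {
    "POS": field_equals("sentiment", "Positive"),
    "NEG": field_equals("sentiment", "Negative"),
    "NEU": field_equals("sentiment", "Neutral"),
}


def count_choices(choices, records):
    """Apply each subordinate in succession and return one count per key."""
    return {
        key: sum(1 for record in records if predicate(record))
        for key, predicate in choices.items()
    }


# Invented Role records for illustration.
roles = [
    {"sentiment": "Positive"},
    {"sentiment": "Negative"},
    {"sentiment": "Positive"},
    {"sentiment": "Neutral"},
]

print(count_choices(choice_iterator, roles))
```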
  • 24. DateSeriesIterator Pattern
    • DateSeriesIterator(
    • '2009-05-22', 7, 15, 'yy-MM')
      • Becomes 15 sequential 7-day DateRangePatterns.
    • DateSeriesIterator(
    • '2009-03-01', 0, 10, 'yy-MM')
      • Becomes 10 sequential one-month DateRangePatterns
    • The result key is the start date of each period, formatted in accordance with the last argument.
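The expansion of a DateSeriesIterator into sequential date ranges can be sketched as follows. This is a Python illustration of the rule as the slide states it (a step of 0 meaning one calendar month), not the engine's code. The function names are hypothetical, each range is treated as start-inclusive and end-exclusive, and formatting of the result keys is left out because the format strings on the slide are not Python format strings.

```python
from datetime import date, timedelta


def add_months(day, months):
    """Shift a date by whole months, keeping the day of month.

    Raises ValueError if the day does not exist in the target month.
    """
    index = day.year * 12 + (day.month - 1) + months
    return day.replace(year=index // 12, month=index % 12 + 1)


def date_series(start_iso, step_days, count):
    """Return `count` sequential (begin, end) ranges starting at start_iso.

    step_days == 0 is read as 'one calendar month per period'.
    """
    start = date.fromisoformat(start_iso)
    ranges = []
    for i in range(count):
        if step_days == 0:
            begin, end = add_months(start, i), add_months(start, i + 1)
        else:
            begin = start + timedelta(days=step_days * i)
            end = begin + timedelta(days=step_days)
        ranges.append((begin, end))
    return ranges


weekly = date_series("2009-05-22", 7, 15)
monthly = date_series("2009-03-01", 0, 10)
print(weekly[0], weekly[-1])
print(monthly[0], monthly[-1])
```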
  • 25. Multi-Dimensional Iterator Patterns
    • AND (
    • DocumentObj.text='photoshop',
    • ChoiceIterator(
    • POS: DocumentObj.sentiments='Positive',
    • NEG: DocumentObj.sentiments='Negative',
    • NEU: DocumentObj.sentiments='Neutral'
    • ),
    • DateSeriesIterator(
    • DocumentObj.datetime,'2009-04-01', 0, 10)
    • )
    • Produces a 2-dimensional matrix of results, iterated over the 3 values of DocumentObj.sentiments, and the 10 one-month time buckets (30 values in all)
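Combining two iterators amounts to taking the Cartesian product of their keys. The Python sketch below builds the empty 3 x 10 grid a client might fill with the returned counts. The `yyyy-MM` month labels are an assumption, since the pattern on this slide does not give a key format, and `month_labels` is a hypothetical helper.

```python
from itertools import product


def month_labels(year, month, count):
    """Labels for `count` consecutive months, formatted as yyyy-MM (assumed format)."""
    labels = []
    for i in range(count):
        index = year * 12 + (month - 1) + i
        labels.append(f"{index // 12:04d}-{index % 12 + 1:02d}")
    return labels


sentiment_keys = ["POS", "NEG", "NEU"]
month_keys = month_labels(2009, 4, 10)

# One cell per (sentiment, month) pair, initialised to zero.
matrix = {cell: 0 for cell in product(sentiment_keys, month_keys)}

print(len(matrix))
print(month_keys[0], month_keys[-1])
```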
  • 26. Hierarchy Transformation
    • To(DocumentObj,
    • RoleObj.insight='Person'
    • )
      • This matches all Documents that generated an Insight with a Person role.
    • To(SentenceObj,
    • DocumentObj.source='Blog'
    • )
      • This matches all sentences that are part of a document that came from a Blog.
    • Traverses the Data Model Hierarchy
    • Transforms a result of one class to its parent/children
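The two `To(...)` examples move in opposite directions along the hierarchy: from Roles up to Documents, and from Documents down to Sentences. The Python sketch below imitates both over a tiny invented hierarchy. It is not Prospero code: the record IDs and the `to` and `lineage` helpers are hypothetical, and so is the class name `FrameObj`, which does not appear in this transcript.

```python
# record id -> (record class, parent id); the data is invented for illustration.
RECORDS = {
    "doc1": ("DocumentObj", None),
    "doc2": ("DocumentObj", None),
    "sent1": ("SentenceObj", "doc1"),
    "sent2": ("SentenceObj", "doc1"),
    "sent3": ("SentenceObj", "doc2"),
    "frame1": ("FrameObj", "sent1"),  # FrameObj is an assumed class name
    "role1": ("RoleObj", "frame1"),
}


def lineage(record_id):
    """Return the record itself followed by all of its ancestors."""
    chain = []
    current = record_id
    while current is not None:
        chain.append(current)
        current = RECORDS[current][1]
    return chain


def to(target_class, matched_ids):
    """Transform matched records into related records of target_class."""
    matched = set(matched_ids)
    result = set()
    # Upward: ancestors of the matched records that belong to the target class.
    for record_id in matched:
        for ancestor in lineage(record_id):
            if RECORDS[ancestor][0] == target_class:
                result.add(ancestor)
    # Downward: target-class records that have a matched record in their lineage.
    for record_id, (record_class, _) in RECORDS.items():
        if record_class == target_class and matched.intersection(lineage(record_id)):
            result.add(record_id)
    return result


print(to("DocumentObj", {"role1"}))
print(sorted(to("SentenceObj", {"doc1"})))
```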
