Personal data mining/indexing system featuring semantic information analysis and fuzzy logic.
PWA is expected to wrap complicated data processing algorithms into a simple, easy-to-use form.
It aims to boost user comfort and efficiency when dealing with personal computer systems.
It can be deployed on IVI (in-vehicle infotainment) systems, tablets, smartphones, and desktop computers.
This was presented at Automotive Linux Summit 2012.
Personal World Absorber - a tool to filter information garbage and boost user comfort.
1. Personal World Absorber
Global Search + Indexing concept
Vladimir Kryukov, Product Marketing Director
Vladimir Vodolazkiy, Ph.D., Software Architect
(C) ROSA - September, 2012
2. PWA Mission:
Information extraction from information feeds: textual, video, photo resources, and so on
Absorb only information relevant to the user's interests
Store all extracted data in a personal, user-friendly information warehouse
Search based on the object's concept
Comfortable object representation and detailing
Integration with 3rd-party data processing software
3. Personal infosphere
Continuous data bank, common to all of the user's gadgets, available at home, at work, and in the car on the road
Facts and objects are grouped in accordance with personal preferences
All relations between objects are subject to time, space, personal alignment, and the inspiration of the moment
4. Main tasks to be solved for the PWA prototype
Information aggregation
Indexing
Search engine
Representation engine
Personal cloud storage
5. Objects in PWA:
Texts - facts, fiction, notes, memoirs
Music - inspiration, relaxation
Speech notes - memoirs, notes, evidence
Images - facts, fiction, notes, memoirs
Video - all of the above
Data - calendars, lists, other info
6. PWA data aggregator subsystem
[Diagram; recoverable labels:]
Robots (search agents)
Actualisation
Data Warehouse
Analyzer
Interpreter
Request
Search results
Response
7. Cluster model in PWA
Objects can be united into clusters
Clusters can be united into metaclusters, and so on
[Diagram example built around "My cat"; recoverable labels:]
GPS - Latitude, Longitude, Timestamp, Direction, Speed
Cat food purchase - "06/15 - Whiskas Chicken 3 pcs", "necklace"
Event list
Cat's story
Vaccination calendar
Photo
Movie
My cat on u-tube
Cat on the shelf
8. Object attributes:
Time attributes
- Date of creation
- Time interval (intervals) covered
Space attributes
- Where it was created
- Space position (positions) covered
- Movement alignment
- speed,
- direction
Tags with description
- list of the tags with the textual description of object
Technical attributes
- file size
- codec to play
- underlying file/database access path
- charset coding
- etc.
Personal attitude
- Fact of the objective reality?
- Positive, negative relation
- relax, inspiration, information, memoir
9. Object attributes:
Should be applicable to any kind of media
Should provide fast and reliable identification
Should be presented in human-readable form
Should take into account the uncertainty of the choice
Should be used in conjunction with the user's personal profile
Frame of the text-based tags with fuzzy weights
Conceptual thesaurus, based on WordNet
Adaptive user interest profile
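
The idea of a tag frame with fuzzy weights matched against a user profile could look roughly like the sketch below. The class name `TagFrame`, the function `relevance`, and the choice of min/max as fuzzy AND/OR are assumptions made for illustration; the slides do not specify the actual operators or data layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TagFrame:
    """Hypothetical frame of text tags, each with a fuzzy weight in [0, 1]."""
    weights: dict[str, float] = field(default_factory=dict)

    def add(self, tag: str, weight: float) -> None:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("fuzzy weight must lie in [0, 1]")
        # Fuzzy OR (max) when the same tag is asserted more than once.
        self.weights[tag] = max(weight, self.weights.get(tag, 0.0))


def relevance(frame: TagFrame, profile: dict[str, float]) -> float:
    """Assumed scoring: fuzzy AND (min) per tag, fuzzy OR (max) across tags."""
    return max(
        (min(w, profile.get(tag, 0.0)) for tag, w in frame.weights.items()),
        default=0.0,
    )


photo = TagFrame()
photo.add("cat", 0.9)
photo.add("shelf", 0.4)
profile = {"cat": 0.7, "car": 1.0}  # illustrative user interest profile
score = relevance(photo, profile)
```

Here the photo is strongly tagged "cat" and the profile shows a fairly strong interest in cats, so the object scores as relevant even though the "shelf" tag is of no interest to this user.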
10. Tag set extraction
[Diagram; recoverable labels:]
Document
Common attribute extraction - formal/technical attributes
Semantic meaning extraction - Keywords/Nouns, Intermediate Terms, Base Terms
Resulting document description - vector in the Base Terms space
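
A minimal sketch of the keyword to intermediate term to base term chain, ending in a vector over the base terms. The two lookup tables are a hypothetical toy thesaurus standing in for the WordNet-based one mentioned on the previous slide, and the normalisation by total count is an assumption.

```python
from collections import Counter

# Hypothetical toy thesaurus: keyword -> intermediate term -> base term.
KEYWORD_TO_INTERMEDIATE = {
    "kitten": "cat",
    "cat": "cat",
    "whiskas": "cat food",
    "vet": "veterinarian",
}
INTERMEDIATE_TO_BASE = {
    "cat": "animal",
    "cat food": "purchase",
    "veterinarian": "health",
}
BASE_TERMS = ["animal", "purchase", "health"]  # axes of the description vector


def describe(text):
    """Return the document description as a vector in the base-term space."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter()
    for word in words:
        intermediate = KEYWORD_TO_INTERMEDIATE.get(word)
        if intermediate is None:
            continue  # not a known keyword
        counts[INTERMEDIATE_TO_BASE[intermediate]] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in BASE_TERMS]


vector = describe("Bought Whiskas for the kitten, then took the cat to the vet.")
```

For this sentence two keywords map to "animal" and one each to "purchase" and "health", so the vector leans toward the "animal" axis.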
11. Multiclassification
Each object can belong to several clusters at once
Tag's frame content depends on the owner of the object
Example: Two girls read poems at the festival in kindergarten...
Audio File MP3 - Mary, Helen
Anna's PWA
Where - Kindergarten #5
When - 26 Dec 2011, New Year Evening
Duration - 20:00
Who - Mary, Helen
Ann - kindergartener
Steve's PWA
Where - Kindergarten #5
When - 26 Dec 2011
Timing - 00:00 - 12:30, 15:40 - 20:00
Who - Mary
Steve - father of Mary
John's PWA
Where - Kindergarten #5
When - 26 Dec 2011
Timing - 12:30 - 15:40
Who - Helen
John - father of Helen
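
The kindergarten example can be sketched as one shared audio object carrying a separate tag frame per owner. This is only an illustration: the file name, the `OwnerFrame` type, and the reading of the timings as MM:SS offsets inside the 20-minute recording are assumptions, as is giving Anna's frame the whole recording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnerFrame:
    """Hypothetical per-owner tag frame for one shared object."""
    owner: str
    who: tuple
    timing: tuple  # (start, end) pairs, assumed to be MM:SS offsets


def to_seconds(mmss):
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_of_interest(frame):
    """Total length of the segments this owner cares about."""
    return sum(to_seconds(end) - to_seconds(start) for start, end in frame.timing)


AUDIO = "festival.mp3"  # hypothetical file name
FRAMES = {
    AUDIO: [
        OwnerFrame("Anna", ("Mary", "Helen"), (("00:00", "20:00"),)),
        OwnerFrame("Steve", ("Mary",), (("00:00", "12:30"), ("15:40", "20:00"))),
        OwnerFrame("John", ("Helen",), (("12:30", "15:40"),)),
    ]
}

totals = {f.owner: seconds_of_interest(f) for f in FRAMES[AUDIO]}
```

Under these assumptions the two fathers' segments partition the recording: Steve's and John's totals add up to Anna's full duration.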
12. Hierarchy approach
PWA includes different kinds of objects
PWA should provide an estimate of similarity between different objects
Similar objects can be combined into clusters
Any object can belong to several clusters
Clusters can be united in the same way as the underlying objects
The hierarchy of objects/clusters should be self-organised
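
One way to picture overlapping, self-organising clusters is the threshold-based sketch below. It is not the PWA algorithm: the Jaccard similarity on plain tag sets, the threshold value, and the object names are all assumptions chosen to keep the example small.

```python
def jaccard(a, b):
    """Similarity of two tag sets: shared tags over all tags."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(objects, threshold=0.25):
    """Assign each object to every cluster it is similar enough to.

    An object that matches no existing cluster starts a new one, so the
    set of clusters grows from the data rather than being fixed up front.
    """
    clusters = []  # each cluster: {"tags": set, "members": list}
    for name, tags in objects.items():
        joined = False
        for c in clusters:
            if jaccard(tags, c["tags"]) >= threshold:
                c["members"].append(name)
                c["tags"] |= tags  # cluster description absorbs the new tags
                joined = True
        if not joined:
            clusters.append({"tags": set(tags), "members": [name]})
    return clusters


objects = {  # hypothetical objects and tags
    "photo_shelf": {"cat", "photo", "home"},
    "vaccination": {"cat", "vet", "calendar"},
    "vet_bill": {"vet", "purchase", "calendar"},
    "cat_food": {"cat", "purchase"},
}
result = cluster(objects)
members = [c["members"] for c in result]
```

In this run "cat_food" lands in both clusters. Because each cluster is itself described by a tag set, the same function could be applied to the clusters to form metaclusters.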
13. Logical architecture for indexing subsystem
[Layered diagram; recoverable labels:]
Roles - End users, Knowledge engineer, Software engineer
Representation parameters tuning - appearance, set of content, time schedule, urgent message generation criteria
Informational channels (feeds)
Warehouse for information extraction algorithms
Data structures for task-oriented processing
Common information extraction algorithms
Intermediate processing layer, which converts primary databases into user-adapted views
Thesaurus
Primary database
Information gathering agents
«Manual» data input
Interface to software and OS - system software, servers (NNTP, WWW, SMTP, SQL, etc.)
Interface to external world - sensors, OCR, video, keyboard
14. Empirical-based learning system
[Diagram; recoverable labels:]
Initial learning test corpus
Input text data
End-User System
Knowledge-base generation
Knowledge-base updater
Natural Language Processor
Initial knowledge base
Actual knowledge base
Processed data
15. Information extraction approach
Input text: "4 Apr Dallas - Early last evening, a tornado swept through an area northwest of Dallas, causing extensive damage..."
Syntax framing and markup: Early/adv last/adj evening/noun/time, a/det tornado/noun/weather swept/verb ...
Sentence analysis:
Early last evening - adverb:time
a tornado - noun group/subject
swept - verb group
through an area - prep phrase:location
northwest of Dallas - adverb:location
causing - verb group
extensive damage - noun group/object
Extraction (sub phrase - data extracted):
tornado swept - event:tornado
swept through an area - location: «area»
area northwest of Dallas - location: «northwest of Dallas»
causing extensive damage - effect: «damage»
Object merging: "Early last evening, a tornado swept through an area northwest of Dallas..." and "Witness confirmed that the twister..."
Template creation:
Event: Tornado
Date: 04/03/1997
Time: 19:15
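
The extraction step on the tornado sentence can be imitated with a few hand-written patterns, as in the sketch below. It replaces the syntactic analysis shown on the slide with plain regular expressions, so it only works for sentences shaped like this one; the slot names follow the slide, while the patterns themselves are assumptions.

```python
import re

SENTENCE = (
    "Early last evening, a tornado swept through an area "
    "northwest of Dallas, causing extensive damage."
)

# Hand-written, sentence-specific patterns (illustrative only).
PATTERNS = {
    "event": re.compile(r"\ba (\w+) swept\b"),
    "location": re.compile(r"\bthrough an area ([\w ]+?),"),
    "effect": re.compile(r"\bcausing (?:\w+ )?(\w+)"),
}


def fill_template(sentence):
    """Fill each template slot whose pattern matches the sentence."""
    template = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            template[slot] = match.group(1)
    return template


template = fill_template(SENTENCE)
```

Slots whose pattern does not match are simply left out of the template, which mirrors the way a later sentence about the same event could fill them in during object merging.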
16. Search in the tag space
I am interested in Apple software, but is Apple interested in mine?
Asymmetric quasimetric to reflect personal preferences and relationships to objects
Cosine measure and Tanimoto-based equation to evaluate the similarity of two or more objects
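
The two symmetric measures named above have standard definitions for weighted vectors, sketched here on sparse tag-weight dictionaries. The `interest` function is a hypothetical asymmetric score added only to illustrate the "Apple" question; it is not claimed to be the quasimetric used in PWA or to satisfy the quasimetric axioms.

```python
import math


def dot(a, b):
    """Dot product of two sparse tag-weight vectors."""
    return sum(w * b.get(tag, 0.0) for tag, w in a.items())


def cosine(a, b):
    """Cosine measure: a.b / (|a| |b|)."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def tanimoto(a, b):
    """Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    return ab / denom if denom else 0.0


def interest(a, b):
    """Hypothetical asymmetric score: share of a's tag weight covered by b."""
    total = sum(a.values())
    covered = sum(min(w, b.get(tag, 0.0)) for tag, w in a.items())
    return covered / total if total else 0.0


me = {"apple": 1.0, "software": 1.0}
apple = {"apple": 1.0, "software": 1.0, "hardware": 1.0, "music": 1.0}

cos = cosine(me, apple)
tani = tanimoto(me, apple)
me_to_apple = interest(me, apple)
apple_to_me = interest(apple, me)
```

Cosine and Tanimoto give the same value in both directions, while the asymmetric score differs: everything in `me` is covered by `apple`, but only half of `apple` is covered by `me`.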
17. Representation
Various tasks require different representations
Both visual and audio output are required
Search results are presented through:
Time Frame - document/file access in retrospective time order
Infosphere - cluster-based view of objects and relationships
Herald - audio output for the search results summary
18. Herald - text-to-speech synthesis system
Pro:
Does not distract the user from the main activity
Effectively utilizes the user's «input channels»
Contra:
Audio carries up to 10 times less information than visual output
Requires special aggregation procedures to be developed
19. Herald - based on Flite library
[Diagram; recoverable labels:]
Festival - open source C++ framework with a Scheme-based internal scripting subsystem
FestVox - open source voice preparation
Flite - light-weight pure C version with precompiled voices
Small, fast, portable run-time synthesizer
The same quality as Festival due to static voices
No Scheme, no interpreter
Fast synthesis start
Thread safe
Ideal for embedded systems
GStreamer for runtime library - not yet
20. Summary
PWA technology development is in progress today
ROSA is ready to collaborate with OEMs to enable next-generation devices with intelligent data organisation and search
Developers are welcome to join the ROSA team in this project
Contacts: vladimir.kryukov@rosalab.ru,
vladimir.vodolazkiy@rosalab.ru