Text and Data Visualization Introduction 2012

Introduction to Text and Data Visualization for modelling Text Analytics applications. Incl. Who is Treparel / Why visualize data / How do we visualize data / multiple coupled views and interaction

Published in: Technology, Education

Transcript

  • 1. Introduction to Text Visualization. Dr. Anton Heijs, CEO, Treparel. Delftechpark 26, 2628 XH Delft, The Netherlands. July 2012. www.treparel.com
  • 2. KMX enables information and knowledge professionals to gain faster, more reliable, and more precise insights into large, complex, unstructured data sets, allowing them to make better-informed decisions. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization. Treparel KMX – All rights reserved 2012
  • 3. Topics covered in this presentation
      • Who is Treparel?
      • Why visualize data?
      • How do we visualize data?
      • Multiple coupled views and interaction
  • 4. Nexus of Forces: Social, Cloud, Mobile, Information. IT market shift driving Big Data challenges (Copyright: Gartner, 2011). 80% of data is unstructured (documents, text, images, graphs).
  • 5. About Treparel
      • Delft, The Netherlands, 2006.
      • Treparel is an innovative technology solution provider in Big Data Analytics, Text Mining and Visualization.
      • KMX is an integrated data analysis toolset that provides faster, more reliable, intelligent insights into large, complex, unstructured data sets, allowing companies to make better-informed decisions.
      • Clients: Philips, Bayer, Abbott, European Patent Office, European Commission
      • Part of a research center and university ecosystem: TU Delft, Universities of Paris and Sao Paulo
      • More info: www.treparel.com
  • 6. Positioning of Treparel’s KMX technology (diagram; Copyright: Gartner, J. Popkin 2010)
      • Text Acquisition & Preparation (‘Seek’): external sources (patents, legal, research, media / publishers) and other sources (documents, websites, blogs, newsfeeds, email, application notes, search results, social networks)
      • Analysis and processing (‘Model’): text preprocessing, indexing, clustering, classification, semantic analysis, information extraction (entities, facts, relationships, concepts, patents)
      • Output and display (‘Adapt’): reporting & presentation, media and publishing databases, content management systems, line-of-business applications, research applications, search engines
      • Also shown: Visualization; Management, Development and Configuration
  • 7. Why visualize data?
      • Gives a bird’s-eye view of large, comprehensive datasets
      • Reveals previously unknown patterns in the dataset
      • Reveals inherent problems in the data (noise, outliers)
      • Enables examination of both the large-scale features of the dataset and its local features (frame of reference)
      • Allows the user to form hypotheses from the insights gained and make better-informed decisions
  • 8. Why visualize data? Analysis without visualization
      • Initial selection of training documents is ‘blind’
      • No insight into how the data is separated
      • Improving the Classifier™ is a lengthy process
      • No clear insight into the performance of a Classifier™
  • 9. Why visualize data? Analysis with visualization
      • Representative selection of training documents by the information professional
      • A clean training set (no noise or outliers) is used for building a Classifier™
      • Visual feedback on whether the Classifier™ yields a separated section within the frame of reference of the whole dataset
      • Insight into the performance of a Classifier™
  • 10. How do we visualize data?
      • Create a representation we can use (vector space model)
      • Correct for (normalization):
          – Terms that appear often in most documents (non-discriminative terms)
          – Length of documents (remove bias towards longer documents)
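The two corrections on slide 10 can be illustrated with a small sketch. The slides do not say which weighting scheme KMX uses, so this assumes a common choice for vector space models: TF-IDF weighting to suppress non-discriminative terms, followed by L2 normalization to remove the bias towards longer documents. The function name `tfidf_vectors` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per tokenized document.

    Assumed scheme (not confirmed by the slides): raw term count
    times log(N / document frequency), then unit-length scaling.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # IDF down-weights terms that appear in most documents;
        # a term present in every document gets weight log(1) = 0.
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # L2 normalization removes the bias towards longer documents.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors

# Hypothetical sample data: the second document is the first one repeated twice.
docs = [
    "the patent describes a text mining method".split(),
    ("the patent describes a text mining method " * 2).split(),
    "the report covers a visualization tool".split(),
]
vectors = tfidf_vectors(docs)
```

In this sample, "the" and "a" occur in every document and so receive zero weight, and the doubled document ends up with the same vector as the original, which is the effect the length correction is meant to have.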
  • 11. 3 types of visualization
      1. Visualize frequency distribution
      2. Visualize classification scores
      3. Visualize document similarity
  • 12. 1. Visualize frequency distribution. Requires careful interpretation.
  • 13. 2. Visualize classification scores. Parallel Coordinate Plot.
  • 14. 3. Visualize document similarity. Also visualizes the classification scores.
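The slides do not state which similarity measure underlies this view. As a minimal sketch, assuming cosine similarity (a common choice for documents in a vector space model), pairwise document similarity could be computed as follows. The function names and the sample term weights are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Pairwise similarities; entry [i][j] compares documents i and j."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]

# Hypothetical term weights for three documents.
vectors = [
    {"patent": 2.0, "claim": 1.0},
    {"patent": 4.0, "claim": 2.0},  # same direction as the first document
    {"market": 3.0},                # shares no terms with the others
]
sim = similarity_matrix(vectors)
```

A matrix like `sim` is the kind of input a similarity-based layout needs: documents that point in the same direction score close to 1, and documents with no shared terms score 0.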
  • 15. How do we visualize data? What is Least Squares Projection (LSP)?
      • LSP attempts to retain the neighborhood relations of objects in a high-dimensional feature space when the data is projected onto a two-dimensional surface.
      • Advantage: objects that are close to each other (similar) in the high-dimensional feature space are also close to each other in the LSP visualization. This results in the formation of clusters on the two-dimensional surface.
      • Disadvantage: information is lost as a result of the projection to two dimensions (for human interpretation). As a result, some clusters do not appear as a single identifiable cluster to a human viewer (e.g. they are scattered over the visualization).
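The idea on slide 15 can be sketched in a few lines. This is a simplified illustration, not Treparel's implementation: it assumes the usual LSP formulation, in which every point should lie at the mean of its nearest neighbours and a few control points are tied to known 2-D positions, with the resulting system solved in the least-squares sense. Here the control positions are supplied by hand (published LSP variants derive them from a separate projection step), and the helper names `knn`, `solve` and `lsp` are hypothetical.

```python
import math

def knn(points, k):
    """Indices of the k nearest neighbours of each point (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + r[:] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [[aug[i][n + d] / aug[i][i] for d in range(len(rhs[0]))] for i in range(n)]

def lsp(points, controls, k=3):
    """Project points to 2-D; controls maps a point index to its 2-D position."""
    n = len(points)
    rows, targets = [], []
    # Neighbourhood rows: each point should sit at the mean of its neighbours.
    for i, nbrs in enumerate(knn(points, k)):
        row = [0.0] * n
        row[i] = 1.0
        for j in nbrs:
            row[j] = -1.0 / len(nbrs)
        rows.append(row)
        targets.append([0.0, 0.0])
    # Control rows: pull the control points towards their given positions.
    for i, pos in controls.items():
        row = [0.0] * n
        row[i] = 1.0
        rows.append(row)
        targets.append(list(pos))
    # Normal equations (A^T A) x = A^T b give the least-squares solution.
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    atb = [[sum(r[a] * t[d] for r, t in zip(rows, targets)) for d in range(2)]
           for a in range(n)]
    return solve(ata, atb)

# Hypothetical data: two well-separated groups of four points in 3-D.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
          (9, 9, 9), (10, 9, 9), (9, 10, 9), (9, 9, 10)]
controls = {0: (0.0, 0.0), 1: (1.0, 1.0), 4: (10.0, 0.0), 5: (11.0, 1.0)}
layout = lsp(points, controls)
```

Because every point's neighbours belong to its own group, each group stays together in `layout` near its control points, which is the cluster formation the slide describes. The dense solver is only practical for tiny inputs; a real document collection would need a sparse least-squares solver.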
  • 16. 3. Visualize document classification scores. After applying LSP.
  • 17. Multiple coupled views and interaction
  • 18. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization. Treparel, Delftechpark 26, 2628 XH Delft, The Netherlands. www.treparel.com