WebExpo 2008 Newstin

Presentation about Newstin given at the WebExpo 2008 conference, on real-time categorization of text content on the web.

More at http://2008.webexpo.cz/prednaska/kategorizace-weboveho-obsahu-v-realnem-case/

Published in: Technology, Education

Transcript

  • 1. Newstin: Real-time Web Content Categorization. Presentation to WebExpo 2008, October 18, 2008
  • 2. Company Background
      • Newstin a.s. founded in 1998 as I2S in Prague
      • Team of 30 employees: 26 engineers, 14 nations
      • Since 2005: real-time semantic content categorization; multiple patent filings on the cross-language solution
      • Past activities: business and government projects in information management and security; partnership with Business Objects/SAP
      • RedHerring Europe 100 Winner Award
  • 3. What is Newstin?
      • Patented technology
      • Largest news database and catalog of news in the world
      • 150,000+ information sources in 11 languages
      • 250,000+ articles daily, fully processed into 1,000,000+ categories
      • US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese
      • Japanese, Korean, Turkish coming in Q4 2008
      • Newstin.com
      • Popular user applications
      • Business Intelligence
      • Enterprise content organization
  • 4. What is Newstin? (Details)
      • Newstin is an innovative technology that takes a completely new approach to content organization. Newstin technology and its service-oriented architecture are the foundation of a system that features fully scalable, real-time semantic, multi-language and cross-language document categorization. Newstin's patented technology has the potential to become the core platform for organizing any unstructured textual data, including data from all sources on the Internet and potentially the hidden Web.
      • Newstin is an engine that combines linguistic processing with semantic analysis, multilevel content categorization and cross-language taxonomy structures. Applications of Newstin technology make use of context in addition to conventional keyword approaches.
      • Newstin is the largest news database/catalogue in the world, currently comprising 40 million documents and 2.2 billion metadata items, and it is constantly growing. The article collection is continuously updated from over 160,000 global, weighted sources selected from a pool of over 3 million preprocessed sources in 12 languages. 200,000+ articles are fully processed daily into 1.1 million categories in 15 supported editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese and Korean, with more languages and editions coming soon.
      • Newstin is a complex system incorporating content retrieval, metadata processing, analysis and visualization. The extensive operation behind Newstin makes it a suitable platform for SaaS solutions.
      • Newstin is a bi-directional application in its own right. By imposing order on unstructured data, Newstin leverages its own extensive metadata collection for business intelligence and enterprise performance management. Content must be organized first to maximize knowledge-mining capability.
  • 5. Web Content Chaos  An inspiration for Newstin to develop a solution for organizing web content
  • 6. Semantic Web 2.0 Organization  A portion of Newstin’s taxonomy structure – a step toward organizing web content
  • 7. Live Demonstration – Newstin.com
  • 8. Live Demonstration – NewstinMap
  • 9. Live Demonstration – Connecting VIP
  • 10. Live Demonstration – BI Example
  • 11. Live Demonstration – BI Example
  • 12. Live Demonstration – EmergingStories
  • 13. B2B: Online Categorization
      • [Diagram; recoverable labels: Firewall, Enterprise Intranet, Unstructured Data, Semantic Organization, Newstin Categorization Engine, Metadata]
      • Contextual search
      • Visual navigation
      • Cross-language
      • Mash up internal/external
      • Semantic / Web 2.0 capability
      • SaaS to enterprise
      • Market standard for tagging
      • Product synergy / enhancement
      • Competitive advantage
  • 14. Cross-language Information Retrieval: Newstin makes it possible to reach a particular topic in all supported languages through original definitions
  • 15. Life Cycle  Newstin is a comprehensive information system
  • 16. Presentation Summary (originally in Czech). Main topic: real-time categorization of web content. Newstin a.s. is a Czech technology company headquartered in Prague, employing 30 engineers from 15 countries. Over 3.5 years it has built a unique technology for real-time organization of text documents using semantic and linguistic technologies. The key, patented component of Newstin technology is the so-called cross-lingual solution, which makes it possible to link internet content in different languages without using translation. Newstin has built the largest up-to-date database of online news articles in 11 world languages, including Czech, containing 37 million articles from the last 9 months and 2 billion metadata items. Newstin servers currently process 250,000 unique articles a day from the 160,000 most important sources worldwide. Further uses of Newstin technology lie in media analysis and the organization of enterprise data.
  • 17. Real-time Web Content Categorization. Thank you. Julius Rusnak, CTO, Newstin a.s., Lomnickeho 9, 140 00 Prague, Czech Republic
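
Slide 14 describes reaching one topic in every supported language without translation. A minimal sketch of how that could work, assuming each category has a language-independent identifier with a separate keyword definition per language. The taxonomy, keywords, article data and function names below are all hypothetical illustrations; the slides do not describe Newstin's actual method.

```python
from collections import defaultdict

# Hypothetical taxonomy: one language-independent category ID,
# defined separately by keywords in each language (an assumption,
# not Newstin's documented data model).
TAXONOMY = {
    "economy/inflation": {
        "en": {"inflation", "prices"},
        "cs": {"inflace", "ceny"},
        "de": {"inflation", "preise"},
    },
}


def categorize(text, lang):
    """Return the IDs of categories whose keywords for `lang` occur in `text`."""
    words = set(text.lower().split())
    return {
        cat
        for cat, definitions in TAXONOMY.items()
        if words & definitions.get(lang, set())
    }


def build_index(articles):
    """Map each category ID to the (language, title) pairs filed under it."""
    index = defaultdict(list)
    for title, lang, text in articles:
        for cat in categorize(text, lang):
            index[cat].append((lang, title))
    return index


# Made-up sample articles in three languages.
ARTICLES = [
    ("Prices rise again", "en", "Consumer prices rose as inflation picked up"),
    ("Inflace zrychlila", "cs", "Inflace v říjnu zrychlila"),
    ("Fußball heute", "de", "Das Spiel endete unentschieden"),
]

index = build_index(ARTICLES)
for lang, title in index["economy/inflation"]:
    print(lang, title)
```

Because articles are filed under the shared category ID rather than under translated text, a lookup on a single ID returns matching articles in every language that has a definition for it.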