Presentation for turkot v2 0 (dh)

PRESENTATION
Addendum to the Grant Application from

Innovation project: Cloud platform for
development and procurement of semantic
services (Semantic PaaS, SPaaS), making it
possible to extract and process text information
using natural language.

Company name:
Avicomp Services, LLC

Moscow

1. Innovation project summary (hereafter, the Project)

Current market issues that the Project is supposed to address

The challenge remains how to give meaning to content and how to link relevant content together.

• The number of Web pages totals more than 50 million (Google).
• Avalanche-like growth in documents: in 2002 large enterprises processed up to 18 000 documents per year; in 2003 that amount doubled; in 2004 large enterprises handled about 46 000 documents on average; in 2008 the number of corporate documents grew to 80 000; and in 2011 it exceeded 400 000 documents (Forrester Research).
• As of today the total number of internet users exceeds 2 bln, and the total amount of data is estimated at over 1 800 exabytes (1 exabyte = 10^18 bytes).

Today's users are not able to even start analyzing unstructured information on the Internet, let alone make well-weighed decisions based on such analysis. The user gets swamped at the information-gathering stage.

How does the Project solve the problem

Avicomp Services has been involved in the semantic field for over 10 years. One of the company's key achievements in this area has been the development of a powerful linguistic vehicle that is based on in-depth research in the semantic area. It makes it possible to automatically produce "semantic-aware & ready" content on the Internet and to build new semantic services that in turn make the use of unstructured information easy and flexible in the following ways:

• A set of services with which users can enrich the meta-information (semantic data) of their documents, whether published on the Web or held in corporate archives. Extra meta-information attached to a document improves search accuracy and quality, information categorization and combination.
• A set of services with which users can use extra meta-information to integrate with existing information while performing BI/OLAP analysis.
• A set of services with which users can publish their own sets of semantic data in a semantic archive and get them linked with existing (e.g. Web) sets of Linked Open Data (LOD).
• A set of services to identify and link semantic data sets in different languages.
• A set of services with which users can create their own applications using the established archive of semantic data.
• The services mentioned above, and others, will become available from a single software platform, Semantic PaaS (SPaaS), which is based on technology with a strong foundation of semantic and morphological rules.

2. The current market situation in search

The problem of any search system

Google and other search systems based on keywords and matching concepts can produce no results at all

3. Target market

Landscape of Semantic Applications

Market estimate (volume)

1. The market today is more than $100 bln
2. Impact of semantic technology
- 20-80% fewer labour hours
- 20-75% lower operating cost
- 30-60% lower inventory level
- 20-85% lower development cost
Source: TopQuadrant

4. Competition (Extract)

Comparative analysis

Analogue | Stage (market / development) | Price | Parameter 1 (NLP) | Parameter 2 (RDF Store) | Parameter 3 (Apps/Service)
OntoText | Production | License model. Price from 50 K€ to 250 K€ | Based on GATE | OWL Store | Search, Sort
OpenCalais | Production | Free and subscription (price not known) | Pure NLP Service | No store | Limited set of mash-ups
GATE | Research and API Service | Small subscription fee | NLP as open source or via API | No store | No services
Ontoprise | Production | License model and consulting service; price starts at 100 K€ | Only text mining without Information Extraction | No RDF store. Only RDBMS for indexes | Various specific apps for ontology engineering and modelling

Analogue | Functional Area | Stage
PowerSet | NLP Engine | Bought by Microsoft
FAST | Text Mining | Bought by Microsoft
Freebase | RDF Knowledge Base in the LOD | Bought by Google

5. Market segments where product is focused on

Business model
1. B2B
• Government – use SPaaS to build Linked Open Data within governments (licenses & deployment consulting)
• Large Enterprises – development of the instrument to extract knowledge (licenses & deployment consulting)
• Small business – instrument to produce semantic content (SaaS)
2. B2C
• To satisfy information search needs of individual users (including mobile applications)

Potential Project product users (Russian market only, as it will serve as a test bed to fine-tune the business model)
• Russian Accounting Chamber
• Russian Ministry of Education
• RIA Novosti
• Moscow City Government
• Rusnano
• President's Administration
Conversations about their needs are currently under way with all of these prospects.

6. Technology of the Project – Semantic PaaS architecture

High level view of the SPaaS architecture

[Architecture diagram: integrating the ecosystem of complementors and their customers]

Harvesting and Crawling with a heuristic approach that is able to
integrate various sources (not only RSS feeds) and a planarization
method which automatically extracts the plain text from a Web page.
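
The planarization step mentioned for Harvesting and Crawling can be illustrated with a minimal sketch. It is not the platform's actual method: it assumes that "plain text" simply means the visible text of a page with scripts and styles dropped, and the names PlainTextExtractor and planarize are hypothetical. Only the Python standard library is used.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects visible text and skips the content of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def planarize(html):
    """Return the visible text of an HTML page as one plain string."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up page, used only to exercise the function.
page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>News</h1><p>Semantic <b>services</b> launched.</p>"
    "<script>track()</script></body></html>"
)
print(planarize(page))
```

A real harvester would also have to separate the main article from navigation and advertising, which this sketch does not attempt.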
NLP Service that is based on a multi-agent and multilingual
architecture that allows it to scale. In addition, the service will
incorporate an ontology rule-based approach for information extraction
(IE) enriched with statistical methods and a method that can use
existing background knowledge, for example in the Linked Open Data
(LOD) cloud or inside Web pages (e.g. RDFa, schema.org or HTML5
metadata).

Knowledge Generation Process mainly for the handling of unique
object identification and merging, ontology alignment, data authoring
and interlinking.
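
As a rough illustration of unique object identification and merging, the sketch below groups name variants under one identifier. It assumes that two mentions denote the same object when their normalized names are equal, which is far simpler than what a production system would need. The functions normalize and merge_mentions and the example.org namespace are hypothetical.

```python
import re
import uuid


def normalize(name):
    """Lower-case and collapse punctuation and whitespace so variants compare equal.

    Limitation of this sketch: only ASCII letters and digits are kept.
    """
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def merge_mentions(mentions):
    """Group mentions with the same normalized name under one identifier."""
    entities = {}
    for mention in mentions:
        key = normalize(mention)
        if key not in entities:
            # uuid5 is deterministic: the same key always yields the same id.
            uid = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/entity/" + key)
            entities[key] = {"uid": str(uid), "mentions": []}
        entities[key]["mentions"].append(mention)
    return entities


merged = merge_mentions(["RIA Novosti", "ria  novosti", "RIA-Novosti", "Rusnano"])
for key, entity in merged.items():
    print(key, entity["uid"], entity["mentions"])
```

Deterministic identifiers keep repeated runs stable, so the same object receives the same identifier each time it is harvested.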

Scalable RDF store for storing the extracted knowledge as semantic
graphs using the latest technology and methods for handling RDF
triples. The store will also include a plain SPARQL interface as well
as a layer for intelligent and easy-to-use access (Data Access API).
With the expected growth of digital data, the RDF store architecture
will also include other database storage mechanisms in order to solve
the problem of "Big Data".
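
To make the idea of querying RDF triples concrete, here is a toy sketch of triple pattern matching over an in-memory list. It is not SPARQL and not the Ontos RDF store; it only mimics what a single pattern such as `SELECT ?p ?o WHERE { ex:AvicompServices ?p ?o }` would return. The function match and the ex: terms are hypothetical.

```python
def match(triples, pattern):
    """Yield variable bindings for triples that fit a (subject, predicate, object) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Hypothetical triples, written with made-up ex: names.
triples = [
    ("ex:AvicompServices", "ex:locatedIn", "ex:Moscow"),
    ("ex:AvicompServices", "ex:develops", "ex:SPaaS"),
    ("ex:SPaaS", "rdf:type", "ex:Platform"),
]

results = list(match(triples, ("ex:AvicompServices", "?p", "?o")))
for row in results:
    print(row)
```

A Data Access API of the kind described could wrap patterns like this so that application developers do not need to write query syntax directly.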

Application Services comprises modules to manage the RDF life
cycle, various interfaces to search, retrieve and store data as well as
core functions related to analytical functions (OLAP for RDF) and
prediction modelling based on algorithmic game theory. Part of this
stack will also be a set of core modules that will support demands from
external applications.

7. Use Case – Linked Government Data (LGD)

Our SPaaS Offer for 5 star:
• Pipeline/WF to create RDF (LOD)
• Government vocabulary (Ontology)
• Scalable RDF (LOD) store
• UID or controlled named entity name server

Enable Application and Eco-System for e-Citizen

Later adapt LGD to Linked Enterprise Data

9/19/2012

8. Use Case – Online News

Our SPaaS for Online News:
• Pipeline/WF for tagging and NE extraction
• RDFa/Microformat injection to web pages
• Scalable RDF store
• Knowledge Engineering

[Diagram: Architecture (simplified). Components shown: Triple store for entities (SPaaS); OntoDix; Learning corpora (SPaaS); Topic & Entity Extraction (SPaaS); Topics; Tagging System; API Manager; CMS (metadata, sync); Semantic Platform RESTful API; nginx + Apache; Delivery Server (nodeJS, Fugue, SocketIO, RabbitMQ); HDB; Routes DB (Mongo); Desktop (Sencha); External user/app]
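
The RDFa injection listed for Online News can be pictured with a minimal sketch that wraps known entity names in RDFa Lite attributes (vocab, typeof, property) with schema.org types. It assumes the entities have already been extracted and uses naive string replacement; the function annotate and the sample sentence are hypothetical and do not describe the platform's actual pipeline.

```python
from html import escape


def annotate(text, entities):
    """Wrap each listed entity name in RDFa Lite markup using schema.org types."""
    result = escape(text)
    for name, schema_type in entities.items():
        markup = (
            '<span vocab="http://schema.org/" typeof="{0}">'
            '<span property="name">{1}</span></span>'
        ).format(escape(schema_type), escape(name))
        # Naive string replacement: fine for a sketch, but it would also hit
        # names that happen to occur inside markup inserted earlier.
        result = result.replace(escape(name), markup)
    return result


# Illustrative sentence and entity list, used only to exercise the function.
sentence = "Rusnano is based in Moscow."
tagged = annotate(sentence, {"Rusnano": "Organization", "Moscow": "City"})
print(tagged)
```

In a CMS integration, markup of this kind would be injected when the article is rendered, so that the published page carries the extracted entities as machine-readable metadata.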

9. Intellectual property

Existing patents
• Patent for an invention № 2242048 «Method of automated processing of text-based information materials». Owner «Ontos AG (Switzerland)».
• Patent for an invention № 2399959 «Method of automated processing of text using natural language by semantic indexing, method of processing of text collection using natural language by semantic indexing and machine-readable media». Owner «Ontos AG (Switzerland)».
• Computer software certificate of registration № 2006610704 «OntosMiner. Russian version». Owner «Avicomp Services»
• Computer software certificate of registration № 2008613021 «Ontos RDF Store Server. Russian version». Owner «Avicomp Services»
• Computer software certificate of registration № 2009611560 «Ontos SOA Server. Russian version». Owner «Avicomp Services»
• Computer software certificate of registration № 2009611559 «Ontos AS Processing Server. Russian version».
• Computer software certificate of registration № 2009611558 «Ontos AS Delivery Server. Russian version».
• Computer software certificate of registration № 2009611557 «Ontology Dictionary. Russian version». Owner «Avicomp Services»

10. Project’s Team (1)

Brief summary of key team members

Victor Klintsov
• Shareholder & General Director
• More than 20 years of experience in the IT industry
• Chief ideologist and chief architect
• Graduated in 1977 from Moscow Chemical Engineering Institute
• Author of numerous papers
• Director of Russian W3C office
• Took part in the following projects: Public LOD resource in the field of science and technology, integrated into the international LOD space of knowledge; Analytical search and processing system of letters sent by citizens to the President of Russian Federation using semantic and linguistic methods of information extraction; etc.

Daniel Hladky
• COO – Chief Operation Officer
• More than 20 years in IT, including SAP, iXOS (OpenText)
• Responsible for regional development, marketing and sales and operations.
• Holds an MBA from Strathclyde University.
• Author of numerous papers, invited expert e.g. EU FP7, ISWC, Triplify-Challenge
• Speaker at conferences such as SemTech, ESTC, I-Semantics

Dr Sören Auer
• CRO – Chief Research Officer
• Researcher and Professor since 2003. Coordinator of various EU FPx projects.
• Responsible for research and innovation.
• Studied Mathematics and Computer Science at University Dresden, Hagen and Yekaterinburg (Russia). PhD at University Leipzig.
• Leader of the research group AKSW at University Leipzig.
• Author of numerous papers; invited expert (e.g. EU); co-organiser of several workshops; programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, WWW2012; area editor of the Semantic Web Journal; serves as an expert for industry, the European Commission and the W3C; member of the advisory board of the Open Knowledge Foundation.

11. Project’s Team (2)

Brief summary of key team members

Grigory Drobyazko
• CTO – Chief Technology Officer
• More than 20 years in IT, including RDBMS and custom development
• Responsible for R&D including architecture design, UI design and software support.
• Co-author of scientific papers on solutions for semantic web and technologies of data extraction and information resources text analysis for analytical processing
• Took part in the following projects: Public LOD resource in the field of science and technology, integrated into the international LOD space of knowledge; Analytical search and processing system of letters sent by citizens to the President of Russian Federation using semantic and linguistic methods of information extraction; etc.

• Analysts - 10 persons
• Linguists Developers - 9 persons
• Programmers-Developers - 15 persons
• Programmers-Developers of Linguistic Software - 7 persons

12. The current status

The key steps
• Non-stop platform development for more than 10 years
• Built initial platform «Alfa» of Semantic PaaS
• Current platform is based on the experience gained from several customer projects and research projects (see the table below)
• Completed proof of concept of tagging, aggregation and news visualisation
• Experience from law enforcement, media and portals

• Develop NLP module for media
• Develop a portal
• Research and create linguistic rule

Past and current financing
• Shareholders supported development
• Execution of research and development activities

• Develop a concept for IKB
• Develop a concept for RDF storage

Sales proceeds (R&D work) | 2010 (fact) | 2011 (fact) | 2012-2013 (plan)
Total | 46,1 mln RUB | 46,6 mln RUB | 60+ mln RUB
Ministry of Education | | 21,7 mln RUB | 20+ mln RUB
RIA Novosti | 46,1 mln RUB | 8,9 mln RUB | 40+ mln RUB
Others | | 16,0 mln RUB |
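
The sales figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that the "Others 16,0" entry and the Ministry's 21,7 belong to the 2011 column, which is the only placement under which the per-customer amounts add up to the stated totals. The names sales and customers_sum are hypothetical.

```python
from decimal import Decimal

# Amounts in mln RUB as listed in the sales table. Placing "Others 16,0" and
# the Ministry's 21,7 in the 2011 column is an assumption of this sketch.
sales = {
    "2010": {"total": "46.1", "customers": {"RIA Novosti": "46.1"}},
    "2011": {
        "total": "46.6",
        "customers": {
            "Ministry of Education": "21.7",
            "RIA Novosti": "8.9",
            "Others": "16.0",
        },
    },
}


def customers_sum(year):
    """Add up the per-customer amounts for one year."""
    return sum(Decimal(v) for v in sales[year]["customers"].values())


for year, row in sales.items():
    print(year, customers_sum(year), customers_sum(year) == Decimal(row["total"]))
```

Decimal is used instead of float so that sums such as 21.7 + 8.9 + 16.0 compare exactly with the stated total.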


13. Project’s co-investor

Fund raising plan

Current phase fund raising
Co-investor 1
• Ministry of Education of Russian Federation – up to 90 mln RUB
• Co-investment – signed contract to perform R&D
Co-investor 2
• VEB Innovation Fund – up to 90 mln RUB
• Co-investment – equity debt type of financing
• Exit for VEB Innovation Fund – sale to the strategic investor or MBO at agreed rate

Follow on fund raising

Stage name | Expected grant financing | Expected investment from co-investor | Timing
Core platform development | 90 mln RUB | 90 mln RUB | 2012-2013
Development of semantic services | 20 mln RUB | 60 mln RUB | 2013-2014
Start selling platform and services | 20 mln RUB | 2014-2015
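
The stage figures can be summed per stage as a quick sanity check. The sketch below assumes that the single 20 mln RUB amount of the last stage is grant money, since the source shows only one figure for that stage. Under that reading the per-stage totals come to 180, 80 and 20 mln RUB, the same three amounts that appear on the project development plan slide. The names stages, stage_totals and overall are hypothetical.

```python
# (stage, grant, co-investor) in mln RUB, as read from the stage table.
# Treating the single 20 mln RUB figure of the last stage as grant money is an
# assumption; the source shows only one amount for that stage.
stages = [
    ("Core platform development", 90, 90),
    ("Development of semantic services", 20, 60),
    ("Start selling platform and services", 20, 0),
]

stage_totals = {name: grant + co_investor for name, grant, co_investor in stages}
overall = sum(stage_totals.values())

for name, total in stage_totals.items():
    print(name, total)
print("Overall", overall)
```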


14. Project development plan

2012
• Enhance the NLP system (WP1)
• Large Scale Data Management (WP2) and the deployment of the solution to the cloud
• Access to the system via SQL Lite and SPARQL

2013
• Lifecycle Management of Data and Knowledge (WP4 and 5)
• Enrichment (WP3)
• Have use cases ready for eGov, Oil & Gas (WP6)
• Performance optimization and scalability

180 mln RUB (2012 and 2013 combined)

2014
• Work on Big Data analytics and Predictive Analysis (WP7)
• Develop eCitizen Service Applications as showcases

80 mln RUB

2015
• Cloud Platform optimization
• NLP for Asian languages

20 mln RUB
