The document proposes a Semantic PaaS (SPaaS) cloud platform to extract and process text information using natural language. It would address the challenges of making sense of vast amounts of online content and linking related information. The SPaaS architecture would include services for enriching document metadata, integrating existing data for analytics, publishing semantic data sets, and identifying cross-language links. This innovation could help users more effectively analyze unstructured information and make informed decisions.
Presentation for turkot v2 0 (dh)
1. PRESENTATION
Addendum to the Grant Application from
Innovation project: Cloud platform for development and procurement of semantic services (Semantic PaaS, SPaaS), making it possible to extract and process text information using natural language.
Company name: Avicomp Services, LLC, Moscow
2. 1. Innovation project’s resume (hereafter, the Project)
Current market issues that the Project is supposed to address
The challenge remains how to give meaning to content and how to link relevant content together.
• The number of Web pages totals more than 50 million (Google).
• Avalanche-like growth of documents: in 2002 large enterprises processed up to 18 000 documents per year; in 2003 that amount doubled; in 2004 they handled about 46 000 documents on average; in 2008 the amount of corporate documents grew to 80 000; and in 2011 it exceeded 400 000 documents (Forrester Research).
• As of today the total number of Internet users exceeds 2 bln, and the total amount of data is estimated at over 1 800 exabytes (1 exabyte = 10^18 bytes).
Today’s users are not able to start analyzing unstructured information on the Internet, let alone take well-weighed decisions based on such analysis. The user gets swamped at the information-gathering stage.
How does the Project solve the problem
Avicomp Services has been involved in the semantic field for over 10 years. One of the company’s key achievements in this area is a powerful linguistic engine that is based on in-depth semantic research. It automatically produces “semantic-aware & ready” content on the Internet and makes it possible to build new semantic services, which in turn make the use of unstructured information easy and flexible in the following ways:
• A set of services with which users can enrich the meta-information (semantic data) of their documents, whether published on the Web or held in corporate archives. Extra meta-information attached to a document improves search accuracy and quality, information categorization and combination.
• A set of services with which users can use the extra meta-information to integrate with existing information while performing BI/OLAP analysis.
• A set of services with which users can publish their own sets of semantic data in a semantic archive and get them linked with existing (e.g. Web) sets of Linked Open Data (LOD).
• A set of services to identify and link semantic data sets in different languages.
• A set of services with which users can create their own applications using the established archive of semantic data.
These and other services will be available from a single software platform, Semantic PaaS (SPaaS), which is based on technology with a strong foundation of semantic and morphological rules.
3. 2. The current market situation in search
The problem of any search system
Google and other search systems based on keywords and matching concepts produce no results at all
4. 3. Target market
Landscape of Semantic Applications
Market estimate (volume)
1. Today the market is more than $100 bln
2. Impact of semantic technology
- 20-80% less labour hours
- 20-75% less operating cost
- 30-60% less inventory level
- 20-85% less development cost
Source: TopQuadrant
5. 4. Competition (Extract)
Comparative analysis
Analogues compared by stage (market / development), price, Parameter 1 (NLP), Parameter 2 (RDF store) and Parameter 3 (apps/service):
• OntoText. Stage: production. Price: license model, from 50 K to 250 K€. NLP: based on GATE. RDF store: OWL store. Apps/service: search, sort.
• OpenCalais. Stage: production. Price: free and subscription (price not known). NLP: pure NLP service. RDF store: no store. Apps/service: limited set of mash-ups.
• GATE. Stage: research and API service. Price: small subscription fee. NLP: NLP as open source or via API. RDF store: no store. Apps/service: no services.
• Ontoprise. Stage: production. Price: license model and consulting service, price starts at 100 K€. NLP: only text mining without information extraction. RDF store: no RDF store, only RDBMS for indexes. Apps/service: various specific apps for ontology engineering and modelling.
Analogues by functional area and stage:
• PowerSet. Functional area: NLP engine. Stage: bought by Microsoft.
• FAST. Functional area: text mining. Stage: bought by Microsoft.
• Freebase. Functional area: RDF knowledge base in the LOD. Stage: bought by Google.
6. 5. Market segments where product is focused on
Business model
1. B2B
• Government – use SPaaS to build Linked Open Data within governments (licenses & deployment consulting)
• Large Enterprises – development of the instrument to extract knowledge (licenses &
deployment consulting)
• Small business – instrument to produce semantic content (SaaS)
2. B2C
• To satisfy information search needs of individual users (including mobile applications)
Potential Project product users (Russian market only, as it will serve as a test bed to fine-tune the business model)
Russian Accounting Chamber
Russian Ministry of Education
RIA News
Moscow City Government
Rusnano
President’s Administration
All of these prospects are currently in discussions with us about their needs
7. 6. Technology of the Project –Semantic PaaS architecture
High level view of the SPaaS architecture
Integrating the ecosystem of complementors and their customers
Harvesting and Crawling with a heuristic approach that is able to integrate various sources (not only RSS feeds) and a planarization method which automatically extracts the plain text from a Web page.
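The plain-text extraction step can be illustrated with a minimal sketch. It is not the platform's planarization method; it only assumes that "plain text" means the visible text of a page with scripts and styles dropped. The class name `PlainTextExtractor`, the helper `extract_plain_text` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects visible text and skips the content of non-content tags."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_plain_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page used only for illustration.
sample_page = """
<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><script>var x = 1;</script>
<h1>Moscow news</h1><p>Rusnano opens a new plant.</p></body></html>
"""
print(extract_plain_text(sample_page))
```

A production harvester would additionally need boilerplate removal (menus, adverts), which this sketch does not attempt.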
NLP Service based on a multi-agent and multilingual architecture that allows it to scale. The service will further incorporate an ontology rule-based approach to information extraction (IE), enriched with statistical methods and with a method that can use existing background knowledge, for example in the Linked Open Data (LOD) cloud or inside Web pages (e.g. RDFa, schema.org or HTML5 metadata).
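As a rough illustration of rule-based extraction that draws on background knowledge, the sketch below looks up known entities in a small dictionary and applies one hand-written rule. It is an assumption-laden toy, not the OntosMiner approach: the dictionary `BACKGROUND_KNOWLEDGE`, the `ex:` identifiers, the predicate `ex:opensFacilityIn` and the single "opens" rule are all invented for the example.

```python
import re

# Hypothetical background knowledge: surface form -> (identifier, type).
# In practice this could come from LOD data sets or page metadata.
BACKGROUND_KNOWLEDGE = {
    "moscow": ("ex:Moscow", "City"),
    "rusnano": ("ex:Rusnano", "Organization"),
    "ria novosti": ("ex:RIA_Novosti", "Organization"),
}


def find_entities(sentence):
    """Return sorted (position, identifier, type) for each known surface form."""
    lowered = sentence.lower()
    found = []
    for surface, (identifier, entity_type) in BACKGROUND_KNOWLEDGE.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            found.append((match.start(), identifier, entity_type))
    return sorted(found)


def extract_facts(sentence):
    """Hypothetical rule: <Organization> ... opens ... <City> yields one triple."""
    verb = re.search(r"\bopens\b", sentence.lower())
    if verb is None:
        return []
    entities = find_entities(sentence)
    organizations = [e for e in entities if e[2] == "Organization" and e[0] < verb.start()]
    cities = [e for e in entities if e[2] == "City" and e[0] > verb.start()]
    return [
        (org_id, "ex:opensFacilityIn", city_id)
        for _, org_id, _ in organizations
        for _, city_id, _ in cities
    ]


print(extract_facts("Rusnano opens a new plant in Moscow."))
```

The statistical enrichment mentioned in the text would sit alongside such rules, for example to rank competing candidates; it is out of scope for this sketch.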
Knowledge Generation Process, mainly for handling unique object identification and merging, ontology alignment, data authoring and interlinking.
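Unique object identification and merging can be pictured with the following sketch, which treats two mentions as the same object when their type and a normalized name coincide. The matching key is a deliberately crude assumption; the record layout, `normalize` and `merge_entities` are hypothetical names, and real identity resolution would need far richer evidence.

```python
from collections import defaultdict


def normalize(name):
    """Crude matching key: lowercase, letters and digits only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def merge_entities(records):
    """Group extracted mentions that share a type and a normalized name."""
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for record in records:
        key = (record["type"], normalize(record["name"]))
        merged[key]["names"].add(record["name"])
        merged[key]["sources"].add(record["source"])
    return dict(merged)


# Hypothetical mentions extracted from two documents.
RECORDS = [
    {"name": "RIA Novosti", "type": "Organization", "source": "doc1"},
    {"name": "RIA-Novosti", "type": "Organization", "source": "doc2"},
    {"name": "Rusnano", "type": "Organization", "source": "doc2"},
]

for key, entity in merge_entities(RECORDS).items():
    print(key, sorted(entity["names"]), sorted(entity["sources"]))
```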
Scalable RDF store for storing the extracted knowledge as semantic graphs using the latest technology and methods for handling RDF triples. The store will also include a plain SPARQL interface as well as a layer for intelligent and easy-to-use access (Data Access API). Given the expected growth of digital data, the RDF store architecture will also include other database storage mechanisms in order to address the problem of “Big Data”.
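To make the idea of triples plus a friendlier access layer concrete, here is a minimal in-memory sketch. It is not the Ontos RDF Store and it does not parse SPARQL; `match` only mimics a single triple pattern with wildcards, roughly what `SELECT ?s WHERE { ?s rdf:type ex:Organization }` would express. The class `TripleStore`, the helper `subjects_of_type` and the `ex:` data are assumptions for illustration.

```python
class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(want is None or want == got for want, got in zip(pattern, triple))
        )

    def subjects_of_type(self, type_name):
        """Convenience call in the spirit of a data access layer."""
        return [s for s, _, _ in self.match(predicate="rdf:type", obj=type_name)]


# Hypothetical sample data.
store = TripleStore()
store.add("ex:Rusnano", "rdf:type", "ex:Organization")
store.add("ex:RIA_Novosti", "rdf:type", "ex:Organization")
store.add("ex:Moscow", "rdf:type", "ex:City")
store.add("ex:Rusnano", "ex:locatedIn", "ex:Moscow")

print(store.subjects_of_type("ex:Organization"))
print(store.match(subject="ex:Rusnano"))
```

A scalable store would index triples by subject, predicate and object instead of scanning the whole set on every query.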
Application Services comprises modules to manage the RDF life cycle, various interfaces to search, retrieve and store data, as well as core analytical functions (OLAP for RDF) and prediction modelling based on algorithmic game theory. This stack will also include a set of core modules that support demands from external applications.
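One way to read "OLAP for RDF" is as a roll-up of a numeric measure along a dimension, where both are stored as properties of the same subject. The sketch below shows only that reading; the observations are invented (they are not figures from this presentation), and `roll_up` and the `ex:` property names are hypothetical.

```python
from collections import defaultdict

# Invented observations: each subject is one record with two dimensions
# (ex:year, ex:sector) and one measure (ex:documents).
TRIPLES = [
    ("ex:obs1", "ex:year", 2010), ("ex:obs1", "ex:sector", "media"),
    ("ex:obs1", "ex:documents", 120),
    ("ex:obs2", "ex:year", 2010), ("ex:obs2", "ex:sector", "government"),
    ("ex:obs2", "ex:documents", 80),
    ("ex:obs3", "ex:year", 2011), ("ex:obs3", "ex:sector", "media"),
    ("ex:obs3", "ex:documents", 200),
]


def roll_up(triples, dimension, measure):
    """Sum the measure over all subjects, grouped by the dimension value."""
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        records[subject][predicate] = obj
    totals = defaultdict(int)
    for properties in records.values():
        if dimension in properties and measure in properties:
            totals[properties[dimension]] += properties[measure]
    return dict(totals)


print(roll_up(TRIPLES, "ex:year", "ex:documents"))
print(roll_up(TRIPLES, "ex:sector", "ex:documents"))
```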
8. 7. Use Case – Linked Government Data (LGD)
Our SPaaS Offer for 5 star:
• Pipeline/WF to create RDF (LOD)
• Government vocabulary (Ontology)
• Scalable RDF (LOD) store
• UID or controlled named entity name server
Enable Application and Ecosystem for e-Citizen
Later adapt LGD to Linked Enterprise Data
9/19/2012
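The first item of the offer, a pipeline that creates RDF, could start from tabular government data. The following minimal sketch turns CSV rows into N-Triples lines; the base URI `http://example.org/gov/`, the `agency/` path, the `vocab#` predicates and the sample rows are all assumptions, and every value is emitted as a plain literal for simplicity.

```python
import csv
import io

BASE = "http://example.org/gov/"  # hypothetical namespace


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_ntriples(csv_text, id_column):
    """Emit one N-Triples line per non-empty cell, keyed by the id column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}agency/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}vocab#{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical input table.
SAMPLE_CSV = "id,name,city\n1,Ministry of Education,Moscow\n2,Rusnano,\n"

for line in csv_to_ntriples(SAMPLE_CSV, "id"):
    print(line)
```

Linking the city values to shared identifiers instead of literals is where the controlled named entity name server from the list above would come in.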
9. 8. Use Case – Online News
Our SPaaS for Online News:
• Pipeline/WF for tagging and NE extraction
• RDFa/Microformat injection to web pages
• Scalable RDF store
• Knowledge Engineering
[Diagram: Architecture (simplified). Components shown: triple store for entities (SPaaS), OntoDix, learning corpora (SPaaS), topic & entity extraction (SPaaS), topics, tagging system, API manager, CMS with metadata sync (Semantic Platform), RESTful API, nginx + Apache, delivery server (nodeJS, Fugue, SocketIO, RabbitMQ) + routes DB (Mongo), HDB, desktop (Sencha), external user/app.]
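RDFa injection into web pages can be sketched as wrapping recognized entity mentions in spans that carry the RDFa attributes `vocab`, `typeof` and `resource`. This is only an illustration under assumptions: the entity dictionary, the `http://example.org/id/` identifiers and the function `inject_rdfa` are hypothetical, and the input is treated as plain text rather than existing HTML.

```python
import html
import re


def inject_rdfa(text, entities):
    """Wrap known mentions in RDFa spans.

    entities maps a surface form to (schema.org type name, identifier URI).
    """
    if not entities:
        return f"<p>{html.escape(text)}</p>"
    # Longest surface forms first, so overlapping names prefer the longer match.
    alternatives = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(surface) for surface in alternatives))
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        type_name, uri = entities[match.group(0)]
        parts.append(
            f'<span typeof="{type_name}" resource="{html.escape(uri)}">'
            f"{html.escape(match.group(0))}</span>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return f'<p vocab="https://schema.org/">{"".join(parts)}</p>'


# Hypothetical entities, as a tagging step might deliver them.
ENTITIES = {
    "Rusnano": ("Organization", "http://example.org/id/Rusnano"),
    "Moscow": ("City", "http://example.org/id/Moscow"),
}

print(inject_rdfa("Rusnano opens a plant in Moscow.", ENTITIES))
```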
10. 9. Intellectual property
Existing patents
Patent for an invention №2242048 «Method of automated processing of text-based information materials». Owner «Ontos AG (Switzerland)».
Patent for an invention №2399959 «Method of automated processing of text using natural language by semantic indexing, method of processing of text collection using natural language by semantic indexing and machine-readable media». Owner «Ontos AG (Switzerland)».
Computer software certificate of registration №2006610704 «OntosMiner. Russian version». Owner
«Avicomp Services»
Computer software certificate of registration №2008613021 «Ontos RDF Store Server. Russian
version». Owner «Avicomp Services»
Computer software certificate of registration №2009611560 «Ontos SOA Server. Russian version».
Owner «Avicomp Services»
Computer software certificate of registration №2009611559 «Ontos AS Processing Server. Russian
version». Owner «Avicomp Services»
Computer software certificate of registration №2009611558 «Ontos AS Delivery Server. Russian
version». Owner «Avicomp Services»
Computer software certificate of registration №2009611557 «Ontology Dictionary. Russian version».
Owner «Avicomp Services»
11. 10. Project’s Team (1)
Brief summary of key team members
Victor Klintsov
Shareholder & General Director
• More than 20 years of experience in the IT industry
• Chief ideologist and chief architect
• Graduated in 1977 from Moscow Chemical Engineering Institute
• Author of numerous papers
• Director of the Russian W3C office
• Took part in the following projects: public LOD resource in the field of science and technology, integrated into the international LOD space of knowledge; analytical search and processing system for letters sent by citizens to the President of the Russian Federation, using semantic and linguistic methods of information extraction; and others
Daniel Hladky
COO (Chief Operating Officer)
• More than 20 years in IT, including SAP and iXOS (OpenText)
• Responsible for regional development, marketing, sales and operations
• Holds an MBA from Strathclyde University
• Author of numerous papers, invited expert (e.g. EU FP7, ISWC, Triplify-Challenge)
• Speaker at conferences such as SemTech, ESTC, I-Semantics
Dr Sören Auer
CRO (Chief Research Officer)
• Researcher and Professor since 2003. Coordinator of various EU FPx projects
• Responsible for research and innovation
• Studied Mathematics and Computer Science at University Dresden, Hagen and Yekaterinburg (Russia). PhD at University Leipzig
• Leader of the research group AKSW at University Leipzig
• Author of numerous papers, invited expert (e.g. EU), co-organiser of several workshops, programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, WWW2012, area editor of the Semantic Web Journal, serves as an expert for industry, the European Commission and the W3C, and is a member of the advisory board of the Open Knowledge Foundation
12. 11. Project’s Team (2)
Brief summary of key team members
Grigory Drobyazko
CTO (Chief Technology Officer)
• More than 20 years in IT, including RDBMS and custom development
• Responsible for R&D including architecture design, UI design and software support
• Co-author of scientific papers on solutions for the semantic web and on technologies of data extraction and text analysis of information resources for analytical processing
• Took part in the following projects: public LOD resource in the field of science and technology, integrated into the international LOD space of knowledge; analytical search and processing system for letters sent by citizens to the President of the Russian Federation, using semantic and linguistic methods of information extraction; and others
Staff
• Analysts - 10 persons
• Linguists Developers - 9 persons
• Programmers-Developers - 15 persons
• Programmers-Developers of Linguistic Software - 7 persons
13. 12. The current status
The key steps
• Non-stop platform development for more than 10 years
• Built initial platform «Alfa» of Semantic PaaS
• Current platform is based on the experience gained in several customer projects and research projects (see the table below)
• Completed proof of concept of tagging, aggregation and news visualisation
• Experience from law enforcement, media and portals
• Develop NLP module for media
• Develop a portal
• Research and create linguistic rule
• Develop a concept for IKB
• Develop a concept for RDF storage
Past and current financing
• Shareholders supported development
• Execution of research and development activities
Sales proceeds (R&D work), by year: 2010 (fact), 2011 (fact), 2012-2013 (plan)
• Total: 46.1 mln RUB (2010); 46.6 mln RUB (2011); 60+ mln RUB (2012-2013)
• Ministry of Education: 21.7 mln RUB (2011); 20+ mln RUB (2012-2013)
• RIA Novosti: 46.1 mln RUB (2010); 8.9 mln RUB (2011); 40+ mln RUB (2012-2013)
• Others: 16.0 mln RUB (2011)
14. 13. Project’s co-investor
Fund raising plan
Current phase fund raising
Co-investor 1
Ministry of Education of Russian Federation – up to 90 mln RUB
Co-investment – signed contract to perform R&D
Co-investor 2
VEB Innovation Fund – up to 90 mln. RUB
Co-investment – equity debt type of financing
Exit for VEB Innovation Fund – sale to the strategic investor or MBO at agreed rate
Follow on fund raising
Stages, with expected grant financing, expected investment from co-investor and timing:
• Core platform development: grant 90 mln RUB; co-investor 90 mln RUB; timing 2012-2013
• Development of semantic services: grant 20 mln RUB; co-investor 60 mln RUB; timing 2013-2014
• Start selling platform and services: 20 mln RUB; timing 2014-2015
15. 14. Project development plan
2012
• Enhance the NLP system (WP1), Large Scale Data Management (WP2) and the deployment of the solution to the cloud
• Access to the system via SQL Lite and SPARQL
2013
• Lifecycle Management of Data and Knowledge (WP4 and 5)
• Enrichment Service (WP3)
• Have use cases ready for eGov, Oil & Gas (WP6)
• Performance optimization and scalability
2014
• Work on Big Data analytics and Predictive Analysis (WP7)
• Develop eCitizen Applications as showcases
2015
• Cloud Platform optimization
• NLP for Asian languages
Budget figures shown along the timeline: 180 mln RUB, 80 mln RUB, 20 mln RUB.