May 2013
From Big Data to Smart Data
Marin Dimitrov - CTO
About Ontotext
• Provides products and services for creating, managing and exploiting semantic data
– Founded in 2000
– Offices in Bulgaria, USA and UK
• Major clients and industries
– Media & Publishing (BBC, Press Association, EuroMoney, NDP Nieuwsmedia)
– Healthcare & Life Sciences / HCLS (AstraZeneca, UCB, NIBIO)
– Cultural Heritage (The British Museum, The National Archives, Polish National Museum, Dutch Public Library)
– Government (UK Parliament, United Nations FAO, LMI)
#2May 2013From Big Data to Smart Data (Semantic Days 2013)
Contents
• The Problem with Big Data for BI
• From Big Data to Smart Data
• Success Stories by Ontotext
BIG DATA FOR BUSINESS INTELLIGENCE
The Problem with Big Data for BI
• It’s not only about Volume, Velocity & Variety
• Too much focus on processing speed & storage volume
• “Brute force” approaches increase the amount of data processed…
– But not necessarily the Value & insight derived from data
– May lead to even more data quality & inconsistency problems
– Problems with data visualisation & exploration
– Often do not lead to better decision making
The Problem with Big Data for BI
• BI success is measured not by Volume, Velocity & Variety, but by the Value derived
• Organisations should learn how to better utilise their “small data” before targeting Big Data
– Quality over quantity
– Better understanding of the data leads to better decision making
– Avoid “needle in a haystack” situations
Smart Data for Better BI
• Efficiently analyse unstructured data
– Most enterprise data is still unstructured
– Even within structured & transactional data sources there is a lot of embedded unstructured data
– … and this unstructured data is poorly analysed (if at all) => a lot of potential value remains locked
– (sometimes even within semantic / Linked Data with insufficient granularity)
Smart Data for Better BI
• Focus on metadata first, Big Data later
– (As opposed to: Big Data first, metadata later)
• Enrich data
• Interlink data
• Provide a common metadata layer
– Break legacy silos
– Align heterogeneous metadata if necessary
• Better analysis of the data, better insight
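The "common metadata layer" idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the silo names (`crm`, `billing`), the field names, and the naive matching key are hypothetical and not taken from any Ontotext product.

```python
# Hypothetical field mappings: each legacy silo uses its own column names,
# and the mapping renames them to one shared vocabulary.
FIELD_MAP = {
    "crm": {"cust_name": "name", "cust_country": "country"},
    "billing": {"client": "name", "ctry": "country"},
}


def to_common(source, record):
    """Rename a silo-specific record's fields to the shared vocabulary.

    Fields without a mapping are dropped; the originating silo is recorded.
    """
    mapping = FIELD_MAP[source]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    common["source"] = source
    return common


def interlink(records):
    """Group aligned records that appear to describe the same entity.

    The key is a naive normalised name (trimmed, lower-cased); a real
    system would need far more robust reconciliation.
    """
    linked = {}
    for rec in records:
        linked.setdefault(rec["name"].strip().lower(), []).append(rec)
    return linked


crm = to_common("crm", {"cust_name": "Acme Ltd", "cust_country": "UK"})
billing = to_common("billing", {"client": "ACME Ltd ", "ctry": "UK", "invoice": 17})
linked = interlink([crm, billing])
```

Once both silos share field names, records about the same company end up under one key and can be analysed together, which is the point of breaking legacy silos.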
SUCCESS STORIES
UK Job Market Intelligence
• Comprehensive recruitment database for the UK
– 4 million job ads / vacancies (dynamic)
– 220,000 company websites & 700 job boards monitored
• Questions we can answer
– What skills are in demand at present?
– Which are the top job boards in a region?
– Which is the right job board for your industry sector?
– Which are the most active job advertisers / employers?
– Which are the agencies and employers that do not advertise on your job board?
UK Job Market Intelligence
• Technology stack
– Web mining & focussed crawling
– KB construction from open & proprietary data sources
– Skills taxonomy (based on DISCO)
– Text mining & semantic enrichment
– Reconciliation & interlinking
– BI reporting & dashboards
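As a rough illustration of how text mining can feed BI reporting, the sketch below tags job ads with skills from a tiny lookup table and counts demand per skill. The table, the sample ads, and the function names are hypothetical; the lookup table is not the DISCO taxonomy, and a simple dictionary match stands in for a real text-mining pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy: surface form -> canonical skill label.
SKILLS = {
    "java": "Java",
    "sql": "SQL",
    "project management": "Project management",
    "pm": "Project management",
}


def extract_skills(text):
    """Return the set of canonical skills mentioned in one job ad.

    Uses whole-word, case-insensitive dictionary lookup, so "JavaScript"
    does not count as "Java".
    """
    lowered = text.lower()
    found = set()
    for surface, canonical in SKILLS.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(canonical)
    return found


def skills_in_demand(ads):
    """Count in how many ads each canonical skill appears."""
    demand = Counter()
    for ad in ads:
        demand.update(extract_skills(ad))
    return demand


ads = [
    "Senior Java developer with strong SQL skills",
    "PM needed; project management certification a plus",
    "Data analyst (SQL, Excel)",
]
demand = skills_in_demand(ads)
```

Mapping several surface forms to one canonical label is what lets a report answer "What skills are in demand at present?" without splitting counts across synonyms.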
Asset Recovery Intelligence System (ARIS)
• Supports Financial Intelligence Units in tracking stolen assets and fighting corruption & money laundering
• Questions we can answer
– What are the reported activities related to a person?
– What is the person’s personal/professional network?
– What corruption cases are reported in regional news?
• Data sources
– News feeds from major news agencies
– Dow Jones data & news feeds
– Suspicious Activity Reports (SARs) submitted to the FIU
– Open data (people & companies, Wikipedia)
Asset Recovery Intelligence System (ARIS)
• Technology stack
– Web Mining
– Text mining & semantic enrichment (KIM)
– ARIS ontology
• People, companies, assets, relations, financial transactions, …
– Reconciliation & Interlinking
– Triplestore (OWLIM)
– Semantic search & exploration UX
– BI reporting / factsheets / alerts
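To make the ontology and triplestore items more concrete, here is a minimal in-memory sketch of answering "What is the person's personal/professional network?" over subject-predicate-object triples. All identifiers, predicates, and data are invented for illustration; this is not the ARIS ontology, and a plain Python list stands in for the triplestore rather than any OWLIM API.

```python
# Hypothetical triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("person:A", "directorOf", "company:X"),
    ("person:B", "directorOf", "company:X"),
    ("person:A", "relativeOf", "person:C"),
    ("company:X", "owns", "asset:Yacht1"),
    ("person:D", "directorOf", "company:Z"),
]


def network(person):
    """Return people linked to `person` directly or through a shared company."""
    # Companies the person is directly connected to.
    companies = {
        o for s, _, o in TRIPLES if s == person and o.startswith("company:")
    }
    related = set()
    for s, _, o in TRIPLES:
        if s == person and o.startswith("person:"):
            related.add(o)  # direct link, person as subject
        elif o == person and s.startswith("person:"):
            related.add(s)  # direct link, person as object
        elif o in companies and s.startswith("person:") and s != person:
            related.add(s)  # other people attached to the same company
    return related


network_of_a = network("person:A")
```

In a deployed system this kind of question would more plausibly be a query against the triplestore; the sketch only shows why modelling people, companies and assets as linked entities makes network questions straightforward.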
Semantic Information Integration & Enrichment
Q & A
Thank you!
@ontotext
