Mastering the variety dimension of Big Data with semantic technologies: high level intro to standards. Focus on variety/interoperability dimension. Prof Amit Sheth

Published in: Data & Analytics

  1. Semantic Approach to Big Data and Event Processing. Mastering the variety dimension of Big Data with semantic technologies: high level intro to standards. Focus on variety/interoperability dimension. Prof. Amit Sheth, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA. Tutorial @ Kno.e.sis Centre: Semantics Approach to Big Data and Event Processing, Oct 7-9, 2015
  2. Mastering the variety dimension of Big Data with semantic technologies: high level intro to standards. Focus on variety/interoperability dimension.
  3. What are the most important recent software/Internet success stories?
  4. Apple’s Siri, IBM’s Watson, Google Semantic Search. What are the common technologies?
  5. Just stepping back a bit
  6. Semantic technologies in the mainstream
     •  Microsoft purchased Powerset in 2008
     •  Apple purchased Siri [Apr 2010]
        –  “Once Again The Back Story Is About Semantic Web”
     •  Google buys Metaweb [June 2010]: “Google Snaps Up Metaweb in Semantic Web Play”, and releases Semantic search in 2013
        –  Now see: “Google Knowledge Graph Could Change Search Forever”
     •  Facebook OpenGraph, Twitter annotation … “another example of semantic web going mainstream”; “Google, Twitter and Facebook build the semantic web”
  7. Semantic technologies in the mainstream
     •  RDFa adoption: search engines (esp. Bing) started using domain models, and all of them use background knowledge/structured databases with large entity bases (these are part of the Knowledge Graph and its equivalents)
     •  Bing, Yahoo! and Google are using schema.org in a big way
  8. A bit of history
     •  Semantics with metadata and ontologies for heterogeneous documents and multiple repositories of data, including the Web, was discussed in the 1990s (semantic information brokering, faceted search, InfoHarness, SIMS, Ariadne, OBSERVER, SHOE, MREF, InfoQuilt, …). Also DAML and OIL.
     •  Tim Berners-Lee used “Semantic Web” in his 1999 book
     •  I founded a company, Taalee, in 1999, gave a keynote on Semantic Web & commercialization in 2000, and filed for a patent in 2000 (awarded 2001).
     •  The well-known TBL, Hendler, Lassila paper in Scientific American took an AI-ish approach (agents, …) to the Semantic Web
     •  The first 5 years saw too much of AI/DL, but more practical/applied work has dominated recently
  9. Different foci
     •  TBL: focus on data, the Data Web (“In a way, the Semantic Web is a bit like having all the databases out there as one big database.”)
     •  Others focus on reasoning and intelligent processing
     •  But the biggest current use seems to be about search:
        –  15 years of Semantic Search and Ontology-enabled Semantic Applications
  10. 1, 2, 3 of Semantic Web
  11. Semagix Freedom for building ontology-driven information system. Managing Semantic Content on the Web; Extracting Semantic Metadata from Semistructured and Structured Sources (1999 – 2002)
  12. 1: Ontology
      •  Agreement with a common vocabulary/nomenclature, conceptual models and domain knowledge
      •  Schema + knowledge base
      •  Agreement is what enables interoperability
      •  Formal description: machine processability is what leads to automation
  13. 2: Semantic Annotation (Metadata Extraction)
      •  Associating meaning with data, or labeling data so it is more meaningful to the system and people
      •  Can be manual, semi-automatic (automatic with human verification), or automatic
  14. From Syntax to Semantics. Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics. Figure labels: shallow semantics, deep semantics.
  15. Levels of Abstraction (figure labels: SSN Ontology, Intellego)
      •  0 Raw Data [in TEXT], e.g., number: “150”
      •  1 Annotated Data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg
      •  2 Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure
      •  3 Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism
  16. 3: Reasoning/Computation
      •  Semantics-enabled search, integration, answering complex queries, connections and analyses (paths, subgraphs), pattern finding, mining, hypothesis validation, discovery, visualization
  17. Semantic Web Stack (Semantic Web Layer Cake)
      •  Web of Linked Data
      •  Introduced by Berners-Lee et al. as the next step for the Web of Documents
      •  Allow “machine understanding” of data
      •  Create “common” models of domains using a formal language: ontologies
      Layer cake image source: http://www.w3.org; see W3C SW publications
  18. RDF: Resource Description Framework
      •  Resource Description Framework: recommended by W3C for metadata modeling [RDF]
      •  A standard common modeling framework, usable by humans and machine understandable
      Example graph: IBM (Company) has “Headquarters located in” Armonk, New York, United States (Location) and “Research lab located in” Zurich, Switzerland (Location)
      RDF/OWL slides from: Semantic Web in Health Informatics (thanks: Satya)
  19. RDF: Triple Structure, IRI, Namespace
      •  RDF Triple
         o  Subject: The resource that the triple is about
         o  Predicate: The property of the subject that is described by the triple
         o  Object: The value of the property
      •  Web Addressable Resource: Uniform Resource Locator (URL), Uniform Resource Identifier (URI), Internationalized Resource Identifier (IRI)
      •  Qualified Namespace: http://www.w3.org/2001/XMLSchema# as xsd:
         o  xsd:string instead of http://www.w3.org/2001/XMLSchema#string
      Example triple: IBM, Headquarters located in, Armonk, New York, United States
  20. RDF Representation
      •  Two types of property values in a triple
         o  Web resource: IBM, Headquarters located in, Armonk, New York, United States
         o  Typed literal: IBM, Has total employees, “430000”^^xsd:integer
      •  The graph model of RDF: node-arc-node is the primary representation model
      •  Secondary notations: Triple notation
         o  companyExample:IBM companyExample:has-Total-Employee “430000”^^xsd:integer .
  21. RDFS: RDF Schema
      •  RDF Schema: Vocabulary for describing groups of resources [RDFS]
      Example: “IBM, Headquarters located in, Armonk, New York, United States” and “Oracle, Headquarters located in, Redwood Shores, California, United States” generalize to “Company, Headquarters located in, Geographical Location”
  22. RDFS: RDF Schema
      •  Property domain (rdfs:domain) and range (rdfs:range): “Headquarters located in” has domain Company and range Geographical Location
      •  Class Hierarchy/Taxonomy (rdfs:subClassOf): Computer Technology Company, Banking Company and Insurance Company are subclasses of the (parent) class Company
  23. Ontology: A Working Definition
      •  Ontologies are shared conceptualizations of a domain represented in a formal language*
      •  Ontologies:
         o  Common representation model: facilitate interoperability, integration across different projects, and enforce consistent use of terminology
         o  Closely reflect domain-specific details (domain semantics) essential to answer end user
         o  Support reasoning to discover implicit knowledge
      * Paraphrased from Gruber, 1993
  24. SPARQL: Querying Semantic Web Data
      •  A SPARQL query pattern is composed of triples
      •  Triples correspond to the RDF triple structure, but have a variable at:
         o  Subject: ?company ex:hasHeadquaterLocation ex:NewYork.
         o  Predicate: ex:IBM ?whatislocatedin ex:NewYork.
         o  Object: ex:IBM ex:hasHeadquaterLocation ?location.
      •  The result of a SPARQL query is a list of values; these values can replace the variables in the query pattern
  25. SPARQL: Query Patterns
      •  An example query pattern
            PREFIX ex:<http://www.eecs600.case.edu/>
            SELECT ?company ?location WHERE
            {?company ex:hasHeadquaterLocation ?location.}
      •  Query Result (multiple matches)
            company                 location
            IBM                     NewYork
            Oracle                  RedwoodCity
            MicrosoftCorporation    Bellevue
  26. SPARQL: Query Forms
      •  SELECT: Returns the values bound to the variables
      •  CONSTRUCT: Returns an RDF graph
      •  DESCRIBE: Returns a description (RDF graph) of a resource (e.g. IBM)
         o  The contents of the RDF graph are determined by the SPARQL query processor
      •  ASK: Returns a Boolean (true or false)
  27. A little bit about ontologies
  28. Many Ontologies Available Today. Open Biomedical Ontologies: http://bioportal.bioontology.org/ , http://obo.sourceforge.net/
  29. From simple ontologies
  30. Drug Ontology Hierarchy (showing is-a relationships). Figure nodes: owl:thing, prescription_drug_brand_name, brandname_undeclared, brandname_composite, prescription_drug, monograph_ix_class, cpnum_group, prescription_drug_property, indication_property, formulary_property, non_drug_reactant, interaction_property, property, formulary, brandname_individual, interaction_with_prescription_drug, interaction, indication, generic_individual, prescription_drug_generic, generic_composite, interaction_with_non_drug_reactant, interaction_with_monograph_ix_class
  31. to complex ontologies
  32. N-Glycosylation metabolic pathway
      •  GNT-I attaches GlcNAc at position 2: UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
      •  GNT-V attaches GlcNAc at position 6: UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
      •  Figure labels: N-acetyl-glucosaminyl_transferase_V, N-glycan_beta_GlcNAc_9, N-glycan_alpha_man_4
  33. A little bit about semantic metadata extractions and annotations
  34. Extraction for Metadata Creation
      •  Sources feeding the METADATA EXTRACTORS: WWW, enterprise repositories, Nexis, UPI, AP, feeds/documents, digital maps, digital audios, digital videos, digital images, data stores, …
      •  Create/extract as much (semantic) metadata automatically as possible; use ontologies to improve and enhance extraction
  35. Semantic Annotation of Textual Data
  36. Semantic Annotation with multiple ontologies
      •  Ontologies: Spatial Ontology, Domain Ontology, Temporal Ontology
      •  Annotated concepts: Person, Company, Coordinates, Coordinate System, Time Units, Timezone
      Source: Mike Botts, "SensorML and Sensor Web Enablement," Earth System Science Center, UAB Huntsville
  37. Semantic Annotation of Experimental Data. ProPreO: Ontology-mediated provenance. Mass Spectrometry (MS) data, ms/ms peaklist:
            830.9570   194.9604   2      (parent ion m/z, parent ion abundance, parent ion charge)
            580.2985     0.3592          (fragment ion m/z, fragment ion abundance)
            688.3214     0.2526
            779.4759    38.4939
            784.3607    21.7736
            1543.7476    1.3822
            1544.7595    2.9977
            1562.8113   37.4790
            1660.7776  476.5043
  38. Semantic Annotation of Sensor and Social Data
      •  Physical-Cyber-Social System: signals from personal, personal spaces, and community spaces (Personal, Population Level, Public Health)
      •  Observations: sensor and personal observations (Wheeze – Yes; Do you have tightness of chest? – Yes), tweet reporting pollution level and asthma attacks, acceleration readings from on-phone sensors, outdoor pollen and pollution
      •  Health Signal Extraction: <Wheezing=Yes, time, location>, <ChestTightness=Yes, time, location>, <PollenLevel=Medium, time, location>, <Pollution=Yes, time, location>, <Activity=High, time, location>
      •  Health Signal Understanding: <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>, e.g. <2, 1, 1, 3, 1, RiskCategory>; Risk Category assigned by doctors; draws on Expert Knowledge and Background Knowledge
      •  Qualify, Quantify, Enrich
      •  Outcomes: Well Controlled – continue; Not Well Controlled – contact nurse; Poor Controlled – contact doctor
  39. Objects of Interest
      “An object by itself is intensely uninteresting.” Grady Booch, Object Oriented Design with Applications, 1991
      •  Keywords | Search (data)
      •  Entities | Integration (information)
      •  Relationships | Analysis, Insight (knowledge)
      Entities + Relationships are also needed to model & study Events
  40. Sample applications
      •  Early Semantic Search, use baby steps of today’s engines
      •  Enterprise applications: healthcare & life sciences, financial, security
      •  Driving the innovation with new types of data: sensor (Semantic Sensor Web), social (Semantic Social Web), semantic IoT/WoT
  41. Equity Research Dashboard with Blended Semantic Querying and Browsing
      •  Focused relevant content organized by topic (semantic categorization)
      •  Automatic content aggregation from multiple content providers and feeds
      •  Related relevant content not explicitly asked for (semantic associations)
      •  Competitive research inferred automatically
      •  Automatic 3rd party content integration
  42. Ontology Creation and Maintenance Steps (figure labels: Ontology, Semantic Query Server)
      1. Ontology Model Creation (Description)
      2. Knowledge Agent Creation
      3. Automatic aggregation of Knowledge
      4. Querying the Ontology
      © Semagix, Inc.
  43. Semantic Associations - Connecting the Dots (2004 SEMAGIX)
      •  Entity types: Watch list, Organization, Company
      •  Instances: FBI Watchlist, Hamas, WorldCom
      •  Relationships: appears on Watchlist, member of organization, works for Company
      •  Ahmed Yaseer:
         o  Appears on Watchlist ‘FBI’
         o  Works for Company ‘WorldCom’
         o  Member of a banned organization
  44. Global Investment Bank: Establishing New Account
      •  Fraud Prevention application used in financial services; a related KYC application is deployed at a majority of global banks
      •  Sources: World Wide Web content, public records, blogs, RSS (unstructured text, semi-structured data); watch lists, law enforcement, regulators (semi-structured government data)
      •  Scores the entity based on the content and entity relationships
      •  User will be able to navigate the ontology using a number of different interfaces
  45. SEMANTICS, MEANING PROCESSING
      •  Big data sources: structured text (scientific publications / white papers), experimental results, clinical trial data, public domain knowledge (PubMed)
      •  Metadata Extraction/Semantic Annotations, supported by Ontologies/Domain Models/Knowledge, yield metadata / semantic annotations
      •  Semantic Search/Browsing/Personalization/Analysis, Knowledge Discovery, Visualization, Situational Awareness
      •  Search and browsing; Patterns / Inference / Reasoning; 2D-3D & Immersive Visualization, Human Computer Interfaces
      •  Impacting bottom line; knowledge discovery
      •  Example graph labels: Migraine, Stress, Patient, Magnesium, Calcium Channel Blockers; relations: affects, isa, inhibit
  46. Thank you! Any questions?
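
The RDFS slides describe rdfs:subClassOf, rdfs:domain and rdfs:range. A minimal sketch of how those three constructs let a reasoner derive implicit rdf:type facts is given here in plain Python. It is not how any particular triple store implements RDFS; the `ex:` names and the `rdfs_closure` helper are assumptions made up for this illustration, and only three of the RDFS entailment rules are covered.

```python
# Triples are (subject, predicate, object) tuples.
# All ex: names are illustrative and loosely follow the company examples in the slides.
TRIPLES = {
    ("ex:ComputerTechnologyCompany", "rdfs:subClassOf", "ex:Company"),
    ("ex:BankingCompany", "rdfs:subClassOf", "ex:Company"),
    ("ex:headquartersLocatedIn", "rdfs:domain", "ex:Company"),
    ("ex:headquartersLocatedIn", "rdfs:range", "ex:GeographicalLocation"),
    ("ex:IBM", "rdf:type", "ex:ComputerTechnologyCompany"),
    ("ex:Oracle", "ex:headquartersLocatedIn", "ex:RedwoodShores"),
}


def rdfs_closure(triples):
    """Apply three RDFS rules (subClassOf, domain, range) until nothing new is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                # An instance of subclass s is also an instance of parent class o.
                new |= {(x, "rdf:type", o) for x, q, c in inferred
                        if q == "rdf:type" and c == s}
            elif p == "rdfs:domain":
                # Every subject that uses property s has type o.
                new |= {(x, "rdf:type", o) for x, q, _ in inferred if q == s}
            elif p == "rdfs:range":
                # Every object of property s has type o.
                new |= {(y, "rdf:type", o) for _, q, y in inferred if q == s}
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(TRIPLES)
for triple in sorted(closure - TRIPLES):
    print(triple)
```

The printed triples are the implicit knowledge: IBM is a Company through the class hierarchy, Oracle is a Company through the property domain, and Redwood Shores is a Geographical Location through the property range.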

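The SPARQL slides explain that a query pattern is a triple with variables in some positions, and that the result is the list of values those variables can take. The sketch below shows that matching step for a single triple pattern in plain Python. It is an assumption-laden toy, not a SPARQL engine: the `match` function is hypothetical, the data repeats the query result shown in the slides, and the predicate name is spelled as it appears there.

```python
def match(pattern, triples):
    """Return one binding dict for each triple that matches a single triple pattern.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


DATA = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:MicrosoftCorporation", "ex:hasHeadquaterLocation", "ex:Bellevue"),
]

# Counterpart of: SELECT ?company ?location WHERE {?company ex:hasHeadquaterLocation ?location.}
rows = match(("?company", "ex:hasHeadquaterLocation", "?location"), DATA)
for row in rows:
    print(row["?company"], row["?location"])

# Counterpart of an ASK query: is there at least one match?
ibm_in_new_york = bool(match(("ex:IBM", "?p", "ex:NewYork"), DATA))
```

A real query processor joins several such patterns and works over indexed storage; this sketch only illustrates the "variables replaced by values" idea from the slides.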
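
The "Semantic Associations - Connecting the Dots" slide links a company and a watch list through a person they have in common. One simple way to surface such a connection is a breadth-first search over the relationship graph, sketched below. This is an assumed approach chosen for illustration, not a description of the Semagix implementation; the `find_association` function is hypothetical and the edges repeat the example on the slide.

```python
from collections import deque

# (subject, relationship, object) edges taken from the slide's example.
EDGES = [
    ("Ahmed Yaseer", "appears on watchlist", "FBI Watchlist"),
    ("Ahmed Yaseer", "works for company", "WorldCom"),
    ("Ahmed Yaseer", "member of organization", "Hamas"),
]


def find_association(edges, start, goal):
    """Breadth-first search for a shortest path of labeled edges, ignoring direction."""
    neighbors = {}
    for s, p, o in edges:
        neighbors.setdefault(s, []).append((p, o))
        neighbors.setdefault(o, []).append((p, s))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None  # no connection found


path = find_association(EDGES, "WorldCom", "FBI Watchlist")
for step in path:
    print(step)
```

Each step in the returned path is one hop, so the path from WorldCom to the FBI Watchlist passes through the shared person in two hops.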