Reasoning Over Big Data Stores
Eric Little, PhD
VP Data Science
Polytechnic School of Engineering - NYU
eric.little@osthus.com
Slide 2
Who We Are & What We Do
OSTHUS, Inc. is the U.S. subsidiary of OSTHUS GmbH
Global presence: offices in Germany, the U.S. & China
Provides advanced solutions, consulting and technology services for pharmaceutical and biotech R&D
• Technology provider for the Allotrope effort, globally aligning several pharma and biotech companies
Slide 3
Semantic Technologies – Smart Data Piece
Semantic Technologies
• Provide several important features for emerging technologies:
  • Controlled vocabularies
  • Taxonomies
  • Metadata structures
  • Ontology models
  • Logical inference
Data continues to evolve and grow in both size and complexity.
We need hybrid solutions that can provide real insights.
• Analytics is growing into a new kind of field: Data Science
• Is data science about interacting with machines or humans?
• Must strike a balance between the complexity of the data and the simplicity of its presentation to the user
Slide 4
Metadata, Reference Data & Master Data
• Although often lumped together, these are distinct kinds of data
• Semantic technologies can help organize these kinds of data, but should not be used in isolation
• Scalability is achieved using complementary approaches
[Figure labels: Increased conceptual complexity; Increased scalability issues]
Slide 5
Where Semantics Can Fall Short
Graphs are good for information, but not so good for high-bandwidth applications where speed and scalability are the primary drivers.
Graphs can require highly specialized hardware, software techniques or engineers
Semantics should be confined to the metadata aspects of the problem; use other tech for the rest
Slide 6
The Big Data Problem
Big Data is a real challenge, but it is starting to become a buzzword
• Many “Big Data Problems” can be reduced to smaller data problems
Applications exist that require complex inferencing over very large data sets
• A current client has lab readings from 40,000+ devices
How can this be done effectively?
Slide 7
Why Not Just Build the Data Lake?
Data lakes are fine when you are gathering and storing the data
• What happens later on when a lot of data is in there?
The benefit is that data can stay in its original form, with no real ETL
But running analytics across disparate stores is very challenging
“Without metadata, every subsequent use of data means analysts start from scratch.” (Gartner 2014)
Slide 8
Reasoning Over Big Data Is A Growing Topic
An inordinate amount of time and energy has been spent on queries alone.
• This is not reasoning, though; it is just retrieval
What is Reasoning?
• More than just automated query sets run in sequence or in parallel
• Reasoning is about inferring new information that isn't in the raw data
• It is heuristic: one discovers or learns something new for oneself
• Forms: deductive, inductive, abductive
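The distinction between retrieval and reasoning can be illustrated with a minimal sketch. It is not taken from the talk: the triples, the instrument names and the single deductive rule (if x has type C and C is a subclass of D, then x has type D) are assumptions chosen only to show a query that finds nothing in the raw data and the same query succeeding after inference.

```python
# Toy fact base of (subject, predicate, object) triples. All names are hypothetical.
facts = {
    ("hplc_42", "type", "HPLC"),
    ("HPLC", "subClassOf", "Chromatograph"),
    ("Chromatograph", "subClassOf", "LabInstrument"),
}

def query(triples, predicate, obj):
    """Retrieval: return subjects whose (predicate, obj) pair is explicitly stored."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}

def infer_types(triples):
    """Deduction: apply 'x type C, C subClassOf D => x type D' until nothing new appears."""
    closed = set(triples)
    while True:
        new = {
            (x, "type", d)
            for (x, p1, c) in closed if p1 == "type"
            for (c2, p2, d) in closed if p2 == "subClassOf" and c2 == c
        } - closed
        if not new:
            return closed
        closed |= new

retrieved = query(facts, "type", "LabInstrument")
inferred = query(infer_types(facts), "type", "LabInstrument")
print(retrieved)  # set()
print(inferred)   # {'hplc_42'}
```

The raw data never states that `hplc_42` is a `LabInstrument`; that fact exists only after the rule has been applied, which is the sense of "new information" used above.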
Slide 9
Types of Reasoning One Can Use
Logical Reasoning (does not always assume set theory)
Mathematical Reasoning (which is logical reasoning, but assumes set theory as the basis)
Slide 10
Reasoning Evolution
Slide 11
Types of Semantic Inference (Forward and Backward Chaining)
Backward chaining
• Uses Modus Ponens
• Starts from a true (T) consequent and affirms the related antecedent (verifies the connection)
Forward chaining
• Uses Modus Ponens
• Finds a true (T) antecedent and affirms the related consequent (new knowledge)
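Forward and backward chaining, as named in this slide's title, can be sketched over a tiny rule set. This is an illustrative sketch rather than the speaker's implementation: the rule format (a set of antecedents plus one consequent) and the fact names are hypothetical.

```python
# Hypothetical rules: (set of antecedents, consequent).
RULES = [
    ({"reading_out_of_range"}, "device_suspect"),
    ({"device_suspect", "calibration_overdue"}, "recalibrate"),
]

def forward_chain(facts, rules):
    """Data-driven: fire every rule whose antecedents all hold until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= known and consequent not in known:
                known.add(consequent)
                changed = True
    return known

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Goal-driven: a goal holds if it is a fact or a rule concluding it has provable antecedents."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rules
        return False
    return any(
        all(backward_chain(a, facts, rules, seen | {goal}) for a in antecedents)
        for antecedents, consequent in rules
        if consequent == goal
    )

facts = {"reading_out_of_range", "calibration_overdue"}
derived = forward_chain(facts, RULES) - facts
print(sorted(derived))  # ['device_suspect', 'recalibrate']
print(backward_chain("recalibrate", facts, RULES))  # True
print(backward_chain("recalibrate", {"calibration_overdue"}, RULES))  # False
```

Forward chaining starts from the data and produces facts that were not stored (new knowledge); backward chaining starts from a goal and checks whether the stored facts support it (verifying the connection).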
Slide 12
Ontology Layering Is Important for Scale
Data Source Models
Multi- & Single-Source Data Integration Models
Domain Models (Objects, Attributes, Process & Relations)
System-Level Models (Rules)
Data Traceability (Provenance)
User-Driven Ontologies
Upper-Level Models
Metadata Levels (Human Concepts)
Data-Centric Levels (Machine Language)
Metaphysics, not just data models
Data sources connected directly to higher classifications
Federation allows for improved scale
Slide 13
Combining Semantics and NoSQL
Get your semantics experts and your big data scientists on the same page
• Utilize tables where possible; avoid multi-node graph hops
• Use graphs for metadata; leave instance data in place when possible
• Large graphs should be avoided
• Lots of columns and rows are fine; joins across tables are not
• Break graph information into other formats wherever possible
Pre-compute phases are important
• Pre-compute multi-table joins based on SME input, known semantic patterns, business rules/logic, etc.
• Use statistical methods to cluster data (e.g., normalcy calcs)
Use the tech that is right for the job
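One way to read the advice on this slide is sketched below, under stated assumptions: SQLite stands in for whatever tabular or NoSQL store is actually used, the table and column names are invented, and the "graph" is reduced to a small metadata dictionary. The join is materialized once in a pre-compute phase, and the semantic layer only resolves a concept to the pre-joined table, leaving instance data where it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices  (device_id INTEGER PRIMARY KEY, model TEXT);
    CREATE TABLE readings (device_id INTEGER, value REAL);
    INSERT INTO devices  VALUES (1, 'HPLC'), (2, 'GC');
    INSERT INTO readings VALUES (1, 4.0), (1, 6.0), (2, 10.0);

    -- Pre-compute phase: materialize the join once into a wide table.
    CREATE TABLE readings_wide AS
        SELECT r.device_id, d.model, r.value
        FROM readings r JOIN devices d ON r.device_id = d.device_id;
""")

# Hypothetical metadata layer: a semantic concept maps to a physical table and column.
METADATA = {"ChromatographyReading": {"table": "readings_wide", "model_column": "model"}}

def mean_value(concept, model):
    """Resolve the concept through metadata, then scan the pre-joined table (no join at query time)."""
    m = METADATA[concept]
    sql = f"SELECT AVG(value) FROM {m['table']} WHERE {m['model_column']} = ?"
    return conn.execute(sql, (model,)).fetchone()[0]

print(mean_value("ChromatographyReading", "HPLC"))  # 5.0
```

The trade-off is the usual one for materialized joins: queries avoid cross-table work, at the cost of refreshing `readings_wide` whenever the source tables change.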
Slide 14
One Example of Using RDF in Cloud-scalable Applications
Example of a current approach being used; there are others
Can scale across multiple cloud nodes (where triple stores have issues)
Triples are indexed items
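The slide does not name the store or the index layout, so the following is only a sketch of what "triples are indexed items" could look like. The `TripleIndex` class, its two indexes and the `shard_for` placement rule are all assumptions, with in-memory dictionaries standing in for a distributed key-value store.

```python
import zlib
from collections import defaultdict

class TripleIndex:
    """Toy in-memory stand-in for a distributed key-value store holding RDF triples."""

    def __init__(self):
        # One index per access pattern: key -> set of full triples.
        self.by_subject = defaultdict(set)
        self.by_predicate_object = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.by_subject[s].add(triple)
        self.by_predicate_object[(p, o)].add(triple)

    def about(self, s):
        """All triples for a subject: one key lookup, no graph traversal."""
        return self.by_subject.get(s, set())

    def subjects_with(self, p, o):
        """Subjects that carry the given predicate/object pair."""
        return {s for (s, _, _) in self.by_predicate_object.get((p, o), set())}

def shard_for(key, n_nodes):
    """Assumed placement rule: a stable hash of the key picks the node that holds it."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes

idx = TripleIndex()
idx.add("ex:device1", "rdf:type", "ex:HPLC")
idx.add("ex:device2", "rdf:type", "ex:HPLC")
idx.add("ex:device1", "ex:locatedIn", "ex:Lab7")
print(sorted(idx.subjects_with("rdf:type", "ex:HPLC")))  # ['ex:device1', 'ex:device2']
print(len(idx.about("ex:device1")))  # 2
```

Because each lookup touches a single key, the keys can be spread across nodes independently, which is the property that lets this layout scale out where a single large graph would not.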
Slide 15
THANK YOU – QUESTIONS?
Eric Little, PhD
VP Data Science
OSTHUS, Inc.
eric.little@osthus.com
(M) 321-480-4818
www.linkedin.com/pub/eric-little

More Related Content

What's hot

Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...James Miranda
 
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionAI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionDr. Haxel Consult
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovationopen_phacts
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & VisualizationTreparel
 
Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel
 
IC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage PointIC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage PointDr. Haxel Consult
 
Writing a Databases Research Paper
Writing a Databases Research PaperWriting a Databases Research Paper
Writing a Databases Research PaperDamian T. Gordon
 
Real callenges in big data security
Real callenges in big data securityReal callenges in big data security
Real callenges in big data securitybalasahebcomp
 
Managing sensitive applications in the public cloud
Managing sensitive applications in the public cloudManaging sensitive applications in the public cloud
Managing sensitive applications in the public cloudieeepondy
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —swethaT16
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaBabasab Patil
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 

What's hot (20)

Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
 
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionAI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Paper presentation
Paper presentationPaper presentation
Paper presentation
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization
 
Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014
 
IC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage PointIC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage Point
 
Writing a Databases Research Paper
Writing a Databases Research PaperWriting a Databases Research Paper
Writing a Databases Research Paper
 
Real callenges in big data security
Real callenges in big data securityReal callenges in big data security
Real callenges in big data security
 
Managing sensitive applications in the public cloud
Managing sensitive applications in the public cloudManaging sensitive applications in the public cloud
Managing sensitive applications in the public cloud
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
 
SciBite
SciBiteSciBite
SciBite
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mba
 
IC-SDV 2019: OntoChem
IC-SDV 2019: OntoChemIC-SDV 2019: OntoChem
IC-SDV 2019: OntoChem
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 

Similar to Reasoning over big data

Reinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & FasterReinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & FasterOSTHUS
 
Why Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies PossesWhy Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies PossesOSTHUS
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the datamark madsen
 
Becoming Datacentric
Becoming DatacentricBecoming Datacentric
Becoming DatacentricTimothy Cook
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning CCG
 
How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...WiMLDSMontreal
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
Introduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdfIntroduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdfGraceOkeke3
 
Paradigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration StackPierre Brunelle
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration James Hendler
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6varshakumar21
 
Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientistTanujaSomvanshi1
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...PhD Assistance
 
How to Start Doing Data Science
How to Start Doing Data ScienceHow to Start Doing Data Science
How to Start Doing Data ScienceAyodele Odubela
 

Similar to Reasoning over big data (20)

Reinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & FasterReinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & Faster
 
Why Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies PossesWhy Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies Posses
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the data
 
Becoming Datacentric
Becoming DatacentricBecoming Datacentric
Becoming Datacentric
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
 
How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
On Big Data
On Big DataOn Big Data
On Big Data
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
Introduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdfIntroduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdf
 
Paradigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the table
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration Stack
 
Data science unit2
Data science unit2Data science unit2
Data science unit2
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6
 
Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientist
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
 
How to Start Doing Data Science
How to Start Doing Data ScienceHow to Start Doing Data Science
How to Start Doing Data Science
 

More from OSTHUS

The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data OSTHUS
 
Challenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life SciencesChallenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life SciencesOSTHUS
 
From allotrope to reference master data management
From allotrope to reference master data management From allotrope to reference master data management
From allotrope to reference master data management OSTHUS
 
Early AI Adoption Via Advanced Analytics
Early AI Adoption Via  Advanced AnalyticsEarly AI Adoption Via  Advanced Analytics
Early AI Adoption Via Advanced AnalyticsOSTHUS
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...OSTHUS
 
Why paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labWhy paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labOSTHUS
 
Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016OSTHUS
 
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...OSTHUS
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...OSTHUS
 
Best Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data CurationBest Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data CurationOSTHUS
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...OSTHUS
 
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015OSTHUS
 
Data Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy dataData Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy dataOSTHUS
 
Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?OSTHUS
 

More from OSTHUS (14)

The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data
 
Challenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life SciencesChallenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life Sciences
 
From allotrope to reference master data management
From allotrope to reference master data management From allotrope to reference master data management
From allotrope to reference master data management
 
Early AI Adoption Via Advanced Analytics
Early AI Adoption Via  Advanced AnalyticsEarly AI Adoption Via  Advanced Analytics
Early AI Adoption Via Advanced Analytics
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
 
Why paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labWhy paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart lab
 
Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016
 
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
 
Best Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data CurationBest Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data Curation
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
 
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
 
Data Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy dataData Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy data
 
Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?
 

Recently uploaded

Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneCall girls in Ahmedabad High profile
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our EscortsCall Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escorts
Call Girls in East Of Kailash 9711199171 Delhi Enjoy Call Girls With Our Escortsindian call girls near you
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Personfurqan222004
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Roomdivyansh0kumar0
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewingbigorange77
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 


Reasoning over big data

  • 1. Reasoning Over Big Data Stores. Eric Little, PhD, VP Data Science. Polytechnic School of Engineering, NYU. eric.little@osthus.com
  • 2. Who We Are & What We Do: OSTHUS, Inc. is the U.S. subsidiary of OSTHUS GmbH. Global presence, with offices in Germany, the U.S. and China. Provides advanced solutions, consulting and technology services for pharmaceutical and biotech R&D. Technology provider for the Allotrope effort, globally aligning several pharma and biotech companies.
  • 3. Semantic Technologies: Smart Data Piece. Semantic technologies provide several important features for emerging new technologies: controlled vocabularies, taxonomies, metadata structures, ontology models, and logical inference. Data today continues to evolve and grow in both size and complexity. We need hybrid solutions that can provide real insights. Analytics is growing into a new kind of field: Data Science. Is data science about interacting with machines or humans? It must strike a balance between the complexity of the data and the simplicity of the presentation to the user.
  • 4. Metadata, Reference Data & Master Data: While often lumped together, these are distinct kinds of data. Semantic technologies can help with the organization of these kinds of data, but this should not be done in isolation. Scalability is achieved using complementary approaches. (Figure labels: increased conceptual complexity; increased scalability issues.)
  • 5. Where Semantics Can Fall Short: Graphs are good for information, but not so good for high-bandwidth applications where speed and scalability are the primary drivers. They can require highly specialized hardware, software techniques or engineers. Semantics should be confined to the metadata aspects of the problem; use other tech for the rest.
  • 6. The Big Data Problem: Big Data is a real challenge, but it is starting to become a buzzword. Many “Big Data Problems” can be reduced to smaller data problems. Applications exist that require complex inferencing over very large data sets. A current client has lab readings from 40,000+ devices. How to do this effectively?
  • 7. Why Not Just Build the Data Lake? Data lakes are fine when you are gathering and storing the data. What happens later on, when a lot of data is in there? The benefit is that data can stay in its original form, with no real ETL. But running analytics across disparate stores is very challenging. “Without metadata, every subsequent use of data means analysts start from scratch.” (Gartner 2014)
  • 8. Reasoning Over Big Data Is A Growing Topic: An inordinate amount of time and energy has been spent on just queries. This is not reasoning, though; it is just retrieval. What is reasoning? It is more than just automated query sets run in sequence or in parallel. Reasoning is about inferring new information that isn’t in the raw data. It is a heuristic, where one discovers or learns something new for oneself. Kinds: deductive, inductive, abductive.
  • 9. Types of Reasoning One Can Use: Logical reasoning (does not always assume set theory). Mathematical reasoning (which is logical reasoning, but assumes set theory as the basis).
  • 11. Types of Semantic Inference (Forward and Backward Chaining): Backward chaining uses modus ponens: it finds a true consequent and affirms the related antecedent (verifies the connection). Forward chaining uses modus ponens: it finds a true antecedent and affirms a related consequent (new knowledge).
  • 12. Ontology Layering Is Important for Scale: Data Source Models; Multi- & Single-Source Data Integration Models; Domain Models (Objs, Attributes, Process & Relations); System Lvl Models (Rules); Data Traceability (Provenance); User-Driven Ontologies; Upper-Lvl Models. Meta-data Levels (Human Concepts); Data-centric Levels (Machine Language). Metaphysics, not just data models. Data sources are connected directly to higher classifications. Federation allows for improved scale.
  • 13. Combining Semantics and NoSQL: Get your semantics experts and your big data scientists on the same page. Utilize tables where possible and avoid multi-node graph hops. Use graphs for metadata and leave instance data in place when possible. Large graphs should be avoided. Lots of columns and rows are fine; joins across tables are not. Break graph information into other formats wherever possible. Pre-compute phases are important: pre-compute multi-table joins based on SME input, known semantic patterns, business rules/logics, etc. Use statistical methods to cluster data (e.g., normalcy calcs). Use the tech that is right for the job.
  • 14. One Example of Using RDF in Cloud-scalable Applications: An example of a current approach being used; there are others. It can scale across multiple cloud nodes (where triple stores have issues). Triples are indexed items.
  • 15. THANK YOU – QUESTIONS? Eric Little, PhD VP Data Science OSTHUS, Inc. eric.little@osthus.com (M) 321-480-4818 www.linkedin.com/pub/eric-little
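
The forward and backward chaining described on slide 11 can be illustrated with a small sketch. This is not taken from the deck: the rule format (a set of antecedents paired with one consequent), the function names, and the lab-reading fact names are all assumptions made for illustration. Forward chaining starts from known facts and derives new ones; backward chaining starts from a goal and checks whether the facts support it.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule is (antecedents, consequent): if every antecedent holds, so does the consequent.
# This representation is an assumption for the sketch, not the deck's own format.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Apply modus ponens repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= known and consequent not in known:
                known.add(consequent)  # new knowledge not present in the raw data
                changed = True
    return known


def backward_chain(goal: str, facts: Set[str], rules: List[Rule],
                   _seen: FrozenSet[str] = frozenset()) -> bool:
    """Start from a goal and work back to the facts that would support it."""
    if goal in facts:
        return True
    if goal in _seen:  # guard against circular rules
        return False
    seen = _seen | {goal}
    return any(
        consequent == goal
        and all(backward_chain(a, facts, rules, seen) for a in antecedents)
        for antecedents, consequent in rules
    )


# Hypothetical facts and rules about a lab device reading.
facts = {"reading_above_threshold", "device_is_calibrated"}
rules = [
    (frozenset({"reading_above_threshold", "device_is_calibrated"}), "reading_is_anomalous"),
    (frozenset({"reading_is_anomalous"}), "needs_review"),
]

derived = forward_chain(facts, rules)
supported = backward_chain("needs_review", facts, rules)
unsupported = backward_chain("device_is_faulty", facts, rules)
```

Here `derived` contains the two inferred facts in addition to the two original ones, which is the sense in which reasoning goes beyond retrieval. The loop-until-fixpoint design is the simplest possible one; a production rule engine would index rules by antecedent rather than rescanning them.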