Copyright © 2018 Talentica Software (I) Pvt Ltd. All rights reserved.
Fingerprinting Latent Structure in Data
MRITYUNJAY KUMAR & GUNTUR RAVINDRA
TECHNOLOGY EXCELLENCE GROUP
TALENTICA SOFTWARE
PRESENTED AT DAIR (DATA ANALYTICS AND INTELLIGENCE RESEARCH, INDIAN INSTITUTE OF TECHNOLOGY, DELHI)
Agenda
• Challenge with building data-driven algorithms
  • Small-data
• Introduction to data fingerprinting
• Two problem statements
  • Solving a question complexity problem
  • Solving an image recognition problem
• Fingerprinting the structure in data
  • Extracting structure
  • Representing structure as a signature
• Other complex problems
What is data fingerprinting
• A method of representing a block of data as a single entity
• Applications: easy validation, proof of originality, tamper detection, data loss prevention (DLP)
• Classical techniques
  • Bloom filters, cryptographic hashes
• Main issues with classical fingerprinting
  • Do not capture data semantics
  • Large number of fingerprints → complexity
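To make the semantics issue concrete, here is a minimal sketch that uses SHA-256 from Python's standard library as the cryptographic hash. The helper name `fingerprint` and the reworded question `q2` are illustrative assumptions, not material from the talk.

```python
import hashlib


def fingerprint(block: str) -> str:
    """Return a classical fingerprint: the SHA-256 digest of the data block."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


# Two questions with the same meaning but different wording (q2 is hypothetical).
q1 = "How many students are in the class?"
q2 = "How many students does the class have?"

# Identical data always maps to the same fingerprint, which is what makes
# validation and tamper detection easy...
same = fingerprint(q1) == fingerprint(q1)
# ...but a rewording produces an unrelated digest, so meaning is not captured.
differs = fingerprint(q1) != fingerprint(q2)

print(same, differs)
```

Because every distinct wording gets its own digest, the number of fingerprints grows with the number of surface variants rather than with the number of underlying meanings.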
Two Problems
Recognizing question complexity
Recognizing structural deformation in cells
Data source: https://www.kaggle.com/c/data-science-bowl-2018/data
Data-driven algorithms with Small-Data
• Need for problem-specific data
• Rule-based approaches
  • Rule-based approaches are easy to implement
  • Not all data characteristics can be captured as rules
  • Do not automatically adapt to the data
• Machine learning approach
  • ML approaches need large amounts of data
  • Generic models and open-source data are not suitable for application-specific needs
  • Can build complex structures and designs
Architecting a solution
• Knowledge has a latent structure
  • Sequence, geometry
• There can be hierarchies of structures
• Convert structure to a computational representation
• Objective: the context of the application and its capabilities influence the computational representation
Problem Formulation
A set of elements: images, questions, text messages
An objective
A subset of structures relevant to an objective
How do we define and how do we find
Transformation of elements into a structure and hence a computational entity
A human in the loop
Structures in Data
How many buses are plying in Mumbai on a route originating at Dadar and ending at Vashi?
How many students are in the class?
Structures in Data
Intensity Projections
Oriented gradients
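As a rough illustration of these two image structures, the sketch below computes row and column intensity projections and a magnitude-weighted histogram of gradient orientations in pure Python. The 5x5 image, the function names, and the choice of 4 orientation bins are assumptions made for the example; the slides do not specify how the features were computed.

```python
import math

# A tiny 5x5 grayscale "image" (hypothetical values): a bright vertical bar.
image = [
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
]


def intensity_projections(img):
    """Sum pixel intensities along each row and along each column."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def orientation_histogram(img, bins=4):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    hist = [0.0] * bins
    height, width = len(img), len(img[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central difference, horizontal
            gy = img[y + 1][x] - img[y - 1][x]  # central difference, vertical
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region, no orientation to record
            angle = math.atan2(gy, gx) % math.pi  # fold direction into [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    return hist


rows, cols = intensity_projections(image)
hist = orientation_histogram(image)
print(rows, cols, hist)
```

The column projection peaks where the bar sits while the row projection stays flat, which is the kind of compact structural summary that can later be turned into a signature.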
Problem Formulation
A function that maps a structure to a vector
The inverse of the function results in one of many structures
For computational ease we make the vector a binary bit-vector
Goal is to find the function so as to satisfy the constraints
This is a constrained optimization formulation
Solution : Optimization formulation
• Based on the problem formulation
  • The optimization formulation has an inverse that returns the variable itself or a subset of the variables
  • A related function is a neural auto-encoder
• The solution boils down to
  • Training an auto-encoder with one class of data
• Recognizing the data class involves
  • Data clustering
  • Human intelligence/visual inspection to mark clusters
  • Data in clusters used to train the auto-encoder
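The idea of training an auto-encoder on a single class can be sketched with a deliberately tiny stand-in: a tied-weight linear auto-encoder with one hidden unit, fitted by plain gradient descent in pure Python. This is not the neural auto-encoder used in the talk; the 2-D toy data, the initial weights, the learning rate, and the function names are all assumptions chosen so the example stays self-contained.

```python
def train_autoencoder(samples, w, lr=0.01, epochs=300):
    """Fit a tied-weight linear auto-encoder with one hidden unit.

    Encoder: h = w . x        Decoder: x_hat = h * w
    """
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x in samples:
            h = sum(wi * xi for wi, xi in zip(w, x))
            err = [xi - h * wi for xi, wi in zip(x, w)]
            err_dot_w = sum(e * wi for e, wi in zip(err, w))
            # Gradient of ||x - x_hat||^2 with respect to each weight.
            for k in range(len(w)):
                grad[k] += -2.0 * (h * err[k] + err_dot_w * x[k])
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def distortion(x, w):
    """Squared reconstruction error of x under the trained auto-encoder."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return sum((xi - h * wi) ** 2 for xi, wi in zip(x, w))


# One "class" of data: points that share a latent structure (they lie on a line).
in_class = [[-1.0, -1.0], [-0.5, -0.5], [0.5, 0.5], [1.0, 1.0]]
weights = train_autoencoder(in_class, w=[0.5, 0.1])

seen_like = distortion([0.8, 0.8], weights)     # same structure as the training class
unseen_like = distortion([1.0, -1.0], weights)  # different structure
print(seen_like, unseen_like)
```

An input that shares the training class's structure is reconstructed almost perfectly, while an input with a different structure comes back heavily distorted; that gap is what the recognition step relies on.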
Recognition : Cell Structure
Recognition : Question Complexity
How much can the SP alter income tax in Scotland?
What is stage 1 in the life of a bill?
Who is the President of Egypt?
Why do some people purposely resist officers of the law?
Why is the need for acceptance of punishment needed?
Why would one plead guilty to a crime involving civil disobedience?
Why is giving a defiant speech sometimes more harmful for the individual?
Why did Harvard end its early admission program?
Solution : Recognition
• The auto-encoder output has distortions
• Detect the distortion
• Quantify the distortion
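One simple way to detect and quantify distortion on binary bit-vectors is the normalized Hamming distance between the input signature and the auto-encoder's output, as in the sketch below. The metric, the threshold of 0.2, the function names, and the sample vectors are assumptions for illustration; the slides do not state which distortion measure was used.

```python
def hamming_distortion(original, reconstructed):
    """Quantify distortion: fraction of bit positions that differ."""
    if len(original) != len(reconstructed):
        raise ValueError("bit-vectors must have equal length")
    mismatches = sum(a != b for a, b in zip(original, reconstructed))
    return mismatches / len(original)


def is_known_class(original, reconstructed, threshold=0.2):
    """Detect distortion: treat the input as in-class if distortion is below threshold."""
    return hamming_distortion(original, reconstructed) < threshold


# Hypothetical signature and two hypothetical auto-encoder outputs.
signature = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
clean_output = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]      # 1 of 10 bits flipped
distorted_output = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1]  # 5 of 10 bits flipped

print(is_known_class(signature, clean_output))      # low distortion
print(is_known_class(signature, distorted_output))  # high distortion
```

In practice the threshold would be calibrated on held-out data from the trained class rather than fixed by hand.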
Building Complexity
• Incremental addition of data classes
• Using stacking
• Unique binary code injected in each stacked layer
• Collapse stacked layers into a classification model → redeploy
Data type                                Test cases  True positive  False positive  True negative  False negative
With classes like in training data       1781        1774           NA              NA             7
With classes not like in training data   8789        NA             13              8776           NA
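The counts in the results table can be turned into rates with a few lines of arithmetic. The variable names and the choice of recall and false positive rate as summary metrics are assumptions; the slides report only the raw counts.

```python
# Counts copied from the results table.
in_class_total, true_pos, false_neg = 1781, 1774, 7
out_class_total, false_pos, true_neg = 8789, 13, 8776

# Sanity check: each row's outcomes add up to its number of test cases.
assert true_pos + false_neg == in_class_total
assert false_pos + true_neg == out_class_total

# Share of inputs from known classes that were recognized.
recall = true_pos / in_class_total
# Share of inputs from unknown classes that were wrongly accepted.
false_positive_rate = false_pos / out_class_total

print(f"recall = {recall:.4f}")
print(f"false positive rate = {false_positive_rate:.4f}")
```

This works out to roughly 99.6% of known-class inputs recognized and about 0.15% of unknown-class inputs accepted by mistake.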
Summary
• A large number of applications are still small-data applications
• Data has latent structure
  • Extraction is objective-based and data-specific
• We can harness data-hungry algorithms for small-data applications
  • Use structures instead of raw data
  • Auto-encoders are powerful tools
  • Build complexity incrementally

More Related Content

What's hot

5 Questions To Ask Before Getting Started With Data Annotation
5 Questions To Ask Before Getting Started With Data Annotation5 Questions To Ask Before Getting Started With Data Annotation
5 Questions To Ask Before Getting Started With Data Annotation
Innodata, Inc
 
Wk 5 case 3 knowledge management and collaboration at tata consulting service...
Wk 5 case 3 knowledge management and collaboration at tata consulting service...Wk 5 case 3 knowledge management and collaboration at tata consulting service...
Wk 5 case 3 knowledge management and collaboration at tata consulting service...
dyadelm
 
Ai trends and startups in india
Ai trends and startups in india Ai trends and startups in india
Ai trends and startups in india
Archana Ramakrishnan
 
Less Artificial More Intelligent
Less Artificial More IntelligentLess Artificial More Intelligent
Less Artificial More Intelligent
pipemode
 
Artificial Intelligence: What is it? Where do information professionals fit?
Artificial Intelligence: What is it? Where do information professionals fit?Artificial Intelligence: What is it? Where do information professionals fit?
Artificial Intelligence: What is it? Where do information professionals fit?
CILIP
 
Pragmatic use of artificial intelligence in smart cities 03262018
Pragmatic use of artificial intelligence in smart cities 03262018Pragmatic use of artificial intelligence in smart cities 03262018
Pragmatic use of artificial intelligence in smart cities 03262018
ThomasCook16
 
Understanding the ABC's of AI
Understanding the ABC's of AIUnderstanding the ABC's of AI
Understanding the ABC's of AI
Dickson Lukose
 
LegalTech - Bots vs Lawyers
LegalTech - Bots vs LawyersLegalTech - Bots vs Lawyers
LegalTech - Bots vs Lawyers
Eric Rodriguez (Hiring in Lex)
 
Using Open Data to fuel LegalTech Innovation
Using Open Data to fuel LegalTech InnovationUsing Open Data to fuel LegalTech Innovation
Using Open Data to fuel LegalTech Innovation
Eric Rodriguez (Hiring in Lex)
 
Ross Chayka. Gartner Hype Cycle for Emerging Tech Analysis
Ross Chayka. Gartner Hype Cycle for Emerging Tech AnalysisRoss Chayka. Gartner Hype Cycle for Emerging Tech Analysis
Ross Chayka. Gartner Hype Cycle for Emerging Tech Analysis
Rostyslav Chayka
 
Dennis Hills - Introduction to Machine Learning on Mobile.pdf
Dennis Hills -  Introduction to Machine Learning on Mobile.pdfDennis Hills -  Introduction to Machine Learning on Mobile.pdf
Dennis Hills - Introduction to Machine Learning on Mobile.pdf
Amazon Web Services
 
Resume
ResumeResume
Resume
Rinki Gupta
 
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic WayConstructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Baoxu Shi
 
Data Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit JaokarData Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit Jaokar
Jessica Willis
 
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
AI Frontiers
 
Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...
Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...
Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...
KTN
 
AI and Managerial Decision Making
AI and Managerial Decision MakingAI and Managerial Decision Making
AI and Managerial Decision Making
Lee Schlenker
 
[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.
[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.
[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.
Taejoon Yoo
 

What's hot (18)

5 Questions To Ask Before Getting Started With Data Annotation
5 Questions To Ask Before Getting Started With Data Annotation5 Questions To Ask Before Getting Started With Data Annotation
5 Questions To Ask Before Getting Started With Data Annotation
 
Wk 5 case 3 knowledge management and collaboration at tata consulting service...
Wk 5 case 3 knowledge management and collaboration at tata consulting service...Wk 5 case 3 knowledge management and collaboration at tata consulting service...
Wk 5 case 3 knowledge management and collaboration at tata consulting service...
 
Ai trends and startups in india
Ai trends and startups in india Ai trends and startups in india
Ai trends and startups in india
 
Less Artificial More Intelligent
Less Artificial More IntelligentLess Artificial More Intelligent
Less Artificial More Intelligent
 
Artificial Intelligence: What is it? Where do information professionals fit?
Artificial Intelligence: What is it? Where do information professionals fit?Artificial Intelligence: What is it? Where do information professionals fit?
Artificial Intelligence: What is it? Where do information professionals fit?
 
Pragmatic use of artificial intelligence in smart cities 03262018
Pragmatic use of artificial intelligence in smart cities 03262018Pragmatic use of artificial intelligence in smart cities 03262018
Pragmatic use of artificial intelligence in smart cities 03262018
 
Understanding the ABC's of AI
Understanding the ABC's of AIUnderstanding the ABC's of AI
Understanding the ABC's of AI
 
LegalTech - Bots vs Lawyers
LegalTech - Bots vs LawyersLegalTech - Bots vs Lawyers
LegalTech - Bots vs Lawyers
 
Using Open Data to fuel LegalTech Innovation
Using Open Data to fuel LegalTech InnovationUsing Open Data to fuel LegalTech Innovation
Using Open Data to fuel LegalTech Innovation
 
Ross Chayka. Gartner Hype Cycle for Emerging Tech Analysis
Ross Chayka. Gartner Hype Cycle for Emerging Tech AnalysisRoss Chayka. Gartner Hype Cycle for Emerging Tech Analysis
Ross Chayka. Gartner Hype Cycle for Emerging Tech Analysis
 
Dennis Hills - Introduction to Machine Learning on Mobile.pdf
Dennis Hills -  Introduction to Machine Learning on Mobile.pdfDennis Hills -  Introduction to Machine Learning on Mobile.pdf
Dennis Hills - Introduction to Machine Learning on Mobile.pdf
 
Resume
ResumeResume
Resume
 
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic WayConstructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
 
Data Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit JaokarData Science for Internet of Things with Ajit Jaokar
Data Science for Internet of Things with Ajit Jaokar
 
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
 
Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...
Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...
Robotics & Artificial (RAI) Intelligence webinar: Law & Regulation for RAI In...
 
AI and Managerial Decision Making
AI and Managerial Decision MakingAI and Managerial Decision Making
AI and Managerial Decision Making
 
[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.
[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.
[Mindslab] Success stories and use cases of artificial intelligence of MindsLab.
 

Similar to Data fingerprinting

The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...
Shift Conference
 
A Pragmatic AI Maturity Model
A Pragmatic AI Maturity ModelA Pragmatic AI Maturity Model
A Pragmatic AI Maturity Model
DATAVERSITY
 
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Edureka!
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
Oracle Developers
 
Designing a Successful Governed Citizen Data Science Strategy
Designing a Successful Governed Citizen Data Science StrategyDesigning a Successful Governed Citizen Data Science Strategy
Designing a Successful Governed Citizen Data Science Strategy
DATAVERSITY
 
Machine Learning on Mobile
Machine Learning on MobileMachine Learning on Mobile
Machine Learning on Mobile
Amazon Web Services
 
Machine Learning on Mobile
Machine Learning on MobileMachine Learning on Mobile
Machine Learning on Mobile
Amazon Web Services
 
Natural Language Understanding at AI and Machine Learning in Clinical Trials ...
Natural Language Understanding at AI and Machine Learning in Clinical Trials ...Natural Language Understanding at AI and Machine Learning in Clinical Trials ...
Natural Language Understanding at AI and Machine Learning in Clinical Trials ...
Saama
 
CompTIA Cyber Career Pathway: Developing skills for 2020 and beyond
CompTIA Cyber Career Pathway: Developing skills for 2020 and beyondCompTIA Cyber Career Pathway: Developing skills for 2020 and beyond
CompTIA Cyber Career Pathway: Developing skills for 2020 and beyond
Zeshan Sattar
 
Santisook stelligence ai-innovation-digital big bang-thailand2018-share
Santisook stelligence ai-innovation-digital big bang-thailand2018-shareSantisook stelligence ai-innovation-digital big bang-thailand2018-share
Santisook stelligence ai-innovation-digital big bang-thailand2018-share
Santisook Limpeeticharoenchot
 
Santisook s telligence ai-innovation-digital big bang-thailand2018-share
Santisook s telligence ai-innovation-digital big bang-thailand2018-shareSantisook s telligence ai-innovation-digital big bang-thailand2018-share
Santisook s telligence ai-innovation-digital big bang-thailand2018-share
stelligence
 
Overview about Emerging Technologies
Overview about Emerging TechnologiesOverview about Emerging Technologies
Overview about Emerging Technologies
Murali Venkatesh
 
CompTIA powered Cybersecurity Apprenticeships
CompTIA powered Cybersecurity ApprenticeshipsCompTIA powered Cybersecurity Apprenticeships
CompTIA powered Cybersecurity Apprenticeships
Zeshan Sattar
 
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Edureka!
 
Introduction to Machine Learning on Mobile: Mobile Week SF
Introduction to Machine Learning on Mobile: Mobile Week SFIntroduction to Machine Learning on Mobile: Mobile Week SF
Introduction to Machine Learning on Mobile: Mobile Week SF
Amazon Web Services
 
Artificial Intelligence nella realtà di oggi: come utilizzarla al meglio
Artificial Intelligence nella realtà di oggi: come utilizzarla al meglioArtificial Intelligence nella realtà di oggi: come utilizzarla al meglio
Artificial Intelligence nella realtà di oggi: come utilizzarla al meglio
Amazon Web Services
 
Functional programming, TypeScript and RXJS
Functional programming, TypeScript and RXJSFunctional programming, TypeScript and RXJS
Functional programming, TypeScript and RXJS
Vivek Tikar
 
DataRobot - 머신러닝 자동화 플랫폼
DataRobot - 머신러닝 자동화 플랫폼DataRobot - 머신러닝 자동화 플랫폼
DataRobot - 머신러닝 자동화 플랫폼
Sutaek Kim
 
[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow
[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow
[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow
AIIM International
 
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Amazon Web Services
 

Similar to Data fingerprinting (20)

The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...The future of FinTech product using pervasive Machine Learning automation - A...
The future of FinTech product using pervasive Machine Learning automation - A...
 
A Pragmatic AI Maturity Model
A Pragmatic AI Maturity ModelA Pragmatic AI Maturity Model
A Pragmatic AI Maturity Model
 
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Designing a Successful Governed Citizen Data Science Strategy
Designing a Successful Governed Citizen Data Science StrategyDesigning a Successful Governed Citizen Data Science Strategy
Designing a Successful Governed Citizen Data Science Strategy
 
Machine Learning on Mobile
Machine Learning on MobileMachine Learning on Mobile
Machine Learning on Mobile
 
Machine Learning on Mobile
Machine Learning on MobileMachine Learning on Mobile
Machine Learning on Mobile
 
Natural Language Understanding at AI and Machine Learning in Clinical Trials ...
Natural Language Understanding at AI and Machine Learning in Clinical Trials ...Natural Language Understanding at AI and Machine Learning in Clinical Trials ...
Natural Language Understanding at AI and Machine Learning in Clinical Trials ...
 
CompTIA Cyber Career Pathway: Developing skills for 2020 and beyond
CompTIA Cyber Career Pathway: Developing skills for 2020 and beyondCompTIA Cyber Career Pathway: Developing skills for 2020 and beyond
CompTIA Cyber Career Pathway: Developing skills for 2020 and beyond
 
Santisook stelligence ai-innovation-digital big bang-thailand2018-share
Santisook stelligence ai-innovation-digital big bang-thailand2018-shareSantisook stelligence ai-innovation-digital big bang-thailand2018-share
Santisook stelligence ai-innovation-digital big bang-thailand2018-share
 
Santisook s telligence ai-innovation-digital big bang-thailand2018-share
Santisook s telligence ai-innovation-digital big bang-thailand2018-shareSantisook s telligence ai-innovation-digital big bang-thailand2018-share
Santisook s telligence ai-innovation-digital big bang-thailand2018-share
 
Overview about Emerging Technologies
Overview about Emerging TechnologiesOverview about Emerging Technologies
Overview about Emerging Technologies
 
CompTIA powered Cybersecurity Apprenticeships
CompTIA powered Cybersecurity ApprenticeshipsCompTIA powered Cybersecurity Apprenticeships
CompTIA powered Cybersecurity Apprenticeships
 
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
 
Introduction to Machine Learning on Mobile: Mobile Week SF
Introduction to Machine Learning on Mobile: Mobile Week SFIntroduction to Machine Learning on Mobile: Mobile Week SF
Introduction to Machine Learning on Mobile: Mobile Week SF
 
Artificial Intelligence nella realtà di oggi: come utilizzarla al meglio
Artificial Intelligence nella realtà di oggi: come utilizzarla al meglioArtificial Intelligence nella realtà di oggi: come utilizzarla al meglio
Artificial Intelligence nella realtà di oggi: come utilizzarla al meglio
 
Functional programming, TypeScript and RXJS
Functional programming, TypeScript and RXJSFunctional programming, TypeScript and RXJS
Functional programming, TypeScript and RXJS
 
DataRobot - 머신러닝 자동화 플랫폼
DataRobot - 머신러닝 자동화 플랫폼DataRobot - 머신러닝 자동화 플랫폼
DataRobot - 머신러닝 자동화 플랫폼
 
[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow
[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow
[AIIM18] Automation and Integration (Not Rip and Replace) - Stephen Ludlow
 
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
 

Recently uploaded

A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 

Recently uploaded (20)

A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 

Data fingerprinting

  • 1. Fingerprinting Latent Structure in Data. MRITYUNJAY KUMAR & GUNTUR RAVINDRA, TECHNOLOGY EXCELLENCE GROUP, TALENTICA SOFTWARE. PRESENTED AT DAIR (DATA ANALYTICS AND INTELLIGENCE RESEARCH, INDIAN INSTITUTE OF TECHNOLOGY, DELHI)
  • 2. Agenda
      • Challenge with building data-driven algorithms
      • Small-data
      • Introduction to data fingerprinting
      • Two problem statements
      • Solving a Question complexity problem
      • Solving an Image recognition problem
      • Fingerprinting the structure in data
      • Extracting structure
      • Representing structure as a signature
      • Other complex problems
  • 3. What is data fingerprinting
      • A method to represent a block of data as an entity
      • Applications: Easy validation, proof of originality, tamper detection, DLP
      • Classical techniques
      • Bloom filters, cryptographic hashes
      • Main issues with fingerprinting
      • Do not capture data semantics
      • Large number of fingerprints → complexity
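The classical techniques named on slide 3 can be illustrated with a minimal sketch. It assumes SHA-256 as the cryptographic hash and a toy Bloom filter built on top of it; the names `fingerprint` and `BloomFilter`, the filter size, and the number of hash functions are illustrative choices, not taken from the slides. It shows why such fingerprints do not capture semantics: two phrasings of the same question produce unrelated digests.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Classical fingerprint: the SHA-256 digest of a block of data."""
    return hashlib.sha256(data).hexdigest()


class BloomFilter:
    """Toy Bloom filter: a compact set of fingerprints with possible false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


q1 = b"How many students are in the class?"
q2 = b"What is the number of students in the class?"

seen = BloomFilter()
seen.add(q1)

# Same meaning, different bytes: the digests share nothing useful.
same_semantics_same_print = fingerprint(q1) == fingerprint(q2)
```

Here `q1 in seen` is true because every bit set by `add` is found again, while `same_semantics_same_print` is false: byte-level fingerprints are blind to meaning, which is the first issue the slide lists.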
  • 4. Two Problems
  • 5. Recognizing question complexity
  • 6. Recognizing question complexity
  • 7. Recognizing structural deformation in cells. Data source: https://www.kaggle.com/c/data-science-bowl-2018/data
  • 8. Data-driven algorithms with Small-Data
      • Need for problem-specific data
      • Rule-based approaches
      • Rule-based approaches are easy to implement
      • Not all data characteristics can be captured as rules
      • Does not automatically adapt to the data
      • Machine learning approach
      • ML approaches need large amounts of data
      • Generic models and open-source data are not suitable for application-specific needs
      • Can build complex structures and designs
  • 9. Architecting a solution
      • Knowledge has a latent structure
      • Sequence, Geometry
      • There can be hierarchies of structures
      • Convert structure to a computational representation
      • Objective: the context of application capabilities influences the computational representation
  • 10. Problem Formulation
      • A set of elements: images, questions, text messages
      • An objective
      • A subset of structures relevant to an objective
      • How do we define this subset, and how do we find it?
      • Transformation of elements into a structure and hence a computational entity
      • A human in the loop
  • 11. Structures in Data
      • How many buses are plying in Mumbai on a route originating at Dadar and ending at Vashi?
      • How many students are in the class?
  • 12. Structures in Data
      • Intensity Projections
      • Oriented gradients
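The two image structures named on slide 12 can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it assumes a grayscale image given as a list of rows, row and column sums as the intensity projections, and a magnitude-weighted histogram of unsigned gradient orientations computed with central differences. The function names and the bin count are hypothetical.

```python
import math


def intensity_projections(img):
    """Sum pixel intensities along each row and each column."""
    rows = [sum(row) for row in img]
    cols = [sum(col) for col in zip(*img)]
    return rows, cols


def orientation_histogram(img, bins=4):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude.

    Gradients use central differences, so only interior pixels contribute.
    """
    hist = [0.0] * bins
    height, width = len(img), len(img[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    return hist


# A 4x4 toy image with one vertical edge between the dark and bright halves.
image = [[0, 0, 1, 1] for _ in range(4)]
row_proj, col_proj = intensity_projections(image)
grad_hist = orientation_histogram(image)
```

For this toy image the column projection jumps at the edge while the row projection stays flat, and all gradient energy falls into the first orientation bin. Such vectors are one way to turn geometric structure into a computational representation.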
  • 13. Problem Formulation
      • A function maps a structure to a vector
      • The inverse of the function results in one of many structures
      • For computational ease, we make the vector a binary bit-vector
      • The goal is to find the function so as to satisfy the constraints
      • This is a constrained optimization formulation
  • 14. Solution: Optimization formulation
      • Based on the problem formulation
      • We have an optimization formulation that has an inverse that results in the variable itself or a subset of variables
      • A related function is a neural auto-encoder
      • Solution boils down to
      • Training an auto-encoder with one class of data
      • Recognizing data class involves
      • Data clustering
      • Human intelligence/visual inspection to mark clusters
      • Data in clusters used to train the auto-encoder
  • 15. Recognition: Cell Structure
  • 16. Recognition: Question Complexity
      • How much can the SP alter income tax in Scotland?
      • What is stage 1 in the life of a bill?
      • Who is the President of Egypt?
      • Why do some people purposely resist officers of the law?
      • Why is the need for acceptance of punishment needed?
      • Why would one plead guilty to a crime involving civil disobedience?
      • Why is giving a defiant speech sometimes more harmful for the individual?
      • Why did Harvard end its early admission program?
  • 17. Solution: Recognition
      • The auto-encoder output has distortions
      • Detect the distortion
      • Quantify the distortion
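The idea on slides 14 and 17, training an auto-encoder on one class and then measuring how much it distorts a new input, can be sketched on toy data. Everything here is an assumption made for illustration: a linear auto-encoder with tied weights and a one-dimensional code stands in for the neural auto-encoder, the "class" is a set of 2-D points lying near one direction, and the distortion is taken to be the Euclidean reconstruction error.

```python
import math


def encode(w, x):
    """Project a 2-D point onto the weight vector (the 1-D code)."""
    return w[0] * x[0] + w[1] * x[1]


def decode(w, z):
    """Map the 1-D code back to 2-D using the same (tied) weights."""
    return (z * w[0], z * w[1])


def train_autoencoder(samples, lr=0.05, epochs=2000):
    """Minimise mean squared reconstruction error by batch gradient descent."""
    w = [1.0, 0.0]  # arbitrary non-zero starting point
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in samples:
            z = encode(w, x)
            x_hat = decode(w, z)
            r = (x[0] - x_hat[0], x[1] - x_hat[1])  # residual
            rw = r[0] * w[0] + r[1] * w[1]
            # Gradient of |x - (w.x) w|^2 with respect to w.
            grad[0] -= 2.0 * (z * r[0] + rw * x[0])
            grad[1] -= 2.0 * (z * r[1] + rw * x[1])
        w = [w[0] - lr * grad[0] / n, w[1] - lr * grad[1] / n]
    return w


def distortion(w, x):
    """Quantify distortion as the Euclidean reconstruction error."""
    x_hat = decode(w, encode(w, x))
    return math.hypot(x[0] - x_hat[0], x[1] - x_hat[1])


# One "class": points close to the direction U, with a small wobble along V.
U = (1 / math.sqrt(5), 2 / math.sqrt(5))
V = (2 / math.sqrt(5), -1 / math.sqrt(5))
train = []
for i, t in enumerate([-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]):
    eps = 0.01 * (-1) ** i
    train.append((t * U[0] + eps * V[0], t * U[1] + eps * V[1]))

weights = train_autoencoder(train)
in_class = (0.3 * U[0], 0.3 * U[1])    # shares the training structure
out_class = (0.8 * V[0], 0.8 * V[1])   # orthogonal to the training structure
```

A point that shares the training structure is reconstructed almost exactly, while a point with a different structure comes back heavily distorted. Thresholding `distortion` is then one possible way to decide whether an input belongs to the trained class.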
  • 18. Building Complexity
      • Incremental addition of data classes
      • Using stacking
      • Unique binary code injected in each stacked layer
      • Collapse stacked layers into a classification model
      • Redeploy
  • 19. Data Type                              | Test Cases | True Positive | False Positive | True Negative | False Negative
        With classes like in training data     | 1781       | 1774          | NA             | NA            | 7
        With classes not like in training data | 8789       | NA            | 13             | 8776          | NA
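The counts on slide 19 can be turned into rates with a short calculation. This sketch assumes the usual definitions of recall and false-positive rate, and assumes that "positive" means an input was recognised as belonging to a trained class; the slide itself reports only the raw counts.

```python
def recall(tp: int, fn: int) -> float:
    """Share of in-class test cases that were recognised."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of out-of-class test cases that were wrongly accepted."""
    return fp / (fp + tn)


# Counts taken from the slide.
seen_tp, seen_fn = 1774, 7          # classes like those in the training data
unseen_fp, unseen_tn = 13, 8776     # classes not like those in the training data

seen_recall = recall(seen_tp, seen_fn)
unseen_fpr = false_positive_rate(unseen_fp, unseen_tn)
accuracy = (seen_tp + unseen_tn) / (seen_tp + seen_fn + unseen_fp + unseen_tn)

print(f"recall on seen classes:    {seen_recall:.4f}")
print(f"false-positive rate:       {unseen_fpr:.4f}")
print(f"accuracy over both groups: {accuracy:.4f}")
```

Under these assumptions, roughly 99.6% of in-class cases are recognised and about 0.15% of out-of-class cases are wrongly accepted.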
  • 20. Summary
      • A large number of applications are still small-data applications
      • Data has latent structure
      • Extraction is objective based and data specific
      • We can harness data-hungry algorithms for small-data applications
      • Use structures instead of raw data
      • Auto-encoders are powerful tools
      • Build incremental complexity

Editor's Notes

  1. Sequence of system-call executions → a computer program; sequence of words → a sentence; organization of pixel intensities in a 2D space → an image; sequence of images → a video.
  2. Explain the objective: one objective is to detect whether a question can be answered by a trained API-based model. An objective can also be to detect whether a cell is not deformed.
  3. Explain that this is similar to an auto-encoder's F and INV(F), except that INV can return the representation of any element in S'.