Scaling-up and Speeding-up Video Analytics Inside Database Engine
Qiming Chen (1), Meichun Hsu (1), Rui Liu (2), and Weihong Wang (2)
(1) HP Labs, Palo Alto, California, USA
(2) HP Labs, Beijing, China
Hewlett Packard Co.

Motivation
- Video has become an indispensable carrier of information
  - For business perception, decision and action
- Existing video analysis applications generally fail to scale
  - The database is treated as a storage engine rather than a computation engine
  - Transfer of massive amounts of data is the bottleneck
- A unified platform is required by
  - The demand for near-real-time responses to enable operational BI
  - Data-intensive transformation and analysis

Our Approach
- Push down video processing to the database engine
  - Faster data access, less data transfer
- User-Defined Functions (UDFs) as wrappers of video analysis and search operations

Problems with UDF (1)
- Lack of formal support for relational input and output
  - Unaware of the relation schema
  - Unable to model complex applications
  - Unable to be composed with relational operators in a SQL query
- Typically executed in the tuple-wise pipeline in query processing
  - Performance penalty for certain applications
  - Prohibits data-parallel computation inside the function body

Problems with UDF (2)
- Dilemma between UDF execution efficiency and ease of coding
  - A UDF must use system-internal data objects and system calls
  - Encoding DBMS data into strings to pass to UDFs incurs significant overhead

Our Solutions
- Support Relation-Valued Functions (RVFs) at the SQL level
  - E.g., an SVM classifier as an RVF
  - Relations as input and output
  - Easier application modeling
  - Higher execution efficiency
  - Makes it possible to explore parallelism
- RVF invocation patterns
  - Mechanisms for applying an RVF's input/output
  - High-level APIs are provided
- Invocation-pattern-oriented RVF containers
  - Support RVFs running in query processing
(9/1/2009)
Video Pattern Recognition Process
Video Retrieval Process

Video Classification by SVM
Tables:
- Features [featureID, imageID, featureType, feature]
- Models [modelID, featureType, concept, model]
- Labels [imageID, concept, nearness]

SVM by Scalar UDF: the Inefficiency
- Classify using a conventional scalar UDF

    SELECT imageID, concept, AVG(nearness)
    FROM (SELECT imageID, featureID, concept,
                 classify0(f.featureType, m.concept, f.feature, m.model) AS nearness
          FROM Features f, Models m
          WHERE f.featureType = m.featureType)
    GROUP BY imageID, concept;

- For each feature of each image, its nearness score to each concept is computed
- The resulting nearness measures are aggregated by an average function
- Inefficiency of execution
  - The model cannot be cached
  - The model is retrieved for each feature
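
The per-tuple cost described on this slide can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the authors' engine, a toy linear model stored as JSON in place of a real SVM model, and a hypothetical counter `decode_calls` that stands in for the cost of retrieving a model.

```python
import json
import sqlite3

# Hypothetical counter: each increment stands in for one model retrieval.
decode_calls = 0

def classify0(feature_type, concept, feature, model):
    """Toy scalar UDF: score one feature vector against one linear model.

    Assumption: the model is a JSON pair [weights, bias]; the slides do not
    specify the real SVM model format.
    """
    global decode_calls
    decode_calls += 1                       # the model is decoded on every call
    weights, bias = json.loads(model)
    x = json.loads(feature)
    return sum(w * v for w, v in zip(weights, x)) + bias

conn = sqlite3.connect(":memory:")
conn.create_function("classify0", 4, classify0)
conn.executescript("""
    CREATE TABLE Features (featureID INTEGER, imageID INTEGER,
                           featureType TEXT, feature TEXT);
    CREATE TABLE Models (modelID INTEGER, featureType TEXT,
                         concept TEXT, model TEXT);
""")
conn.executemany("INSERT INTO Features VALUES (?, ?, ?, ?)", [
    (1, 10, "color", "[1.0, 0.0]"),
    (2, 10, "color", "[0.0, 1.0]"),
    (3, 11, "color", "[1.0, 1.0]"),
])
conn.executemany("INSERT INTO Models VALUES (?, ?, ?, ?)", [
    (1, "color", "sky", "[[1.0, 0.0], 0.0]"),
    (2, "color", "grass", "[[0.0, 1.0], 0.0]"),
])

rows = conn.execute("""
    SELECT imageID, concept, AVG(nearness)
    FROM (SELECT f.imageID AS imageID, m.concept AS concept,
                 classify0(f.featureType, m.concept, f.feature, m.model) AS nearness
          FROM Features f, Models m
          WHERE f.featureType = m.featureType)
    GROUP BY imageID, concept
    ORDER BY imageID, concept
""").fetchall()

print(rows)
print("model decodes:", decode_calls)
```

With three features and two matching models, the UDF is entered once per joined row, so each model is decoded as many times as there are features of its type.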

RVFs as Relational Operators
- A simple RVF definition

    DEFINE RVF f (R1, R2, k) RETURN R3 {
        Relation R1 (/*schema*/);
        Relation R2 (/*schema*/);
        int k;
        Relation R3 (/*schema*/);
        PROCEDURE fn(/*dll name*/);
        RETURN MODE SET_MODE;
        INVOCATION PATTERN BLOCK
    }

- An RVF can be naturally composed with relational operators or sub-queries

    SELECT * FROM RVF1(RVF2(Q1, Q2), Q3);

SVM by Relation-Valued Function

    SELECT imageID, concept, AVG(nearness)
    FROM (SELECT imageID, featureID, concept, nearness
          FROM classify1(
              "SELECT * FROM Features",
              "SELECT concept, model, featureType FROM Models"))
    GROUP BY imageID, concept;
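
To make the contrast with the scalar UDF concrete, here is a minimal sketch of a function that takes whole relations as input and returns a relation. It is not the authors' implementation: `classify1` below is a plain Python function over lists of tuples, the JSON [weights, bias] model format is an assumption, and `average_nearness` is a hypothetical helper standing in for the outer GROUP BY.

```python
import json
from collections import defaultdict

def classify1(features, models):
    """Sketch of a relation-in, relation-out classifier.

    features: rows of (featureID, imageID, featureType, feature_json)
    models:   rows of (concept, model_json, featureType)
    Returns rows of (imageID, featureID, concept, nearness).
    """
    # Decode every model exactly once and index it by feature type.
    by_type = defaultdict(list)
    for concept, model_json, feature_type in models:
        weights, bias = json.loads(model_json)
        by_type[feature_type].append((concept, weights, bias))

    out = []
    for feature_id, image_id, feature_type, feature_json in features:
        x = json.loads(feature_json)
        for concept, weights, bias in by_type.get(feature_type, []):
            nearness = sum(w * v for w, v in zip(weights, x)) + bias
            out.append((image_id, feature_id, concept, nearness))
    return out

def average_nearness(rows):
    """Hypothetical stand-in for AVG(nearness) ... GROUP BY imageID, concept."""
    sums = defaultdict(lambda: [0.0, 0])
    for image_id, _feature_id, concept, nearness in rows:
        acc = sums[(image_id, concept)]
        acc[0] += nearness
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

features = [
    (1, 10, "color", "[1.0, 0.0]"),
    (2, 10, "color", "[0.0, 1.0]"),
    (3, 11, "color", "[1.0, 1.0]"),
]
models = [
    ("sky", "[[1.0, 0.0], 0.0]", "color"),
    ("grass", "[[0.0, 1.0], 0.0]", "color"),
]
scores = average_nearness(classify1(features, models))
print(scores)
```

Because both relations are visible inside one call, each model is decoded once regardless of how many features are scored against it.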

RVF Invocation Patterns
- Invocation pattern
  - Mechanism to deal with the input/output of an RVF
  - Generalization of the limited forms
- Purposes
  - Ensuring that an RVF's interaction with the query executor is defined at a high level
  - Making it possible to provide high-level APIs
  - Shielding UDF developers from DBMS-internal details

Patterns Defined
- Basic pattern
  - Per-tuple pattern
  - Block pattern
- Complex pattern
  - CartProdProbe (Cartesian product probe)

CartProdProbe Pattern

    SELECT r.imageID, r.concept, AVG(r.nearness)
    FROM (Features f CROSS APPLY classify2(
              f.featureID, f.featureType, f.feature,
              "SELECT concept, model, featureType FROM Models")) r
    GROUP BY r.imageID, r.concept;

- The Features table is fed into the RVF tuple by tuple; the Models table is fed in as a whole
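
A rough sketch of this pattern, assuming nothing about the authors' executor: the hypothetical class `CartProdProbe` loads the Models relation on its first call and reuses it for every later feature tuple, and the hypothetical generator `cross_apply` plays the role of CROSS APPLY by pairing each Features row with the rows the probe returns for it. The JSON model format is again an assumption.

```python
import json

class CartProdProbe:
    """Per-tuple probe against a relation that is loaded only once."""

    def __init__(self, load_models):
        self._load_models = load_models   # callable standing in for the Models query
        self._models = None               # filled on first call, reused afterwards
        self.loads = 0                    # how many times Models was fetched

    def __call__(self, feature_id, feature_type, feature_json):
        if self._models is None:
            self._models = [(concept, json.loads(model), ftype)
                            for concept, model, ftype in self._load_models()]
            self.loads += 1
        x = json.loads(feature_json)
        rows = []
        for concept, (weights, bias), ftype in self._models:
            if ftype == feature_type:
                nearness = sum(w * v for w, v in zip(weights, x)) + bias
                rows.append((feature_id, concept, nearness))
        return rows

def cross_apply(features, probe):
    """Feed Features tuple by tuple; join each tuple with the probe's output."""
    for feature_id, image_id, feature_type, feature_json in features:
        for _fid, concept, nearness in probe(feature_id, feature_type, feature_json):
            yield (image_id, feature_id, concept, nearness)

features = [
    (1, 10, "color", "[1.0, 0.0]"),
    (2, 10, "color", "[0.0, 1.0]"),
    (3, 11, "color", "[1.0, 1.0]"),
]

def load_models():
    return [("sky", "[[1.0, 0.0], 0.0]", "color"),
            ("grass", "[[0.0, 1.0], 0.0]", "color")]

probe = CartProdProbe(load_models)
result = list(cross_apply(features, probe))
print(result)
print("Models loaded", probe.loads, "time(s)")
```

The function is still invoked once per feature, as with a scalar UDF, but the whole-relation argument is evaluated only on the first invocation.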

RVF Container
- An extension of the query executor for supporting RVF execution
- Invocation-pattern-specific
  - Argument evaluation
  - Return value wrapping
  - Memory context switching
  - Data conversion
  - Initial data preparation
  - Cross-call data passing
  - Final cleanup
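
Three of the responsibilities listed here (initial data preparation, cross-call data passing, final cleanup) can be pictured with a small driver. This is a sketch only: `RVFContainer`, `init`, `step` and `cleanup` are hypothetical names, and memory context switching and DBMS data conversion are engine-specific, so they are not modeled.

```python
class RVFContainer:
    """Hypothetical container that drives a per-tuple function through its phases."""

    def __init__(self, init, step, cleanup):
        self.init = init
        self.step = step
        self.cleanup = cleanup

    def run(self, tuples):
        ctx = self.init()                  # initial data preparation
        try:
            for t in tuples:
                # ctx carries state from one call to the next
                # (cross-call data passing)
                for row in self.step(ctx, t):
                    yield tuple(row)       # return value wrapping
        finally:
            self.cleanup(ctx)              # final cleanup

events = []

def init():
    events.append("init")
    return {"total": 0}

def step(ctx, t):
    events.append("step")
    ctx["total"] += t[0]
    return [[t[0], ctx["total"]]]          # one output row: value, running total

def cleanup(ctx):
    events.append("cleanup")
    ctx.clear()

container = RVFContainer(init, step, cleanup)
result = list(container.run([(1,), (2,), (3,)]))
print(result)
print(events)
```

The function author writes only `init`, `step` and `cleanup`; the ordering of those phases lives in the container.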

Performance Gain in SVM Classification by Using RVF
- The SVM query using an RVF outperforms the one using a conventional scalar UDF

Support In-RVF Data-Parallel SVM Learning

    INSERT INTO Models
    SELECT modelID + 1, 'feature_type', 'concept_name',
           svm_learning(
               "SELECT feature, nearness
                FROM TrainFeatures f, TrainLables l
                WHERE l.imageID = f.imageID
                  AND l.concept = 'concept_name'
                  AND f.featureType = 'feature_type'")
    FROM Models
    WHERE modelID = (SELECT max(modelID) FROM Models);

- SVM learning speed-up in a multi-core RVF
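
The slides do not say how the learning step is parallelized, so the following is only one possible shape of data-parallel work inside a function body: partition the training rows, compute a partial result per partition, then combine. It assumes a linear SVM trained by subgradient descent on the hinge loss, labels of +1/-1 derived from the nearness column, and hypothetical parameters `workers`, `epochs`, `lr` and `reg`. Python threads are used to show the structure; because of CPython's global interpreter lock they do not give a real multi-core speed-up for this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_subgradient(chunk, w, b):
    """Hinge-loss subgradient contribution of one partition of (x, y) rows."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in chunk:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1.0:                    # only margin violators contribute
            for i, xi in enumerate(x):
                gw[i] -= y * xi
            gb -= y
    return gw, gb

def svm_learning(rows, workers=2, epochs=50, lr=0.1, reg=0.01):
    """Toy linear SVM: partition rows, sum partial subgradients, update."""
    dim = len(rows[0][0])
    w, b = [0.0] * dim, 0.0
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(epochs):
            # Each partition is processed independently against the current (w, b).
            parts = list(pool.map(lambda c: partial_subgradient(c, w, b), chunks))
            gw = [sum(p[0][i] for p in parts) / len(rows) + reg * w[i]
                  for i in range(dim)]
            gb = sum(p[1] for p in parts) / len(rows)
            w = [wi - lr * gi for wi, gi in zip(w, gw)]
            b -= lr * gb
    return w, b

# Tiny linearly separable training set: (feature vector, label).
rows = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-2.0, -1.0), -1), ((-1.0, -3.0), -1)]
w, b = svm_learning(rows, workers=2)
w_serial, b_serial = svm_learning(rows, workers=1)
print(w, b)
```

Splitting the rows changes only how the subgradient sum is computed, not its value, so the partitioned and serial runs should agree up to floating-point rounding.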

Summary
- Video analysis system inside a database engine
  - Leverage UDFs to push down video analytics
- RVFs, a language-level extension
  - Improve the capability of application modeling
  - Increase execution efficiency and cache use
  - Make it possible to explore computation parallelism
- RVF container and its associated APIs
  - Separate analytics logic from system administration and programming efforts
- Prototyped on PostgreSQL
