This document discusses scaling up video analytics inside a database engine. It proposes using relation-valued functions (RVFs) as a way to model complex video analysis applications, increase execution efficiency by caching models, and enable data-parallel computation. RVFs allow video analytics logic to be expressed as relational operators that can be naturally composed with SQL queries. An RVF container handles invocation patterns and system interactions to improve performance. The approach was prototyped using PostgreSQL.
Scaling Up And Speeding Up Video Analytics Inside Database Engine
1. Scaling-up and Speeding-up Video Analytics Inside Database Engine. Qiming Chen1, Meichun Hsu1, Rui Liu2, and Weihong Wang2. 1 HP Labs, Palo Alto, California, USA; 2 HP Labs, Beijing, China. Hewlett Packard Co.
2. Motivation. Video has become an indispensable carrier of information for business perception, decision, and action. Existing video analysis applications generally fail to scale: the database is treated as a storage engine rather than a computation engine, and the transfer of massive amounts of data is the bottleneck. A unified platform is required by (a) the demand for near-real-time responses to enable operational BI and (b) data-intensive transformation and analysis.
3. Our Approach. Push video processing down to the database engine: faster data access, less data transfer. Use user-defined functions (UDFs) as wrappers for video analysis and search operations.
4. Problems with UDF (1). UDFs lack formal support for relational input and output: they are unaware of the relation schema, unable to model complex applications, and unable to be composed with relational operators in a SQL query. They are typically executed tuple-wise in the query-processing pipeline, which incurs a performance penalty for certain applications and prohibits data-parallel computation inside the function body.
5. Problems with UDF (2). There is a dilemma between UDF execution efficiency and ease of coding: an efficient UDF must use system-internal data objects and system calls, while encoding DBMS data into strings to pass to UDFs incurs significant overhead.
6. Our Solutions. Support relation-valued functions (RVFs) at the SQL level, e.g. an SVM classifier as an RVF. With relations as input and output, RVFs give easier application modeling and higher execution efficiency, and make it possible to explore parallelism. RVF invocation patterns: mechanisms for applying an RVF's input and output, for which high-level APIs are provided. Invocation-pattern-oriented RVF containers support RVFs running in query processing.
9. Video Classification by SVM. Tables: Features [featureID, imageID, featureType, feature]; Models [modelID, featureType, concept, model]; Labels [imageID, concept, nearness].
10. SVM by Scalar UDF – the Inefficiency. Classify using a conventional scalar UDF: SELECT imageID, concept, AVG(nearness) FROM (SELECT imageID, featureID, concept, classify0(f.featureType, m.concept, f.feature, m.model) AS nearness FROM Features f, Models m WHERE f.featureType = m.featureType) GROUP BY imageID, concept; For each feature of each image, its nearness score to each concept is computed, and the resulting nearness measures are aggregated by an average function. Execution is inefficient because the model cannot be cached: it is retrieved again for each feature.
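The cost of re-reading the model on every call can be illustrated outside any DBMS. The following Python sketch is an assumption-laden stand-in, not the slides' implementation: `load_model`, `classify_scalar`, and `classify_block` are hypothetical names, a model is a comma-separated weight string, and a dot product stands in for SVM scoring. It only counts how often a model is decoded under each calling style.

```python
# Hypothetical stand-ins: a "model" is a serialized weight vector and
# "nearness" is a plain dot product, not a real SVM decision function.
load_count = 0

def load_model(serialized):
    """Decode a stored model and count how many times decoding happens."""
    global load_count
    load_count += 1
    return [float(w) for w in serialized.split(",")]

def score(weights, feature):
    return sum(w * x for w, x in zip(weights, feature))

def classify_scalar(feature, serialized_model):
    """Scalar-UDF style: every call decodes the model again."""
    return score(load_model(serialized_model), feature)

def classify_block(features, models):
    """Relation-in, relation-out style: each model is decoded once."""
    decoded = [(m["concept"], m["featureType"], load_model(m["model"]))
               for m in models]
    out = []
    for f in features:
        for concept, ftype, weights in decoded:
            if ftype == f["featureType"]:
                out.append({"imageID": f["imageID"],
                            "featureID": f["featureID"],
                            "concept": concept,
                            "nearness": score(weights, f["feature"])})
    return out

features = [
    {"featureID": i, "imageID": i // 2, "featureType": "color",
     "feature": [1.0, float(i)]}
    for i in range(4)
]
models = [
    {"concept": "sky", "featureType": "color", "model": "0.5,0.25"},
    {"concept": "sea", "featureType": "color", "model": "1.0,-1.0"},
]

# Tuple-wise evaluation over the join: one decode per (feature, model) pair.
load_count = 0
scalar_rows = [classify_scalar(f["feature"], m["model"])
               for f in features for m in models
               if f["featureType"] == m["featureType"]]
scalar_loads = load_count

# Block evaluation: one decode per model, regardless of the feature count.
load_count = 0
block_rows = classify_block(features, models)
block_loads = load_count
```

With 4 features and 2 models, the tuple-wise path decodes a model once per joined pair, while the block path decodes each model once; both produce the same nearness values.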
11. RVFs as Relational Operators. A simple RVF definition: DEFINE RVF f (R1, R2, k) RETURN R3 { Relation R1 (/*schema*/); Relation R2 (/*schema*/); int k; Relation R3 (/*schema*/); PROCEDURE fn(/*dll name*/); RETURN MODE SET_MODE; INVOCATION PATTERN BLOCK } An RVF can be naturally composed with relational operators or sub-queries: SELECT * FROM RVF1(RVF2(Q1, Q2), Q3);
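The composition property can be mimicked in a few lines of Python, treating a relation as a list of rows. This is only an analogy under stated assumptions: `rvf1` and `rvf2` here are hypothetical functions (a join and a semi-join on an invented `key` column), not the RVFs of the slides, and no DBMS is involved.

```python
from typing import Dict, List

# A relation is modeled as a list of rows; each row is a dict.
Relation = List[Dict[str, object]]

def rvf2(r1: Relation, r2: Relation) -> Relation:
    """Hypothetical RVF: join r1 and r2 on the 'key' column."""
    index: Dict[object, List[Dict[str, object]]] = {}
    for row in r2:
        index.setdefault(row["key"], []).append(row)
    return [{**a, **b} for a in r1 for b in index.get(a["key"], [])]

def rvf1(r: Relation, r3: Relation) -> Relation:
    """Hypothetical RVF: keep rows of r whose key also appears in r3."""
    allowed = {row["key"] for row in r3}
    return [row for row in r if row["key"] in allowed]

# Q1, Q2, Q3 stand for the results of three sub-queries.
q1 = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}]
q2 = [{"key": 1, "b": 10}, {"key": 2, "b": 20}]
q3 = [{"key": 2}]

# Because input and output are both relations, the functions nest
# the same way RVF1(RVF2(Q1, Q2), Q3) does in the query above.
result = rvf1(rvf2(q1, q2), q3)
```

Since every function consumes and produces the same kind of value, any output can feed any input, which is what lets an RVF sit wherever a table or sub-query could.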
12. SVM by Relation-Valued Function. SELECT imageID, concept, AVG(nearness) FROM (SELECT imageID, featureID, concept, nearness FROM classify1("SELECT * FROM Features", "SELECT concept, model, featureType FROM Models")) GROUP BY imageID, concept;
13. RVF Invocation Patterns. An invocation pattern is a mechanism for dealing with the input and output of an RVF, and a generalization of the limited forms. Purposes: ensuring that the RVF's interaction with the query executor is defined at a high level; making it possible to provide high-level APIs; shielding UDF developers from DBMS-internal details.
15. CartProdProbe Pattern. SELECT r.imageID, r.concept, AVG(r.nearness) FROM (Features f CROSS APPLY classify2(f.featureID, f.featureType, f.feature, "SELECT concept, model, featureType FROM Models")) r GROUP BY r.imageID, r.concept; The Features table is fed into the RVF tuple by tuple; the Models table is fed in as a whole.
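A rough Python analogy of this pattern, assuming nothing about the actual prototype: the class `Classify2`, the callable `fetch_models`, and the helper `cross_apply_avg` are hypothetical, models are plain weight lists, and a dot product replaces SVM scoring. The function object receives one Features tuple per call but loads the whole Models relation only on its first call and reuses it afterwards.

```python
from collections import defaultdict

class Classify2:
    """Hypothetical per-tuple function that caches the Models relation."""

    def __init__(self, fetch_models):
        self._fetch_models = fetch_models  # callable returning Models rows
        self._models = None                # filled on the first call only
        self.fetches = 0

    def __call__(self, feature_id, feature_type, feature):
        if self._models is None:
            self._models = list(self._fetch_models())
            self.fetches += 1
        rows = []
        for m in self._models:
            if m["featureType"] == feature_type:
                nearness = sum(w * x for w, x in zip(m["model"], feature))
                rows.append({"concept": m["concept"], "nearness": nearness})
        return rows

def cross_apply_avg(features, fn):
    """Feed features tuple by tuple, then average per (imageID, concept)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f in features:
        for out in fn(f["featureID"], f["featureType"], f["feature"]):
            key = (f["imageID"], out["concept"])
            sums[key] += out["nearness"]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

features = [
    {"featureID": 1, "imageID": 10, "featureType": "color", "feature": [1.0, 0.0]},
    {"featureID": 2, "imageID": 10, "featureType": "color", "feature": [0.0, 1.0]},
    {"featureID": 3, "imageID": 11, "featureType": "color", "feature": [1.0, 1.0]},
]
models = [
    {"concept": "sky", "featureType": "color", "model": [1.0, 0.0]},
    {"concept": "sea", "featureType": "color", "model": [0.0, 2.0]},
]

classify2 = Classify2(lambda: models)
averages = cross_apply_avg(features, classify2)
```

Three tuples pass through `classify2`, yet `classify2.fetches` stays at 1, which is the caching benefit the pattern is meant to deliver.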
16. RVF Container. An extension of the query executor for supporting RVF execution; it is invocation-pattern-specific. Responsibilities: argument evaluation, return value wrapping, memory context switching, data conversion, initial data preparation, cross-call data passing, and final cleanup.
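Several of these responsibilities can be sketched as a small lifecycle wrapper. The Python below is a hedged illustration, not the prototype's container: `RVFContainer` and its `init_fn`, `call_fn`, `final_fn`, `decode`, and `encode` parameters are hypothetical, and memory context switching has no counterpart here. It shows initial data preparation, data conversion on the way in and out, a context dict for cross-call data passing, and cleanup that runs even if a call fails.

```python
class RVFContainer:
    """Hypothetical wrapper that keeps system chores out of the user function."""

    def __init__(self, init_fn, call_fn, final_fn, decode, encode):
        self.init_fn = init_fn
        self.call_fn = call_fn
        self.final_fn = final_fn
        self.decode = decode    # engine representation -> native value
        self.encode = encode    # native value -> engine representation

    def run(self, static_args, input_rows):
        ctx = {}                               # cross-call data passing
        self.init_fn(ctx, *static_args)        # initial data preparation
        results = []
        try:
            for raw in input_rows:
                value = self.decode(raw)       # argument evaluation and conversion
                for out in self.call_fn(ctx, value):
                    results.append(self.encode(out))  # return value wrapping
        finally:
            self.final_fn(ctx)                 # final cleanup, always executed
        return results

events = []

def init_fn(ctx, factor):
    events.append("init")
    ctx["factor"] = factor
    ctx["calls"] = 0

def call_fn(ctx, value):
    # The user logic sees only native values and the shared context.
    events.append("call")
    ctx["calls"] += 1
    return [value * ctx["factor"]]

def final_fn(ctx):
    events.append(f"final after {ctx['calls']} calls")
    ctx.clear()

container = RVFContainer(init_fn, call_fn, final_fn, decode=int, encode=str)
output = container.run((2,), ["1", "2", "3"])
```

The user-supplied `call_fn` never touches the string encoding or the cleanup logic, which is the separation of analytics logic from system concerns that the container is meant to provide.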
17. Performance Gain in SVM Classification by Using RVF. The SVM query using an RVF outperforms the one using a conventional scalar UDF.
18. Support In-RVF Data-Parallel SVM Learning. INSERT INTO Models SELECT modelID + 1, 'feature_type', 'concept_name', svm_learning("SELECT feature, nearness FROM TrainFeatures f, TrainLables l WHERE l.imageID = f.imageID AND l.concept = 'concept_name' AND f.featureType = 'feature_type'") FROM Models WHERE modelID = (SELECT MAX(modelID) FROM Models); SVM learning is sped up in a multi-core RVF.
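Because the function receives the whole training relation at once, it is free to partition the rows and process the partitions concurrently. The Python sketch below shows only that partition-and-combine structure, under loud assumptions: `learn_parallel` and `partial_sum` are hypothetical, the "learning" step is a label-weighted mean of feature vectors rather than SVM training, the input is assumed non-empty, and Python threads illustrate the shape of the computation without promising a multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(rows):
    """Per-partition work: label-weighted sum of feature vectors."""
    dim = len(rows[0]["feature"])
    total = [0.0] * dim
    for r in rows:
        for i, x in enumerate(r["feature"]):
            total[i] += r["nearness"] * x
    return total, len(rows)

def learn_parallel(rows, workers=2):
    """Split the input relation, process partitions concurrently, combine.

    Stand-in for a learning step; NOT an SVM trainer. Assumes rows is non-empty.
    """
    chunks = [rows[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]          # drop empty partitions
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_sum, chunks))
    dim = len(parts[0][0])
    n = sum(count for _, count in parts)
    return [sum(p[0][i] for p in parts) / n for i in range(dim)]

train = [
    {"feature": [1.0, 0.0], "nearness": 1.0},
    {"feature": [0.0, 1.0], "nearness": -1.0},
    {"feature": [2.0, 0.0], "nearness": 1.0},
    {"feature": [0.0, 3.0], "nearness": -1.0},
]

model_parallel = learn_parallel(train, workers=2)
model_serial = learn_parallel(train, workers=1)
```

A tuple-at-a-time scalar UDF could not be structured this way, since it never sees more than one row per call.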
19. Summary. A video analysis system inside a database engine that leverages UDFs to push down video analytics. RVFs, a language-level extension, improve the capability of application modeling, increase execution efficiency and cache use, and make it possible to explore computation parallelism. The RVF container and its associated APIs separate analytics logic from system administration and programming efforts. Prototyped on PostgreSQL.