This document discusses extending UDF technology for integrated analytics. It proposes relation-valued functions (RVFs) to address limitations of UDFs like lack of set input/output and coding difficulty. RVFs define invocation patterns and can be composed with queries. The solution separates an RVF into a shell and user function, generates RVF shells, and uses simple relation object mapping to prototype the approach on PostgreSQL. This improves UDF modeling capability, efficiency, and ease of coding analytics functions.
1. Extend UDF Technology for Integrated Analytics
Qiming Chen, Meichun Hsu, Rui Liu*
HP Labs, Palo Alto, California, USA
*HP Labs, Beijing, China
2. Motivations
- Running data-intensive analytics outside the database causes significant overhead
- Huge round-trip data transfer overhead between the database platform and the computation platform
- The analytics layer is burdened with many generic data management issues
- The opportunity to balance resource utilization between data management and analytic processing is lost
- UDFs have been extensively investigated for pushing down computation
3. Challenges & Problems (1)
- UDFs lack formal support for relational input and output
- Unable to model complex applications
- Inefficiency of execution
- The tuple-wise pipeline prohibits in-function batch and parallel processing
4. Challenges & Problems (2)
- There is a conflict between UDF execution efficiency and ease of coding
- UDFs are hard to code
- Analytics users have to deal with hard-to-follow system details, whereas MapReduce isolates system details from the developer
- Encoding arguments into strings simplifies argument passing but incurs a performance penalty
5.
6. Solution (2)
- Simple Relation Object Mapping (SROM)
- Separate RVF into RVF shell and ‘user-function’
- Automated RVF shell generation
10. A collection of sample images of ‘typical’ corner kick scenes in soccer games
11. Calculate Image Similarity (8/31/2009)
- For each image, extract SIFT features
- Each point is represented as a 128-dimensional vector
- Generate a composite feature vector
- The closeness of two images is determined by the similarity of their composite feature vectors
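The slide does not say how the composite vector is built or which similarity measure is used. The following is a minimal sketch that assumes the composite vector is the element-wise mean of an image's descriptors and that closeness is cosine similarity; both choices, and the names `composite_vector` and `sim`, are illustrative assumptions rather than the authors' method. Toy 2-dimensional descriptors stand in for the 128-dimensional SIFT vectors.

```python
import math


def composite_vector(descriptors):
    """Combine equal-length descriptor vectors into one vector.

    Assumption: element-wise mean; the slides do not specify the aggregation.
    """
    count = len(descriptors)
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) / count for i in range(dim)]


def sim(u, v):
    """Cosine similarity of two composite feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


# Toy descriptors for two images (hypothetical data).
image_a = composite_vector([[1.0, 3.0], [3.0, 5.0]])
image_b = composite_vector([[2.0, 4.0]])
closeness = sim(image_a, image_b)
```

Here both images reduce to the same composite vector, so `closeness` is at the maximum of the cosine range.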
12. Rank Sample Images
    SELECT Sid, COUNT(Neighbor) AS n
    FROM (SELECT P.ID AS Neighbor,
                 (SELECT S.ID
                  FROM CKSamples S
                  WHERE sim(P.feature, S.feature) =
                        (SELECT MAX(sim(P2.feature, S2.feature))
                         FROM CKSamples S2, CKImages P2
                         WHERE P2.ID = P.ID)) AS Sid
          FROM CKImages P)
    GROUP BY Sid
    ORDER BY n;
- Derive the closest sample image of each corner kick image (by maximal similarity)
- For each sample image s, calculate the number of images having s as the closest sample
- Rank the sample images by that number
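The three steps of this query can be restated outside SQL to make the intended result easier to check. This is a sketch only: the dictionaries stand in for the CKImages and CKSamples relations, and the placeholder `sim` (negative squared distance) is an assumption, since the slides do not define the similarity function.

```python
from collections import Counter


def sim(u, v):
    """Placeholder similarity (assumption): larger means closer."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))


def rank_samples(images, samples):
    """Count, per sample, how many images have it as their closest sample.

    images, samples: dict mapping ID -> feature vector.
    Returns (sample_id, n) pairs ordered by n ascending, like ORDER BY n.
    Samples that are nobody's closest neighbour do not appear, like GROUP BY.
    """
    counts = Counter()
    for feature in images.values():
        closest = max(samples, key=lambda sid: sim(feature, samples[sid]))
        counts[closest] += 1
    return sorted(counts.items(), key=lambda item: item[1])


# Hypothetical toy data: two samples, three images.
ck_samples = {"a": [0.0, 0.0], "b": [10.0, 10.0]}
ck_images = {1: [0.0, 0.0], 2: [0.0, 1.0], 3: [10.0, 10.0]}
ranking = rank_samples(ck_images, ck_samples)
```

Note that the samples are held in memory once and scanned for every image, which is the access pattern the plain UDF query cannot express.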
13. Inefficiency of execution
    SELECT Sid, COUNT(Neighbor) AS n
    FROM (SELECT P.ID AS Neighbor,
                 (SELECT S.ID
                  FROM CKSamples S
                  WHERE sim(P.feature, S.feature) =
                        (SELECT MAX(sim(P2.feature, S2.feature))
                         FROM CKSamples S2, CKImages P2
                         WHERE P2.ID = P.ID)) AS Sid
          FROM CKImages P)
    GROUP BY Sid
    ORDER BY n;
- The CKSamples relation is not cached
- The CKSamples relation is retrieved in a nested query for each (tuple) instance p of CKImages
14. Relation Value Function
- RVF is specified as
    DEFINE RVF f (x, y, R1, R2) RETURN R3 {
        float a, b;
        Relation R1 (/*schema1*/);
        Relation R2 (/*schema2*/);
        Relation R3 (/*schema3*/);
        PROCEDURE fn(/*dll name*/);
        RETURN MODE SET_MODE;
        INVOCATION PATTERN BLOCK
    }
- RVFs can be naturally composed along with other relational operators or sub-queries
    SELECT * FROM rvf1(Q4, rvf2(Q1, Q2, Q3));
16. PerTuple Input Mode
    SELECT ID, Summary
    FROM per_image_summery_rvf ("SELECT feature FROM CKSamples");
17. Block Input Mode
    SELECT r.sid, COUNT(r.neighbor) AS n
    FROM ck_rvf1 ("SELECT * FROM CKImages", "SELECT * FROM CKSamples") r
    GROUP BY r.sid
    ORDER BY n;
18. PerTuple/Block Input Mode
    SELECT Sid, COUNT(Neighbor) AS n
    FROM (SELECT P.ID AS Neighbor,
                 ck_rvf2 (P.ID, P.feature, "SELECT * FROM CKSamples") AS Sid
          FROM CKImages P)
    GROUP BY Sid
    ORDER BY n;
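The three input modes on slides 16 to 18 differ in how often the function is called and how often a relation argument is loaded. The sketch below models that difference with plain Python drivers. All names (`invoke_per_tuple`, `invoke_block`, `invoke_per_tuple_block`, `closest_sample`) are hypothetical, features are scalars for brevity, and it assumes that a relation argument is evaluated once and reused across calls, which the slides imply but do not spell out.

```python
def invoke_per_tuple(user_fn, tuples):
    """PerTuple mode: call the function once for every input tuple."""
    return [user_fn(t) for t in tuples]


def invoke_block(user_fn, *relations):
    """Block mode: materialize every input relation, then call the function once."""
    return list(user_fn(*[list(r) for r in relations]))


def invoke_per_tuple_block(user_fn, tuples, relation):
    """PerTuple/Block mode: load the relation once, then call per tuple."""
    cached = list(relation)  # evaluated once, reused by every call (assumption)
    return [user_fn(t, cached) for t in tuples]


def closest_sample(image, samples):
    """Toy user function: closeness is the smallest absolute feature difference."""
    image_id, feature = image
    best = min(samples, key=lambda s: abs(s[1] - feature))
    return (image_id, best[0])


images = [(1, 0.1), (2, 0.9), (3, 0.2)]
samples = [("a", 0.0), ("b", 1.0)]

# The samples are passed as a one-shot iterator: the result is only correct
# because the driver caches them before the first per-tuple call.
per_tuple_block = invoke_per_tuple_block(closest_sample, images, iter(samples))

# Block mode receives both relations whole and loops inside the function.
block = invoke_block(
    lambda imgs, smps: [closest_sample(i, smps) for i in imgs], images, samples
)
```

Both calls yield the same pairs; the difference is where the loop over images runs, in the driver or inside the user function.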
19. Separating RVF Shell and User-Function
- Separate an RVF into an RVF shell and a user-function
- Provide high-level RVF Shell APIs for building the shell
- Shield RVF developers from DBMS internal details
- Generate RVF shells based on RVF specifications and on input and output modes
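The separation described on this slide can be illustrated with a wrapper that handles all data access and conversion, leaving the user-function to work on ordinary in-memory rows. This is a sketch under stated assumptions, not the prototype's Shell API: `make_rvf_shell`, `run_query`, and the in-memory `fake_db` are hypothetical stand-ins for the generated shell and the DBMS.

```python
def make_rvf_shell(user_fn, run_query):
    """Build a shell around user_fn (hypothetical helper).

    run_query: callable that takes a query string and returns rows.
    The shell evaluates each query argument, hands plain lists to the
    user-function, and converts the result back into tuples.
    """
    def shell(*queries):
        relations = [list(run_query(q)) for q in queries]  # input conversion
        result = user_fn(*relations)                       # pure user logic
        return [tuple(row) for row in result]              # output conversion
    return shell


def row_sums(relation):
    """User-function: knows nothing about queries or the DBMS."""
    return [[a + b] for a, b in relation]


# In-memory stand-in for the database (assumption for illustration).
fake_db = {"SELECT * FROM T": [(1, 2), (3, 4)]}

row_sums_rvf = make_rvf_shell(row_sums, lambda q: fake_db[q])
output = row_sums_rvf("SELECT * FROM T")
```

Because the shell depends only on the function's input and output shapes, it is the kind of code that could be generated from a specification instead of written by hand.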
24. Summary
- Tackled two major limitations of UDF technology
- Lack of set input or output, which causes insufficient application modeling capability and inefficiency of execution
- Difficulty in coding and integrating UDFs with the query engine
- Relation Value Function: extend UDF for pushing down data-intensive computation
- RVF invocation pattern
- Separate RVF into RVF shell and user function
- RVF shell generation and Simple Relation Object Mapping (SROM)
- Prototype has been implemented on PostgreSQL