This document proposes techniques for automatically ranking the results of database queries. It introduces IDF Similarity, which adapts the TF-IDF concept from information retrieval to database attributes by computing IDF scores from attribute-value frequencies, and QF Similarity, which instead derives the importance of an attribute value from its frequency in a query workload log. An index-based threshold algorithm retrieves the top-K results efficiently by exploiting these similarity functions, performing sorted and random accesses to tuples and iteratively refining the top results until a stopping condition is met.
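As a rough sketch of the IDF Similarity idea (an illustration, not the paper's exact formulation): an attribute value that occurs in few tuples gets a high weight log(n / freq), and a tuple is scored by summing the weights of the attribute values it shares with the query. All names below are hypothetical.

```python
import math
from collections import Counter

def idf_scores(column):
    """IDF for each distinct value in a column: log(n / frequency)."""
    n = len(column)
    freq = Counter(column)
    return {v: math.log(n / f) for v, f in freq.items()}

def idf_similarity(query, row, idf_by_attr):
    """Sum the IDF weights of attributes where the tuple matches the query."""
    return sum(idf_by_attr[a][v] for a, v in query.items()
               if row.get(a) == v)

rows = [
    {"make": "Toyota", "color": "white"},
    {"make": "Toyota", "color": "red"},
    {"make": "Ferrari", "color": "red"},
    {"make": "Toyota", "color": "white"},
]
idf = {a: idf_scores([r[a] for r in rows]) for a in ("make", "color")}
query = {"make": "Ferrari", "color": "red"}
ranked = sorted(rows, key=lambda r: idf_similarity(query, r, idf), reverse=True)
```

The rare value "Ferrari" carries more weight than the common "Toyota", so the matching tuple ranks first even among tuples that also match on color.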
- Part I discusses lexicalized context-free grammars including their motivations, definition, and relation to other formalisms.
- Part II discusses standard parsing techniques for lexicalized context-free grammars, including bottom-up and top-down techniques.
- Part III discusses novel parsing algorithms that improve upon standard techniques, including a bottom-up algorithm that runs in O(|V_D|^3 × |w|^4) time for split grammars, and a top-down algorithm that runs in O(|V_D|^3 × |w|^4) time for lexicalized context-free grammars.
This document proposes a modular beamforming architecture for ultrasound imaging that uses FPGA DSP cells to overcome limitations of previous designs. It interleaves the interpolation and coherent summation processes, reducing hardware resources. This allows implementing a 128-channel beamformer in a single FPGA, achieving flexibility like FPGAs but with lower power consumption like ASICs. The design is scalable, allowing a tradeoff between number of channels, time resolution, and resource usage.
Efficient Edge-Skeleton Computation for Polytopes Defined by Oracles, by Vissarion Fisikopoulos
This document summarizes algorithms for computing the edge skeleton of a polytope defined by oracle functions. It first describes an existing algorithm for vertex enumeration in the oracle model that works by computing an initial simplex and recursively querying the oracle. It then presents a new algorithm for computing the edge skeleton that takes as input the oracle functions and a superset of edge directions, and works by generating candidate edge segments and validating them with the oracle. The runtime of this edge skeleton algorithm is polynomial in parameters of the polytope representation.
The document discusses Unidata's Common Data Model (CDM) which aims to provide a standardized way of representing scientific datasets. It describes key components of the CDM including scientific data types, coordinate systems, and data access layers. The goals are to make datasets more useful and interoperable by defining common semantics, georeferencing, and specialized querying capabilities. The CDM defines abstract representations that are agnostic of specific file formats or programming interfaces.
The document discusses computational complexity problems that are solvable in polynomial time but for which no significantly faster algorithms are known. It presents several such problems from areas like graph algorithms, computational biology, and computational geometry. It then discusses recent work that aims to establish conditional lower bounds for the runtime of such problems by relating their hardness to standard conjectures like 3SUM, APSP, SETH, orthogonal vectors, and small universe hitting set. Fine-grained reductions are used to show relationships between problems. Overall, the document outlines an approach for proving conditional lower bounds for problems solvable in polynomial time based on reasonable complexity theoretic conjectures.
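For instance, the Orthogonal Vectors problem mentioned above asks whether two sets of 0/1 vectors contain a pair with inner product zero; the obvious algorithm is quadratic, and the associated conjecture asserts that no O(n^(2-ε)) algorithm exists. A minimal brute-force baseline (illustrative only):

```python
from itertools import product

def has_orthogonal_pair(A, B):
    """Naive O(n^2 * d) check: is there a in A, b in B with <a, b> = 0?"""
    return any(all(x * y == 0 for x, y in zip(a, b))
               for a, b in product(A, B))

A = [(1, 0, 1), (0, 1, 1)]
B = [(0, 1, 0), (1, 1, 1)]
# (1, 0, 1) and (0, 1, 0) share no coordinate that is 1 in both,
# so this instance contains an orthogonal pair.
```

Fine-grained reductions relate the hardness of many polynomial-time problems to beating this quadratic baseline.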
This work is concerned with designing an exact exponential-time algorithm that is better than the well-known 2^n algorithm for the Path Contraction problem. This answers an open question of van 't Hof et al. [TCS 2009]. It is based on the article that appeared in ICALP 2019.
Weibull Analysis: Tableau + R Integration, by Monica Willbrand (Data Con LA)
This document discusses using Tableau and R integration to perform Weibull analysis. It provides an overview of Weibull reliability analysis and the bathtub curve. It then demonstrates how to set up R scripts in Tableau to calculate Weibull parameters like beta, eta, survival and failure probabilities, and confidence bands. Plots of survival data can then be created in Tableau using these R-calculated values. Links are also provided to download the necessary R packages and Tableau.
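As a reference for the quantities mentioned (an illustration in Python, not the talk's R scripts): the two-parameter Weibull model with shape beta and scale eta gives the survival (reliability) and cumulative failure probabilities directly:

```python
import math

def weibull_survival(t, beta, eta):
    """Two-parameter Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))

def weibull_failure(t, beta, eta):
    """Cumulative failure probability F(t) = 1 - R(t)."""
    return 1.0 - weibull_survival(t, beta, eta)
```

Note that at t = eta the failure probability is always 1 - exp(-1) ≈ 63.2%, regardless of beta; beta < 1, beta = 1, and beta > 1 correspond to the decreasing, constant, and increasing hazard regions of the bathtub curve.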
Inductive Triple Graphs: A purely functional approach to represent RDF, by Jose Emilio Labra Gayo
Slides of my presentation at the 3rd International Workshop on Graph Structures for Knowledge Representation, part of the International Joint Conference on Artificial Intelligence, Beijing, China, 4 August 2013.
Concurrent Argumentation with Time: an Overview, by Carlo Taticchi
The Timed Concurrent Language for Argumentation (tcla) is a framework to model concurrent interactions between communicating agents that reason and take decisions through argumentation processes, also taking into account the temporal duration of the performed actions. Time is, indeed, a crucial factor when dealing with dynamic environments in real-world applications, where agents need to act in a coordinated fashion to reach their own goals. In this paper, we discuss the syntax and the operational semantics of tcla, providing insights on how its constructs can be used to realise complex interactions between agents.
The document discusses the equivalence between context-free grammars (CFGs) and pushdown automata (PDAs). It states that for any CFG, an equivalent PDA can be constructed to accept the language generated by the grammar, and vice versa. This allows a programming language to be specified by a CFG and implemented with a PDA in a compiler. The document also provides procedures for converting between CFGs and PDAs, including an example of constructing a PDA from a given CFG.
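The CFG-to-PDA direction can be sketched as follows (a minimal illustration with hypothetical names, not the document's own procedure): the PDA pushes the start symbol, then repeatedly either expands a nonterminal on top of the stack using a production, or pops a terminal that matches the next input symbol; acceptance is by empty stack on exhausted input. The PDA's nondeterminism is simulated here by breadth-first search.

```python
def cfg_to_pda_accepts(grammar, start, s, max_rounds=200):
    """Simulate the standard CFG->PDA construction nondeterministically.
    grammar maps each nonterminal to a list of right-hand sides (lists
    of symbols); any symbol not in grammar is treated as a terminal."""
    frontier = [(s, (start,))]          # configurations: (input left, stack)
    seen = set()
    for _ in range(max_rounds):
        nxt = []
        for inp, stack in frontier:
            if not stack:
                if not inp:             # empty stack and empty input: accept
                    return True
                continue
            top, rest = stack[0], stack[1:]
            if top in grammar:          # nonterminal: expand via a production
                for rhs in grammar[top]:
                    cfg = (inp, tuple(rhs) + rest)
                    if cfg not in seen:
                        seen.add(cfg)
                        nxt.append(cfg)
            elif inp and inp[0] == top:  # terminal: match and pop
                cfg = (inp[1:], rest)
                if cfg not in seen:
                    seen.add(cfg)
                    nxt.append(cfg)
        frontier = nxt
        if not frontier:
            return False
    return False

# Balanced parentheses: S -> ( S ) S | empty
g = {"S": [["(", "S", ")", "S"], []]}
```

The mirror-image PDA-to-CFG construction is more involved (one nonterminal per pair of states) and is not sketched here.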
Containerisation and Dynamic Frameworks in ICCMA’19, by Carlo Taticchi
The International Competition on Computational Models of Argumentation (ICCMA) is a successful event dedicated to advancing the state of the art of solvers in Abstract Argumentation. We describe two proposals that will further improve the third and next edition of the competition, i.e. ICCMA 2019. The first novelty concerns packaging each solver application participating in the competition in a virtual “light” container (using Docker): this allows for easy deployment and for (re)running all of the submissions on different architectures (Linux, Windows, macOS, and also in the cloud). The second proposal consists of a new track focused on solvers processing dynamic frameworks, i.e., frameworks described in terms of changes w.r.t. previous ones: a solver can reuse the solution obtained previously to be faster on the same framework modulo a new argument/attack.
Taegyun Jeon presented on using deep learning and TensorFlow for time series analysis. He discussed applications of time series analysis in finance, speech recognition, language translation, medicine, weather forecasting and sales forecasting. He then covered traditional time series models like AR, MA, ARMA and ARIMA as well as recurrent neural networks. Finally, he demonstrated TensorFlow's time series API for building time series models.
High-dimensional polytopes defined by oracles: algorithms, computations and a..., by Vissarion Fisikopoulos
This document summarizes a PhD thesis defense about algorithms and computations involving high-dimensional polytopes defined by oracles. It introduces polytope representations, oracle definitions, and discusses resultant polytopes arising in algebraic geometry. It outlines an output-sensitive algorithm for computing projections of resultant polytopes using mixed subdivisions. It also describes work on edge-skeleton computations, a volume algorithm, 4D resultant polytope combinatorics, and high-dimensional predicate software.
Two-level Just-in-Time Compilation with One Interpreter and One Engine, by Yusuke Izawa
This document proposes a two-level just-in-time compilation approach using one interpreter and one engine. It finds that by providing different interpreter definitions to the RPython meta-tracing compiler, different kinds of compilers and compilations can be derived, such as tracing, method, and threaded code compilers. The key idea is an adaptive RPython system that performs multitier compilation by generating different interpreters from a generic interpreter and driving the RPython engine accordingly. This challenges the assumption in the JIT community that a meta-tracing compiler can only perform tracing compilation.
The document discusses graph modification problems from the perspective of parameterized complexity. It presents results on various fixed-parameter tractable graph editing problems, including Block Graph Vertex Deletion, which can be solved in FPT time O(4^k * n^O(1)). The document also discusses a generalization of graph editing problems called (F1, F2, ..., Fa)-editing, and presents hardness results showing that Simultaneous Odd Cycle Transversal is W[1]-hard with respect to the solution-size parameter k.
The document discusses stacks and procedures in assembly language programming. It covers stack implementation using registers and instructions, parameter passing methods using registers or stack, and establishing stack frames using ENTER and LEAVE instructions. Procedures can be called using CALL and control returned using RET. The stack is used for temporary data storage, parameter passing, and storing return addresses for procedures.
1. The document proposes a modified Feistel cipher that applies key-based substitution, shifting of rows, key-based mixing of columns, and modular arithmetic addition to the plaintext blocks.
2. The plaintext is divided into pairs of square matrices P0 and Q0. These matrices undergo multiple rounds of key-based substitution, row shifting, key-based mixing, modular addition with the key, and shuffling.
3. Cryptanalysis shows that the cipher is strong against conventional attacks due to the multiple operations applied in each round, particularly the key-based substitution and mixing.
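Some of the round operations listed above can be sketched in Python as an illustration; the S-box here is a hypothetical stand-in (the cipher derives its substitution from the key), and values are treated as bytes mod 256:

```python
def shift_rows(m):
    """Cyclically left-shift row i of a square matrix by i positions."""
    return [row[i:] + row[:i] for i, row in enumerate(m)]

def key_substitute(m, sbox):
    """Replace every byte through a substitution table (a permutation)."""
    return [[sbox[b] for b in row] for row in m]

def add_key(m, key):
    """Byte-wise modular addition of a key matrix (mod 256)."""
    return [[(a + b) % 256 for a, b in zip(r1, r2)]
            for r1, r2 in zip(m, key)]

# Hypothetical key-derived S-box: an affine permutation of 0..255
# (gcd(7, 256) = 1, so this map is a bijection).
sbox = [(7 * i + 3) % 256 for i in range(256)]
```

Each operation is invertible (inverse permutation, right shift, modular subtraction), which is what allows decryption to undo the rounds in reverse order.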
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a..., by Yusuke Izawa
This document summarizes Yusuke Izawa's master's thesis defense on stack hybridization, a mechanism for bridging two compilation strategies - tracing and method-based - in a meta compiler framework. The proposal extends a meta-tracing just-in-time (JIT) compiler to apply different compilation strategies to different parts of a program based on call context. A proof-of-concept implementation in OCaml showed the hybrid approach was about 1.1x faster than a method-based only approach and over 100x faster than a tracing only approach.
INC and DEC Instructions
ADD Instruction
SUB Instruction
NEG Instruction
Implementing Arithmetic Expressions
Flags Affected by Addition and Subtraction
Example Program (AddSub3)
GDG DevFest Seoul 2017: Codelab - Time Series Analysis for Kaggle using Tenso..., by Taegyun Jeon
The document provides an introduction to time series analysis and forecasting using TensorFlow. It discusses various time series models including AR, MA, ARMA, ARIMA and RNN models. It then demonstrates how to implement these models using TensorFlow TimeSeries API, including ARRegressor, LSTM and forecasting on test data. Code examples are provided for data preprocessing, training AR and LSTM models on sample time series data, and making predictions on test data.
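The TensorFlow TimeSeries API itself is not reproduced here; as an illustration of the underlying AR model the document covers, an AR(1) coefficient can be fitted by plain least squares (a sketch with hypothetical function names):

```python
def fit_ar1(series):
    """Least-squares estimate of a in the AR(1) model x_t = a * x_(t-1):
    a = sum(x_t * x_(t-1)) / sum(x_(t-1)^2)."""
    num = sum(x1 * x0 for x0, x1 in zip(series, series[1:]))
    den = sum(x0 * x0 for x0 in series[:-1])
    return num / den

def forecast(series, a, steps):
    """Roll the fitted model forward to predict future values."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = a * last
        out.append(last)
    return out

series = [1.0, 0.5, 0.25, 0.125]   # exactly x_t = 0.5 * x_(t-1)
a = fit_ar1(series)
preds = forecast(series, a, 2)
```

Higher-order AR(p), MA, and ARIMA fits generalize this idea with more regressors and differencing; RNN/LSTM models replace the linear map with a learned recurrent function.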
Gremlin is the graph traversal language of Apache TinkerPop, an open source graph computing framework, that is implemented by a great many graph databases, including DSE Graph. Even the most novice Gremlin user will recognize the Gremlin statement of "g.V()", but in this presentation we will stop to take a moment to understand the elements of that ubiquitous statement and the elements of the steps that append to it. With the foundational knowledge of "Gremlin's Anatomy" firmly held, we will perform an autopsy on an advanced Gremlin traversal and thus expose techniques for examining and taming the most complex and confusing Gremlin one might come across.
The document discusses several applications of stacks, including evaluating arithmetic expressions in Polish notation without needing operator precedence rules or parentheses, converting expressions between infix and postfix notation, matching parentheses in expressions, and other applications like reversing strings and generating code from expressions. Reverse Polish (postfix) notation places operators after their operands, which simplifies evaluation with a stack. Converting an expression to postfix form also uses a stack, removing parentheses while preserving operator order.
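The two stack applications above can be sketched as follows (a minimal illustration; tokens are assumed to be pre-split, and only binary +, -, *, / are handled):

```python
def eval_postfix(tokens):
    """Evaluate a postfix (reverse Polish) expression with a stack."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()          # right operand is on top
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

def to_postfix(tokens):
    """Shunting-yard conversion: infix -> postfix, honoring precedence
    and parentheses by holding operators on a stack."""
    prec = {"+": 1, "-": 1, "*": 2, "/": 2}
    out, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()              # discard the "("
        elif tok in prec:
            while stack and stack[-1] != "(" and prec[stack[-1]] >= prec[tok]:
                out.append(stack.pop())
            stack.append(tok)
        else:
            out.append(tok)          # operand goes straight to output
    while stack:
        out.append(stack.pop())
    return out
```

For example, "2 + 3 * 4" converts to ["2", "3", "4", "*", "+"], which evaluates to 14 with no precedence rules needed at evaluation time.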
This chapter discusses how youth use new media technologies like mobile phones, instant messaging, and social media in their dating practices and intimacy. It explores how these technologies have changed courtship rituals, allowing youth to get to know each other online before meeting in person, but also how they make breaking up more difficult by leaving digital remnants of past relationships. While new media provides benefits of privacy and easier communication, it can also make youth more vulnerable if too much personal information is shared publicly online.
This document discusses using jQuery and Google App Engine to create cross-domain web mashups.
The document introduces techniques for creating cross-domain web mashups using jQuery to make AJAX calls across domains and Google App Engine for hosting, discussing JSONP and proxies to overcome the same-origin policy limitation. It then provides an example mashup that displays tweets tagged with a hashtag on a map by geocoding hashtag names to locations and querying Twitter, Google Maps, and other domains.
This document proposes techniques for automatically ranking the results of database queries. It introduces IDF Similarity, which adapts the TF-IDF concept from information retrieval to database attributes by calculating IDF scores based on attribute value frequencies. It also introduces QF Similarity, which determines attribute value importance based on frequency in a query workload log. An Index-based Threshold Algorithm is developed to efficiently retrieve the top-K results by exploiting these similarity functions. The algorithm performs sorted and random accesses to tuples to iteratively refine the top results based on a stopping condition.
This document proposes techniques for automatically ranking the results of database queries. It introduces IDF Similarity, which adapts the TF-IDF concept from information retrieval to database attributes by calculating IDF scores based on attribute value frequencies. It also introduces QF Similarity, which determines attribute value importance based on frequency in a query workload log. An Index-based Threshold Algorithm is developed to efficiently retrieve the top-K results by exploiting these similarity functions. The algorithm performs sorted and random accesses to tuples to iteratively refine the top results until a stopping condition is met.
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...ETH Zurich
We present FGen, a program generator for high performance convolution operations (finite-impulse-response filters). The generator uses an internal mathematical DSL to enable structural optimization at a high level of abstraction. We use FGen as a testbed to demonstrate how to provide modular and extensible support for modern SIMD vector architectures in a DSL-based generator. Specifically, we show how to combine staging and generic programming with type classes to abstract over both the data type (real or complex) and the target architecture (e.g., SSE or AVX) when mapping DSL expressions to C code with explicit vector intrinsics. Benchmarks shows that the generated code is highly competitive with commercial libraries.
The document provides an overview of ABAP (Advanced Business Application Programming) programming. It discusses the structure of ABAP programs, data objects, basic statements and features. The key topics covered are:
1. The structure of ABAP programs including reports, dialog programs, and transactions.
2. Data objects in ABAP such as variables, structures, internal tables, and field symbols.
3. Basic statements for data manipulation like MOVE, WRITE, and IF.
4. Features of ABAP such as its independence from operating systems and integrated SQL functionality.
The document provides an overview of ABAP (Advanced Business Application Programming) programming. It outlines the structure of an ABAP course including chapters on list processing, open SQL, event-driven programming, modularization, and debugging. It also describes the basic concepts of ABAP including data types, variables, structures, constants, system fields and statements like MOVE, CLEAR and FIELD-SYMBOLS.
The document provides an overview of ABAP (Advanced Business Application Programming) programming. It outlines the structure of an ABAP course including chapters on list processing, open SQL, event-driven programming, modularization, and debugging. It also describes the basic concepts of ABAP including data types, variables, structures, constants, and system fields.
The document provides an overview of ABAP (Advanced Business Application Programming) programming. It outlines the structure of an ABAP course including chapters on list processing, open SQL, event-driven programming, modularization, and debugging. It also describes the basic features and structure of the ABAP language, including data objects, control statements, and event handling.
The document provides an overview of ABAP (Advanced Business Application Programming) programming. It discusses the structure of ABAP programs, data objects, predefined data types, and other key concepts like structures, constants, system fields, MOVE statement, and CLEAR statement. The outline includes 6 chapters that cover topics like introduction to ABAP, list processing, open SQL, event-driven programming, modularization, and debugging techniques.
The document provides an overview of ABAP (Advanced Business Application Programming) programming. It discusses the structure of ABAP programs, data objects, basic statements and features. The key topics covered are:
1. The structure of ABAP programs including reports, dialog programs and transactions.
2. Data objects in ABAP such as variables, structures, internal tables, constants and field symbols.
3. Basic statements for data manipulation like MOVE, WRITE, IF and LOOP.
4. Features of ABAP like its independence from operating systems, subset of SQL called Open SQL and event-driven programming.
Kursi programimit në gjuhën R fillon nga zero njohuri por disa njohuri bazë në cilëndo gjuhë do tju ndihmonin për përfitim maksimal të kursit! Gjuha e programimit R nuk cilësohet si gjuhë e vështirë, gjuha R përdoret për kuptimin, interpretimin dhe vizualizimin e të dhënave. Duke qënë se kompanitë ose institucione kërkimore mbledhin gjithmonë e më shumë të dhëna dhe gjithmonë e më komplekse, gjuha R është gjuha e zgjedhur për të analizuar të dhënat. Gjuha R është më e mira për analizë, vizualizim të dhënat, logaritme shkencore dhe machine learning. Studentët do të vënë re që me udhëzimin e duhur nga instruktorët tanë, ekspert senior në programim, vetë programimi nuk është edhe aq i vështirë, por është i strukturuar mirë dhe logjik. Trajnimi është 100% në PRAKTIKE. Mësimi më i mirë është praktika!
Category theory concepts such as objects, arrows, and composition directly map to concepts in Scala. Objects represent types, arrows represent functions between types, and composition represents function composition. Scala examples demonstrate how category theory diagrams commute, with projection functions mapping to tuple accessors. Thinking in terms of interfaces and duality enriches both category theory and programming language concepts. Learning category theory provides a uniform way to reason about programming language structures and properties of data types.
Category theory concepts such as objects, arrows, and composition map nicely to structures in Scala. Functions in Scala represent arrows between types. Composition allows combining functions. Category theory diagrams illustrate relationships between types and functions through commutative diagrams. For example, product types in category theory correspond to tuples in Scala, with projection functions representing the arrows. Learning category theory provides insights into abstraction and mathematical properties underlying programming concepts.
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)Wim Vanderbauwhede
This talk is about two Perl modules (Call:Haskell and Functional::Types) I developed to call Haskell functions as transparently as possible.
In general, the only way to guarantee the correctness of the types of the function arguments in Haskell is to ensure they are well-typed in Perl. So I ended up writing a Haskell-inspired type system for Perl. In this talk I will first discuss the approach I took to call Haskell from Perl, and then the reasons why a type system is needed, and the actual type system I developed. The type system is based on "prototypes", functions that create type descriptors, and a small API of functions to create type constructors and manipulate the types. The system is type checked at run time and supports sum types, product types, function types and polymorphism. The approach is not Perl-specific and suitable for other dynamic languages.
https://github.com/wimvanderbauwhede
Optimizing with persistent data structures (LLVM Cauldron 2016)Igalia
By Andy Wingo.
Is there life beyond phi variables and basic blocks? Andy will report on his experience using a new intermediate representation for compiler middle-ends, "CPS soup". The CPS soup language represents programs using Clojure-inspired maps, allowing optimizations to be neatly expressed as functions from one graph to another. Using these persistent data structures also means that the source program doesn't change while the residual program is being created, eliminating one class of problems that the optimizer writer has to keep in mind. Together we will look at some example transformations from an expressiveness as well as a performance point of view, and we will also cover some advantages which a traditional SSA graph maintains over CPS soup.
This document describes a K-Map software tool that simplifies Boolean equations. The tool reads in a Boolean expression with up to 4 variables in sum-of-products or product-of-sums form, generates a Karnaugh map, and uses it to minimize the expression. Algorithms are provided for solving 2, 3, and 4 variable maps. The tool could aid in designing sequential circuits and simplifying expressions frequently in other applications. Its use of different input forms and deductive reasoning achieves simplified output.
The document discusses code generation techniques in compiler construction. It describes generating executable code from source code by using intermediate representations like three-address code and P-code. It covers generating code from syntax trees, implementing intermediate codes using data structures, and translating between different intermediate representations and target machine code.
The document discusses functions in Scala. It covers basic syntax including parameter types, recursive functions, and default arguments. It also discusses functions as values that can be passed as arguments or returned from other functions. Generic functions and type parameters are explained. The document also covers closures where functions can access variables from outer scopes, and partial application, currying, and function composition.
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Data Con LA
Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time consuming effort. To aid this effort, we built Titian, a library that enables data provenance tracking data through transformations in Apache Spark.
2. Contents
Introduction
IDF Similarity
QF Similarity
Breaking Ties
Implementation
ITA Algorithm
Conclusion
3. Introduction
Databases follow a Boolean query model.
E.g. SELECT * WHERE MFR_Country = “Germany” AND Type = “Sports” AND MFR = “Volkswagen”
Problems in databases:
Empty Answers: an overly selective query leads to an empty result set.
Many Answers: an overly general query leads to too many results.
4. Introduction
Ranking of database query results using IR techniques.
The TF-IDF concept is applied to databases, based on the frequency of attribute values.
TF-IDF must be extended to numeric domains; this IDF Similarity is discussed in the paper.
A WORKLOAD of past queries is collected and used for ranking: QF Similarity leverages workload information.
5. Introduction
The Many Answers problem is addressed using top-K query processing.
An Index-based Threshold Algorithm (ITA) is developed, exploiting IDF/QF Similarity.
6. IDF Similarity
What is the TF-IDF technique?
Given a set of documents and a query, documents are ranked based on the TF and IDF of the words in each document.
Adapting the IDF concept to a database containing only categorical attributes:
T = <t1, ..., tm> values of attributes A1, ..., Am
n = number of tuples in the database
7. IDF Similarity
For every value t of an attribute Ak:
The frequency F(t) is defined as the number of tuples with Ak = t.
IDF is calculated as: IDF(t) = log(n / F(t))
For a pair of values u and v in the domain of Ak:
S(u, v) = IDF(u) if u = v, otherwise 0
For a tuple T and a query Q over attributes A1, ..., Am:
SIM(T, Q) = Σ_{k=1}^{m} S_k(t_k, q_k)
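The per-attribute IDF scores and the summed SIM(T, Q) can be sketched in a few lines of Python (an illustrative sketch over an in-memory table; helper names are made up, not from the paper):

```python
import math

def idf_scores(values):
    """IDF(t) = log(n / F(t)), where F(t) counts tuples with value t."""
    n = len(values)
    freq = {}
    for v in values:
        freq[v] = freq.get(v, 0) + 1
    return {v: math.log(n / f) for v, f in freq.items()}

def sim(tuple_vals, query_vals, idf_by_attr):
    """SIM(T,Q) = sum over k of S_k(t_k, q_k); S_k(u,v) = IDF_k(u) if u == v else 0."""
    total = 0.0
    for k, (t, q) in enumerate(zip(tuple_vals, query_vals)):
        if q is not None and t == q:
            total += idf_by_attr[k].get(t, 0.0)
    return total

# Cars table from the example slide: (MFR_Country, Type, MFR)
rows = [("Germany", "Sports", "Mercedes"),
        ("Germany", "Executive", "Audi"),
        ("Germany", "Sports", "Audi"),
        ("Italy", "Sports", "Lamborghini")]
idf_by_attr = [idf_scores([r[k] for r in rows]) for k in range(3)]
query = ("Germany", "Sports", "Volkswagen")  # no exact MFR match in the table
scores = [sim(r, query, idf_by_attr) for r in rows]
```

Tuples 1 and 3 match both MFR_Country and Type, so they tie at the top; no tuple matches the MFR condition, yet every tuple still gets a nonzero score.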
8. IDF Similarity
Example:

CAR_ID | MODEL    | MFR         | MFR_Country | Type
1      | SLR      | Mercedes    | Germany     | Sports
2      | A6       | Audi        | Germany     | Executive
3      | R8       | Audi        | Germany     | Sports
4      | Gallardo | Lamborghini | Italy       | Sports

Query Q: SELECT * WHERE MFR_Country = “Germany” AND Type = “Sports” AND MFR = “Volkswagen”
10. IDF Similarity
Consider a numeric attribute in the DB, e.g. PRICE.
SIMPLE SOLUTION: Discretize the data into ranges.
Consider two ranges: (0, 50) and (51, 100). The values 49 and 52 are then considered completely dissimilar.
Instead, the frequency of a numeric value t of an attribute is defined as the sum of contributions to t from every ti in the database:
F(t) = Σ_i exp(−(1/2) ((ti − t) / h)²), where h is a bandwidth parameter
IDF(t) = log(n / F(t))
S(t, q) is the density at t of a Gaussian distribution centered at q, scaled by IDF:
S(t, q) = exp(−(1/2) ((t − q) / h)²) · IDF(q)
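A small Python sketch of the kernel-smoothed frequency and similarity, assuming an in-memory list of prices and an arbitrarily chosen bandwidth h (names are illustrative):

```python
import math

def kernel(x, y, h):
    """Gaussian contribution exp(-0.5 * ((x - y) / h)**2)."""
    return math.exp(-0.5 * ((x - y) / h) ** 2)

def numeric_idf(t, column, h):
    """F(t) = sum_i kernel(t_i, t, h); IDF(t) = log(n / F(t))."""
    n = len(column)
    f = sum(kernel(ti, t, h) for ti in column)
    return math.log(n / f)

def numeric_sim(t, q, column, h):
    """S(t, q) = kernel(t, q, h) * IDF(q)."""
    return kernel(t, q, h) * numeric_idf(q, column, h)

prices = [10, 12, 15, 49, 52, 90]
h = 5.0
# 49 and 52 now come out similar instead of falling into different buckets
s_close = numeric_sim(49, 52, prices, h)
s_far = numeric_sim(10, 52, prices, h)
```

With the discretized ranges above, 49 and 52 would score zero similarity; the kernel version instead decays smoothly with distance.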
11. IDF Similarity
Consider the following query:
SELECT * WHERE MFR_Country IN (“Germany”, “Italy”, “Japan”)
For IN conditions, the best match per attribute is taken:
SIM(T, Q) = Σ_{k=1}^{m} max_{q ∈ Q_k} S_k(t_k, q)
12. QF Similarity
Problems with IDF:
In a realtor database, more homes were built in recent years such as 2007 and 2008 than in 1980 and 1981, so the recent years have small IDF. Yet newer homes are in higher demand.
In a bookstore DB, demand for an author depends on factors other than the number of books he has written.
13. QF Similarity
WORKLOAD: past queries.
The importance of attribute values is determined by the frequency of their occurrence in the workload.
As in the example above, queries requesting recently built homes are more frequent than queries for homes built in 1981.
14. QF Similarity
For categorical data:
RQF(q) = raw frequency of occurrence of value q of attribute A in the query strings of the workload
RQFMax = raw frequency of the most frequently occurring value in the workload
Query frequency QF(q) = RQF(q) / RQFMax
S(t, q) = QF(q) if t = q, otherwise 0
QF resembles TF.
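A minimal Python sketch of QF computed from a workload (the MFR values reuse the tie-breaking example from a later slide; function names are illustrative):

```python
def qf_scores(workload_values):
    """QF(q) = RQF(q) / RQFMax over values appearing in workload queries."""
    rqf = {}
    for q in workload_values:
        rqf[q] = rqf.get(q, 0) + 1
    rqf_max = max(rqf.values())
    return {q: f / rqf_max for q, f in rqf.items()}

def s_qf(t, q, qf):
    """S(t, q) = QF(q) if t == q, otherwise 0."""
    return qf.get(q, 0.0) if t == q else 0.0

# Workload values of MFR: Audi appears 3 times, so RQFMax = 3
workload_mfr = ["Audi", "Audi", "Lamborghini", "Mercedes", "Lamborghini", "Audi"]
qf = qf_scores(workload_mfr)
```

The most queried value (Audi) gets QF = 1; rarely queried values get proportionally smaller scores regardless of how often they occur in the data.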
16. QF Similarity
Similarity between pairs of different categorical attribute values can also be derived from the workload, e.g. to find S(Audi, Mercedes).
The similarity coefficient between t and q in this case is defined as the Jaccard coefficient scaled by the QF factor:
S(t, q) = J(W(t), W(q)) · QF(q)
W(t) = subset of queries in workload W in which categorical value t occurs in an IN clause
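As an illustrative Python sketch (the workload query-id sets and QF values below are made up for the example):

```python
def jaccard(a, b):
    """J(A, B) = |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def s_workload(t, q, w, qf):
    """S(t, q) = J(W(t), W(q)) * QF(q), with W(v) the set of query ids mentioning v."""
    return jaccard(w.get(t, set()), w.get(q, set())) * qf.get(q, 0.0)

# Hypothetical workload: ids of queries whose IN clauses mention each value
w = {"Audi": {1, 2, 3}, "Mercedes": {2, 3, 4}}
qf = {"Audi": 1.0, "Mercedes": 0.5}
s = s_workload("Audi", "Mercedes", w, qf)
```

Values that co-occur in many of the same queries (here, ids 2 and 3) come out similar even though they never match exactly.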
17. QF-IDF
For QF-IDF Similarity:
S(t, q) = QF(q) · IDF(q) when t = q, otherwise 0
18. BREAKING TIES
If SIM(t1, q) = SIM(t2, q), which should be ranked higher?
QF and IDF partition the database into equivalence classes.

CAR_ID | MODEL    | MFR         | MFR_Country | Type
1      | SLR      | Mercedes    | Germany     | Sports
2      | A6       | Audi        | Germany     | Executive
3      | R8       | Audi        | Germany     | Sports
4      | Gallardo | Lamborghini | Italy       | Sports

Q: SELECT * WHERE Type = “Sports” AND MFR_Country = “Germany”
19. Breaking Ties with QF
Determine weights of the missing attribute values that reflect their “global importance”, using the workload:
Global importance = Σ_k log(QF(tk)), where the tk are the missing attribute values
Missing attributes for Q: MFR and MODEL
20. Breaking Ties with QF
Consider a workload with the following values of MFR and MODEL:
MFR: {Audi, Audi, Lamborghini, Mercedes, Lamborghini, Audi}
MODEL: {R8, A6, Gallardo, SLR, Gallardo, A6}
QF(SLR) = 1/2 = 0.5, QF(Mercedes) = 1/3 = 0.33
For tuple 1 (SLR, Mercedes, Germany, Sports):
Global importance = log(0.5) + log(0.33)
Negative values of global importance? Since QF ≤ 1, every log term is ≤ 0, but the relative ordering of tuples is still meaningful.
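The global-importance computation for this example can be sketched in Python (illustrative names; the QF tables are derived from the workload above, with RQFMax = 3 for MFR and 2 for MODEL):

```python
import math

def global_importance(missing_values, qf):
    """Sum of log(QF(t_k)) over the attribute values missing from the query."""
    return sum(math.log(qf[t]) for t in missing_values)

# QF tables derived from the workload on this slide
qf_mfr = {"Audi": 1.0, "Lamborghini": 2 / 3, "Mercedes": 1 / 3}
qf_model = {"A6": 1.0, "Gallardo": 1.0, "SLR": 0.5, "R8": 0.5}

# Tuple 1 (SLR, Mercedes) vs tuple 3 (R8, Audi); Q constrains neither MODEL nor MFR
g1 = global_importance(["SLR"], qf_model) + global_importance(["Mercedes"], qf_mfr)
g3 = global_importance(["R8"], qf_model) + global_importance(["Audi"], qf_mfr)
```

Both totals are negative, but tuple 3 still ranks above tuple 1 because its missing values are queried more often in the workload.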
21. Breaking Ties with IDF
Option 1: tuples whose missing attributes have large IDF (i.e., occur infrequently) are ranked higher; then cars that are not popular are ranked higher.
Option 2: tuples whose missing attributes have small IDF are ranked higher; then cars with a moonroof, a desirable but rare feature, are ranked lower.
23. Implementation
Pre-processing component
Job: compute and store a representation of the similarity function (QF-IDF, QF, IDF) in auxiliary database tables.
24. Implementation
Query processing component
Job: retrieving the top-K results from the database.
ITA algorithm: uses Fagin’s Threshold Algorithm together with the similarity function.
Sorted access: along any attribute Ak, TIDs of tuples are retrieved in order.
Random access: the entire tuple corresponding to a TID is retrieved.
25. ITA Algorithm
Initialize the Top-K buffer to empty
Repeat
  For each k = 1 to p
    TID = index of the next tuple retrieved from ordered list Lk (sorted access)
    T = complete tuple retrieved for TID (random access)
    Compute the value of the ranking function for T
    If the rank of T is higher than that of the lowest-ranking tuple in the Top-K buffer, update the Top-K buffer
    If the stopping condition has been reached then Exit
  End For
Until all tuple indexes have been seen
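A toy Python sketch of this loop, using a min-heap as the Top-K buffer (illustrative only; the hypothetical-tuple stopping condition from slide 26 is omitted, so this sketch scans the lists to exhaustion):

```python
import heapq

def ita(sorted_lists, fetch_tuple, score, k):
    """Simplified ITA loop.

    sorted_lists: one list of TIDs per query attribute, best-first (sorted access).
    fetch_tuple:  random access, TID -> full tuple.
    score:        ranking function over a full tuple.
    """
    top_k = []                      # min-heap of (score, tid)
    seen = set()
    active = [iter(l) for l in sorted_lists]
    while active:
        for it in list(active):
            tid = next(it, None)
            if tid is None:         # this sorted list is exhausted
                active.remove(it)
                continue
            if tid in seen:         # already fetched via another list
                continue
            seen.add(tid)
            s = score(fetch_tuple(tid))
            if len(top_k) < k:
                heapq.heappush(top_k, (s, tid))
            elif s > top_k[0][0]:   # beats the lowest-ranking buffered tuple
                heapq.heapreplace(top_k, (s, tid))
        # A real implementation would test the stopping condition here
        # and exit once no unseen tuple can beat min(top_k).
    return sorted(top_k, reverse=True)

# Toy data: the tuple itself is its score
data = {1: 5.0, 2: 3.0, 3: 9.0, 4: 1.0}
lists = [[3, 1, 2, 4], [3, 2, 1, 4]]
result = ita(lists, lambda tid: data[tid], lambda v: v, 2)
```

The `seen` set reflects that a TID reached through one sorted list need not be fetched again when it surfaces in another.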
26. ITA Algorithm
Stopping condition:
Form a hypothetical tuple from the current values a1, ..., ap of A1, ..., Ap at the index seek positions on L1, ..., Lp, together with qp+1, ..., qm for the remaining columns, taken directly from the query.
Terminate when the similarity of the hypothetical tuple to the query is less than the similarity of the tuple in the Top-K buffer with the least similarity.
27. ITA for Numeric Columns
Consider a query with a condition Ak = qk for a numeric column Ak.
Two index scans are performed on Ak:
The first retrieves TIDs with values >= qk in increasing order.
The second retrieves TIDs with values < qk in decreasing order.
TIDs are then picked from the merged stream, nearest value first.
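One way to sketch the merged stream in Python (illustrative; a real implementation would advance two index cursors instead of filtering lists):

```python
def merged_numeric_scan(sorted_tids, values, qk):
    """Yield TIDs in order of increasing distance |value - qk|.

    sorted_tids: TIDs sorted ascending by the numeric attribute, emulating
    the two index scans (>= qk forward, < qk backward).
    """
    up = [t for t in sorted_tids if values[t] >= qk]           # ascending scan
    down = [t for t in sorted_tids if values[t] < qk][::-1]    # descending scan
    i = j = 0
    while i < len(up) or j < len(down):
        # Take from whichever side is currently closer to qk (ties go upward)
        take_up = j >= len(down) or (
            i < len(up) and values[up[i]] - qk <= qk - values[down[j]])
        if take_up:
            yield up[i]; i += 1
        else:
            yield down[j]; j += 1

values = {1: 10, 2: 48, 3: 51, 4: 70}
order = list(merged_numeric_scan([1, 2, 3, 4], values, 50))
```

For qk = 50, the merged stream visits 51 first, then 48, then 70, then 10, i.e., nearest value first.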
28. Conclusion
An automated ranking infrastructure for SQL databases.
Extended TF-IDF-based techniques from information retrieval to numeric and mixed data.
Implemented a ranking function that exploits Fagin’s Threshold Algorithm.