This document discusses object-relational mapping (ORM) frameworks, and the ADO.NET Entity Framework architecture in particular. It provides an overview of ORM and why it was developed, and describes the key components of the Entity Framework, including its programming model and mapping tools. It then discusses how the Entity Framework compiles mappings into bidirectional query and update views between the object and relational models, enabling the database to be updated from object changes. Finally, the document evaluates the performance of the Entity Framework's mapping compiler.
2. Papers
Compiling Mappings to Bridge Applications and Databases
- Sergey Melnik, Atul Adya, Philip A. Bernstein
Anatomy of the ADO.NET Entity Framework
- Atul Adya, José A. Blakeley, Sergey Melnik, S. Muralidhar, and the ADO.NET Team
3. What is ORM?
• A methodology that lets object-oriented systems hold data in a database, with transactional control, and yet express it as program objects when needed
• Avoids bundles of special-purpose data access code
• Essential for multilayered database applications
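To make this concrete, here is a minimal sketch (the class and property names are illustrative, based on the schema used later in this deck): a plain C# class whose instances an ORM layer persists to, and materializes from, relational tables, so the application never writes the SQL by hand.

using System;

// Illustrative entity class; an ORM maps its properties onto columns of
// the store tables (here, the SSalesPersons/SEmployees/SContacts schema
// of slide 6) and generates the SQL to load and save instances.
public class SalesPerson {
    public int Id { get; set; }            // SSalesPersons.SalesPersonId
    public int Bonus { get; set; }         // SSalesPersons.Bonus
    public string Title { get; set; }      // SEmployees.Title
    public DateTime HireDate { get; set; } // SEmployees.HireDate
    public string Name { get; set; }       // SContacts.Name
}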
4. Why ORM?
• Bridges the impedance mismatch between programming-language abstractions and persistent storage
• Data independence: the data representation can evolve independently of each layer
• Independence from the DBMS vendor
• Acts as a bridge between the application and the database
5. Layered Database Application
• Presentation Layer: user interface
• Service Layer: transactions expressed in terms of objects
• Data Access Layer: ORM functionality; data expressed in the object domain
• Database
6. Sample Relational Schema
Tables: SContacts, SEmployees, SSalesPersons, SSalesOrders

create table SContacts(
    ContactId int primary key,
    Name varchar(100),
    Email varchar(100),
    Phone varchar(10));

create table SEmployees(
    EmployeeId int primary key references SContacts(ContactId),
    Title varchar(20),
    HireDate date);

create table SSalesPersons(
    SalesPersonId int primary key references SEmployees(EmployeeId),
    Bonus int);

create table SSalesOrders(
    SalesOrderId int primary key,
    SalesPersonId int references SSalesPersons(SalesPersonId));
7. Traditional Embedded Data Access Queries
void EmpsByDate(DateTime date) {
    using (SqlConnection con = new SqlConnection(CONN_STRING)) {
        con.Open();
        SqlCommand cmd = con.CreateCommand();
        cmd.CommandText = @"
            SELECT SalesPersonID, FirstName, HireDate
            FROM SSalesPersons sp
            INNER JOIN SEmployees e ON sp.SalesPersonID = e.EmployeeID
            INNER JOIN SContacts c ON e.EmployeeID = c.ContactID
            WHERE e.HireDate < @date";
        cmd.Parameters.AddWithValue("@date", date);
        DbDataReader r = cmd.ExecuteReader();
        while (r.Read()) {
            Console.WriteLine("{0:d}:\t{1}", r["HireDate"], r["FirstName"]);
        }
    }
}
8. Entity SQL
void EmpsByDate(DateTime date) {
    using (EntityConnection con = new EntityConnection(CONN_STRING)) {
        con.Open();
        EntityCommand cmd = con.CreateCommand();
        cmd.CommandText = @"
            SELECT VALUE sp FROM ESalesPersons sp
            WHERE sp.HireDate < @date";
        cmd.Parameters.AddWithValue("@date", date);
        DbDataReader r = cmd.ExecuteReader();
        while (r.Read()) {
            Console.WriteLine("{0:d}:\t{1}", r["HireDate"], r["FirstName"]);
        }
    }
}
9. LINQ
void EmpsByDate(DateTime date) {
    using (AdventureWorksDB aw = new AdventureWorksDB()) {
        var people = from p in aw.SalesPersons
                     where p.HireDate < date
                     select p;
        foreach (SalesPerson p in people) {
            Console.WriteLine("{0:d}\t{1}", p.HireDate, p.FirstName);
        }
    }
}
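All three formulations target the same data; the Entity SQL and LINQ versions are compiled by the framework into store SQL of essentially the same shape as the hand-written query of slide 7 (a sketch; the exact generated text varies by provider):

-- Sketch of the store SQL produced for the Entity SQL/LINQ queries above
SELECT sp.SalesPersonID, c.FirstName, e.HireDate
FROM SSalesPersons sp
INNER JOIN SEmployees e ON sp.SalesPersonID = e.EmployeeID
INNER JOIN SContacts c ON e.EmployeeID = c.ContactID
WHERE e.HireDate < @date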
10. O/R Mismatch - Improvements
• 1980s: Persistent programming languages
  - One or two commercial products
• 1990s: OODBMS
  - No widespread acceptance
• "Of Objects and Databases: A Decade of Turmoil" - Carey & DeWitt (VLDB'96), who bet on ORDBMS
• 2000: ORDBMS go mainstream
  - DB2 & Oracle implement hardwired O/R mapping
  - O/R features rarely used for business data
• 2002: Client-side data mapping layers
• Today: ORM frameworks - ADO.NET Entity Framework, Hibernate, JPA, TopLink, etc.
12. Components of the Framework
• Data source providers
  - Provide data from data sources to the EDM layer services
  - Support different types of sources
• Entity Data Services
  - EDM (Entity Data Model)
  - Metadata services
• Programming layers
• Domain modeling tools
  - Tools for schema generation and for creating mapping fragments
13. Object Services
• .NET CLR (Common Language Runtime)
  - Allows a program in any .NET language to interact with the Entity Framework
• Database connections and metadata
• Object State Manager
  - Tracks in-memory changes
  - Constructs the change list that is input to the update-processing infrastructure
• Object Materializer
  - Performs the transformations, during queries and updates, between entity values from the conceptual layer and the corresponding CLR objects
14. Interacting with Data in the EDM Framework
• Entity SQL
  - Derived from standard SQL
  - Adds capabilities to manipulate EDM instances
• LINQ
  - Language-integrated query
  - Queries are expressions of the programming language itself
  - Supported in Microsoft programming languages (VB, C#)
• CRUD
  - Create, Read, Update and Delete operations on objects
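A hedged sketch of what CRUD through Object Services can look like; AdventureWorksDB and SalesPerson follow the earlier examples in this deck, and AddObject/DeleteObject/SaveChanges mirror the ObjectContext-style API surface rather than quoting the papers.

using (AdventureWorksDB aw = new AdventureWorksDB()) {
    // Create: the Object State Manager tracks the new entity as Added
    SalesPerson sp = new SalesPerson { Id = 42, Bonus = 5 };
    aw.AddObject("ESalesPersons", sp);

    // Read + Update: materialized entities are tracked, so property
    // changes are recorded automatically
    var people = from p in aw.SalesPersons where p.Bonus < 10 select p;
    foreach (SalesPerson p in people) {
        p.Bonus += 5;
    }

    // Delete: marks an entity for removal
    aw.DeleteObject(sp);

    // All tracked changes are translated into store DML in one call
    aw.SaveChanges();
}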
15. Domain Modeling Tools
Some of the design-time tools included in the framework:
• Model Designer
  - Used to define the conceptual model interactively
  - Can generate and consume model descriptions
  - Can synthesize EDM models from relational metadata
• Mapping Designer
  - Maps the conceptual model to the relational database
  - This map is the input to mapping compilation, which generates the query and update views
• Code generation
  - A set of tools to generate CLR classes for the entity types
16. Query Pipeline
• Breaks down an Entity SQL or LINQ query into one or more elementary, relational-only queries that can be evaluated by the underlying data store
Steps in query processing:
• Syntax & semantic analysis
  - The query is parsed and analyzed using the Metadata Services component
• Conversion to a canonical command tree
  - The query is converted to an optimized tree
• Mapping view unfolding
  - The query is translated to reference the underlying database tables
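To illustrate mapping view unfolding (a sketch, not the compiler's literal output): the Entity SQL query of slide 8 references ESalesPersons, which the pipeline replaces with its query view, a join over the three store tables that carry a salesperson's data.

-- Before unfolding (conceptual model):
--   SELECT VALUE sp FROM ESalesPersons sp WHERE sp.HireDate < @date
-- After unfolding (store tables), roughly:
SELECT sp.SalesPersonId, sp.Bonus, e.Title, e.HireDate, c.Name, c.Email, c.Phone
FROM SSalesPersons sp
INNER JOIN SEmployees e ON sp.SalesPersonId = e.EmployeeId
INNER JOIN SContacts c ON e.EmployeeId = c.ContactId
WHERE e.HireDate < @date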
17. Steps (cont’d)
• Structured type elimination
  - References to structured data (ancestors, constructors) are removed
• Projection pruning
  - Unreferenced expressions are eliminated
• Nest pull-up
  - Nested queries are bubbled to the top
• Transformations
  - Redundant operations are eliminated by pushing down other operators
• Translation to provider-specific commands
• Command execution
• Result assembly
• Object materialization
  - Results are materialized into the appropriate programming-language objects
18. Special Features of the Framework
• Allows a higher level of abstraction than the relational model
• Leverages the .NET data provider model
• Allows data-centric services, such as reporting, on top of the conceptual model
• Together with LINQ, significantly reduces the impedance mismatch
20. Bidirectional Views
• Mappings relate entities with relations
• Mappings, together with the database schema, are compiled into views
• The views drive the runtime engine
• Compilation speeds up mapping translation
• Updates through the views are carried out using update translation techniques
21. Bidirectional View Generation
• Query views
  - Express entities in terms of tables
• Update views
  - Express tables in terms of entities

Entities = QueryViews(Tables)
Tables = UpdateViews(Entities)
Entities = QueryViews(UpdateViews(Entities))

The last equation ensures that every entity can be persisted and reassembled from the database in a lossless manner.
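Concretely, a sketch in the SQL-like notation used on slide 30 (illustrative; the compiler produces these views in an internal algebra):

-- Query view: assemble ESalesPersons entities from the store tables
ESalesPersons = SELECT sp.SalesPersonId AS Id, sp.Bonus, e.Title,
                       e.HireDate, c.Name, c.Email, c.Phone
                FROM SSalesPersons sp
                JOIN SEmployees e ON sp.SalesPersonId = e.EmployeeId
                JOIN SContacts c ON e.EmployeeId = c.ContactId

-- Update views: project entity state back onto each store table
SSalesPersons = SELECT p.Id, p.Bonus FROM ESalesPersons AS p
SEmployees    = SELECT p.Id, p.Title, p.HireDate FROM ESalesPersons AS p
SContacts     = SELECT p.Id, p.Name, p.Email, p.Phone FROM ESalesPersons AS p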
22. Mapping Compilation
- The mapping is specified as a set of mapping fragments
- Each fragment is an equation of the form QEntities = QTables, i.e., a query over the conceptual model equated with a query over the store schema
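For example, one fragment over the running schema might be (illustrative):

-- A single mapping fragment: a query over entities equated with a
-- query over tables
SELECT p.Id, p.Bonus FROM ESalesPersons AS p
    = SELECT s.SalesPersonId, s.Bonus FROM SSalesPersons AS s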
23. Query & Update views
To reassemble Persons from relational tables
23
27. Steps in Update Translation
• Change list generation
  - A list of changes per entity set is created
  - Changes are represented as lists of deleted and inserted elements
• Value expression propagation
  - Transforms the list of changes obtained from view maintenance into a sequence of algebraic insert and delete expressions against the underlying affected base tables
28. Steps in Update Translation (cont’d)
• Stored procedure call generation
  - Produces the final sequence of SQL statements against the relational schema (INSERT, DELETE, UPDATE)
• Cache synchronization
  - After the updates, the cache state is synchronized with the new database state
29. Update Translation Example – Update Query
using (AdventureWorksDB aw = new AdventureWorksDB()) {
    // People hired more than 5 years ago
    var people = from p in aw.SalesPeople
                 where p.HireDate < DateTime.Today.AddYears(-5)
                 select p;
    foreach (SalesPerson p in people) {
        if (HRWebService.ReadyForPromotion(p)) {
            p.Bonus += 10;
            p.Title = "Senior Sales Representative";
        }
    }
    aw.SaveChanges();
}
30. Update Translation – Value Expressions
∆SSalesPersons = SELECT p.Id, p.Bonus FROM ∆ESalesPersons AS p
∆SEmployees    = SELECT p.Id, p.Title FROM ∆ESalesPersons AS p
∆SContacts     = SELECT p.Id, p.Name, p.Contact.Email, p.Contact.Phone
                 FROM ∆ESalesPersons AS p

These value expressions yield the following SQL:

BEGIN TRANSACTION
UPDATE [dbo].[SSalesPersons] SET [Bonus] = 30
WHERE [SalesPersonID] = 1
UPDATE [dbo].[SEmployees] SET [Title] = N'Senior Sales Representative'
WHERE [EmployeeID] = 1
END TRANSACTION
31. The Mapping Compilation Problem
• An improper specification of the mapping fragments leads to a mapping that does not satisfy the data round-tripping criterion:
  map ∘ map⁻¹ = Id(C)
• Application developers cannot be entrusted with the task of checking the data round-tripping criterion
• Hence mapping compilation has to be done by the EDM framework
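Written out as a math block (a standard reading of the criterion, consistent with the equations on slide 21):

\[
  \mathit{map} \circ \mathit{map}^{-1} = \mathrm{Id}(C)
  \quad\Longleftrightarrow\quad
  \forall c \in C:\ \mathrm{QueryViews}(\mathrm{UpdateViews}(c)) = c
\]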
32. Bipartite Mappings
Mapping fragments are defined as follows:
  Σmap = { QC1 = QS1, ..., QCn = QSn }
where each QCi is a query over the client (conceptual) schema and each QSi is a query over the store schema.
Thus, Σmap = f ∘ g⁻¹, where f: C → V and g: S → V are views over the client and store schemas, respectively.
33. View Generation & Mapping Compilation
1. Subdivide the mapping into independent set of
fragments
2. Perform mapping validation by checking the
condition Range(f) ⊆ Range(g)
3. Partition the entity set based on mapping
constraints
4. Compile the relevant mappings on each partition
5. Regroup the generated views
6. Eliminate unnecessary self joins
33
34. Partitioning Scheme
procedure PartitionVertically(p, Tp, map)
    Part := ∅  // start with an empty set of partitions
    for each type T that is derived from or equal to Tp do
        P := {σ p IS OF (ONLY T)}
        for each direct or inherited member A of T do
            if map contains a condition on p.A then
                if p.A is of primitive type then
                    P := P × Dom(p.A, map)
                else if p.A is of complex type TA then
                    P := P × PartitionVertically(p.A, TA, map)
                end if
            end if
        end for
        Part := Part ∪ P
    end for
    return Part
35. Role of Dom(p, map)
Suppose the mapping constraints contain the conditions (p = 1) and (p IS NOT NULL) on a path p of type integer. Then Dom(p, map) consists of:
  cond1 := (p = 1)
  cond2 := (p IS NULL)
  cond3 := NOT (p = 1 OR p IS NULL)
Every pair of conditions in Dom(p, map) is mutually exclusive, and together the conditions cover the whole domain of p.
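A minimal sketch in C# of how such a partitioned domain can be computed from the equality conditions a mapping places on a path (illustrative only; the Dom function and its signature are assumptions, not the framework's API):

using System;
using System.Collections.Generic;
using System.Linq;

class DomSketch {
    // Given the equality conditions that mapping fragments place on a
    // path (e.g., "p = 1"), return mutually exclusive conditions that
    // cover the path's whole domain, mirroring slide 35.
    static List<string> Dom(string path, IEnumerable<string> equalityConditions) {
        var positives = equalityConditions.ToList();
        var result = new List<string>(positives);
        result.Add($"{path} IS NULL");
        // Everything not matched by an equality condition and not NULL:
        string covered = string.Join(" OR ", positives.Append($"{path} IS NULL"));
        result.Add($"NOT ({covered})");
        return result;
    }

    static void Main() {
        foreach (string cond in Dom("p", new[] { "p = 1" }))
            Console.WriteLine(cond);
        // Output:
        //   p = 1
        //   p IS NULL
        //   NOT (p = 1 OR p IS NULL)
    }
}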
36. Partitioning Example
Consider a schema where Person has subtypes Customer and Employee, Customer has a nullable property BillingAddr of complex type Address, and Address has subtype USAddress. Partitioning yields:
P1: σ e IS OF (ONLY Person)
P2: σ e IS OF (ONLY Customer) AND e.BillingAddr IS NULL
P3: σ e IS OF (ONLY Customer) AND e.BillingAddr IS OF (ONLY Address)
P4: σ e IS OF (ONLY Customer) AND e.BillingAddr IS OF (ONLY USAddress)
P5: σ e IS OF (ONLY Employee)
40. Grouping Partitioned Views
The entire entity set is obtained by grouping the partition views using ∪a, ⋈, and left outer join (⟕), where ∪a denotes union without duplicate elimination.
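A sketch of what this grouping looks like for the example of slide 36 (illustrative; the partition view names P1-P5 follow that slide):

-- Disjoint horizontal partitions are merged with UNION ALL (the ∪a
-- operator); fragments of one partition stored in different tables
-- would be stitched with JOIN, or LEFT OUTER JOIN when optional
-- (e.g., the nullable BillingAddr).
SELECT Id, Name FROM P1   -- ONLY Person
UNION ALL
SELECT Id, Name FROM P2   -- Customer with BillingAddr IS NULL
UNION ALL
SELECT Id, Name FROM P3   -- Customer with an Address
UNION ALL
SELECT Id, Name FROM P4   -- Customer with a USAddress
UNION ALL
SELECT Id, Name FROM P5   -- ONLY Employee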
41. Evaluation
An experimental evaluation of the Entity Framework was performed, focusing on the mapping compiler and the following parameters.
Correctness:
Using an automated suite, thousands of mappings were generated by varying certain objects. The compiled views were verified by deploying the entire data access stack to query and update sample databases.
42. Evaluation (cont’d)
Efficiency:
- Compiling the independent mapping fragments on the partitions alone takes exponential time
- Recovering partitions from views takes O(n log n)
- All other steps take O(n) time
- The number of independent fragments was small in practice
- So the delay of a few seconds at start-up and restarts was acceptable
43. Evaluation (cont’d)
Performance:
- Mapping compilation anchors both client-side rewriting and server-side execution
- Implied constraints were exploited fully to generate simplified views
- Major overheads: object instantiation, caching, query manipulation, and delta computation for updates
- These overheads dominated only for small datasets
44. Contributions
• Declarative mapping language
  - Allows non-expert users to specify complex O/R mappings
  - Formal semantics
• Mapping compilation (mappings are compiled into bidirectional views)
  - Guarantees correctness
• Mechanism for updatable views
  - Handles a large class of updates, not O/R specific
  - Leverages view maintenance technology