2. Contents
• Physical Database Design
• Database Workloads
• Physical Design and Tuning Decisions
• Need for Tuning
• Guidelines for Index Selection
• Clustering and Indexing Tools for Index Selection
• Database Tuning: Tuning Indexes
• Tuning the Conceptual Schema
• Tuning Queries and Views
• Impact of Concurrency
• Benchmarking
3. Physical Database Design
• The process of producing a description of the implementation of the database on secondary storage.
• It describes the base relations, file organizations, and indexes used to achieve efficient access to the data, together with any associated integrity constraints and security measures.
4. Physical Database Design
• We describe the plan for how to build the tables, including appropriate data types, field sizes, attribute domains, and indexes.
• The plan should have enough detail that if someone else used it to build the database, the result would be the same database you intended to create.
• The conceptual and logical designs were independent of physical considerations. Now we not only know that we want a relational model, we have also selected a database management system (DBMS), such as Access or Oracle, and we focus on those physical considerations.
5. Logical vs. Physical Design
• Logical database design is concerned with what to store;
• physical database design is concerned with how to store it.
6. Introduction
• We will be talking at length about "database design"
– Conceptual schema: info to capture, tables, columns, views, etc.
– Physical schema: indexes, clustering, etc.
• Physical design is tightly linked to query optimization
– So we will study this "bottom up"
– But note: DB design is usually "top down": conceptual, then physical; then iterate.
• We must begin by understanding the workload:
– The most important queries and how often they arise.
– The most important updates and how often they arise.
– The desired performance for these queries and updates.
7. Understanding the Workload
• For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
• For each update in the workload:
– Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
– The type of update (INSERT/DELETE/UPDATE) and the attributes that are affected.
– For UPDATE commands, the fields that are modified.
8. Creating an ISUD Chart
Insert, Select, Update, Delete frequencies for the Employee table:

Transaction    Frequency   %     Name   Salary   Address
Payroll Run    monthly     100   S      S        S
Add Emps       daily       0.1   I      I        I
Delete Emps    daily       0.1   D      D        D
Give Raises    monthly     10    S      U        NA
9. Physical Design and Tuning Decisions
• Choice of indexes to create
– Which relations and which fields to index
– What field(s) should be the search key
– Should we build several indexes?
– For each index, should it be clustered or unclustered?
• Tuning the conceptual schema
– Alternative normalization
– Denormalization
– Vertical partitioning
– Views
• Query and transaction tuning
– Frequently executed queries are rewritten to run faster.
10. Need for Database Tuning
• It is hard to get a detailed workload at initial design time.
• The distinction between design and tuning is somewhat arbitrary:
• the design process ends once a conceptual schema and a set of clustering and indexing decisions have been made;
• tuning is the subsequent changes to the conceptual schema or the indexes.
11. Index Selection
• One approach:
– Consider the most important queries.
– Consider the best plan using the current indexes, and see if a better plan is possible with an additional index.
– If so, create it.
• Before creating an index, we must also consider the impact on updates in the workload.
– Trade-off: slowing some updates in order to speed up some queries.
12. Whether to Index (Guideline 1)
• Do not build an index unless some query, including the query components of updates, benefits from it.
• Whenever possible, choose indexes that speed up more than one query.
13. Multi-attribute Search Keys (Guideline 3)
– Two situations should be considered:
1. A WHERE clause includes conditions on more than one attribute of a relation.
2. They enable index-only evaluation strategies (i.e., accessing the relation can be avoided) for important queries.
14. Whether to Cluster (Guideline 4)
• As a rule of thumb, range queries are likely to benefit the most from clustering.
• If an index enables an index-only evaluation strategy for the query it is intended to speed up, the index need not be clustered.
15. Hash versus Tree Index (Guideline 5)
– A hash index is better in the following situations:
• The index is intended to support index nested loops join, the indexed relation is the inner relation, and the search key includes the join columns.
• There is a very important equality query, and no range queries, involving the search key attributes.
16. Balancing the Cost of Index Maintenance (Guideline 6)
• If maintaining an index slows down frequent update operations, consider dropping the index.
• Keep in mind, however, that adding an index may well speed up a given update operation.
– E.g., an index on employee IDs could speed up the operation of increasing the salary of an employee (specified by ID).
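
A minimal SQL sketch of that last point (the index name and the constant are illustrative; Emp, eid, and sal follow the examples used later in this deck):

-- Without an index on eid, this UPDATE must scan all of Emp to find one row.
CREATE INDEX emp_eid_idx ON Emp (eid);

-- With the index, the row is located directly; the index itself is not
-- touched by the update, because sal is not part of its key.
UPDATE Emp SET sal = sal * 1.10 WHERE eid = 12345;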
17. Example 1
• A hash index on D.dname supports the 'Toy' selection.
– Given this, an index on D.dno is not needed.
• A hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple.
• What if the WHERE clause also included "... AND E.age=25"?
– We could retrieve Emp tuples using an index on E.age, then join with the Dept tuples satisfying the dname selection. This is comparable to the strategy that used the E.dno index.
– So, if an E.age index already exists, this query provides much less motivation for adding an E.dno index.
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname = 'Toy' AND E.dno = D.dno
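
A hedged sketch of the two indexes this plan relies on. The USING HASH clause is PostgreSQL-specific syntax (hash indexes are not standard SQL); index names are illustrative:

CREATE INDEX dept_dname_hidx ON Dept USING HASH (dname);  -- equality selection on dname
CREATE INDEX emp_dno_hidx   ON Emp  USING HASH (dno);     -- probes matching Emp tuples in the join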
18. Example 2
• All selections are on Emp, so it should be the outer relation in any index NL join.
– This suggests that we build a B+ tree index on D.dno.
• What index should we build on Emp?
– A B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends on the selectivity of the conditions.
• As a rule of thumb, equality selections are more selective than range selections.
• As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. We have to understand optimizers!
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000
AND E.hobby = 'Stamps' AND E.dno = D.dno
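
A hedged sketch of the candidate indexes (B+ trees are the default index type in most systems; names are illustrative):

CREATE INDEX dept_dno_idx ON Dept (dno);    -- inner relation of the index NL join

-- Build only ONE of the following, whichever condition is more selective:
CREATE INDEX emp_hobby_idx ON Emp (hobby);  -- equality: E.hobby = 'Stamps'
CREATE INDEX emp_sal_idx   ON Emp (sal);    -- range: E.sal BETWEEN 10000 AND 20000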
19. Clustering and Indexing
• Clustered indexes can be especially important when accessing the inner relation in an index nested loops join.
• Revisit the same example. Should the indexes used be clustered?
• An unclustered index on dname suffices.
• On the other hand, Emp is the inner relation in an index NL join, and dno is not a candidate key,
• so the index on E.dno should be clustered.
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname = 'Toy' AND E.dno = D.dno
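
One way to realize the clustered index, as a hedged PostgreSQL-specific sketch (other systems differ, e.g. SQL Server declares it with CREATE CLUSTERED INDEX):

CREATE INDEX emp_dno_idx ON Emp (dno);
CLUSTER Emp USING emp_dno_idx;  -- physically reorders Emp by dno; in PostgreSQL this
                                -- is a one-time operation, rerun it after heavy updates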
20. Examples of Clustering
• A B+ tree index on E.age can be used to get qualifying tuples.
– How selective is the condition?
– Is the index clustered?
• Consider the GROUP BY query.
– If many tuples have E.age > 10, using the E.age index and sorting the retrieved tuples may be costly.
– A clustered E.dno index may be better!
• Equality queries and duplicates:
– Clustering on E.hobby helps!

SELECT E.dno
FROM Emp E
WHERE E.age > 40

SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age > 10
GROUP BY E.dno

SELECT E.dno
FROM Emp E
WHERE E.hobby = 'Stamps'
22. Co-clustering Two Relations
• It can speed up joins, in particular key–foreign key joins corresponding to 1:N relationships.
• A sequential scan of either relation becomes slower.
• All inserts, deletes, and updates that alter record lengths become slower, due to the overhead involved in maintaining the clustering.
23. Index-Only Plans
• A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available. Each query below is paired with an index key that enables an index-only plan:

SELECT D.mgr
FROM Dept D, Emp E
WHERE D.dno = E.dno
Index: <E.dno>

SELECT D.mgr, E.eid
FROM Dept D, Emp E
WHERE D.dno = E.dno
Index: <E.dno, E.eid>

SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
Index: <E.dno>

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
Index: <E.dno, E.sal>

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000
Index: <E.age, E.sal> or <E.sal, E.age> (B-tree trick!)
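
A minimal sketch of verifying an index-only plan, using PostgreSQL specifics (the "Index Only Scan" plan node and the visibility-map caveat are PostgreSQL behavior; the index name is illustrative):

CREATE INDEX emp_dno_sal_idx ON Emp (dno, sal);

-- In PostgreSQL the plan should show an "Index Only Scan" node; heap fetches
-- drop to zero once the table has been vacuumed (visibility map up to date).
EXPLAIN SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno;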
24. Tools to Assist in Index Selection
• First generation of such tools:
– Index tuning wizards, or
– Index advisors
• Drawback of these systems:
– They had to replicate the database query optimizer's cost model.
25. Tools to Assist in Index Selection (contd.)
• The DB2 Index Advisor
– A tool for automatic index recommendation given a workload.
– Workload table: ADVISE_WORKLOAD
– It is populated either:
• with SQL statements from the DB2 dynamic SQL statement cache (recently executed statements),
• with statically compiled SQL statements from packages, or
• with SQL statements from an online monitor called Query Patroller.
• Output: SQL DDL statements whose execution creates the recommended indexes.
26. Tools to Assist in Index Selection (contd.)
• The Microsoft SQL Server 2000 Index Tuning Wizard
– A tuning wizard integrated with the database query optimizer.
– Three tuning modes that let the user trade off analysis running time against the number of candidate index configurations examined: fast, medium, and thorough, with fast having the lowest running time and thorough examining the largest number of configurations.
– Maximum space allowed for indexes; allows table scaling.
– Reduces running time via a sampling mode.
27. Overview of Database Tuning
• Actual use of the DB provides a valuable source of detailed information that can be used to refine the initial design.
• Original assumptions are replaced.
• The initial workload is validated.
• Initial guesses about the size of the data can be replaced with actual statistics.
• Tuning is important to get the best possible performance.
• Three kinds of tuning: tuning indexes, tuning the conceptual schema, and tuning queries.
28. Tuning Indexes
• Queries and updates considered important at initial design time may turn out not to be very frequent.
• The observed workload may also identify new queries and updates.
• The initial choice of indexes has to be reviewed in light of this new information.
• Some original indexes may be dropped and new ones added.
29. Tuning Indexes (contd.)
• Suppose the chosen plan uses an index-only scan with Emp as the inner relation.
• If this query takes unexpectedly long to execute, consider replacing that plan with one based on a clustered index on the dno field.
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname = 'Toy' AND E.dno = D.dno
30. Tuning Indexes (contd.)
• In addition, we have to periodically reorganize indexes.
– E.g., a static index (ISAM) may have developed long overflow chains; dropping and rebuilding it, if feasible, improves access time through this index.
– For a dynamic structure (B+ tree): if the implementation does not merge pages on deletes, space occupancy can decrease considerably in some situations. This in turn makes the size of the index (in pages) larger than necessary, and can increase the height and therefore the access time.
31. Tuning the Conceptual Schema
• If the initial schema does not meet our performance objectives for the given workload with any set of physical design choices, we may have to redesign the conceptual schema.
• Such a change is called schema evolution.
• Issues involved in tuning the conceptual schema:
– We may decide to settle for a 3NF design instead of BCNF.
– Between 3NF and BCNF, our choice should be guided by the workload.
– Sometimes we might decide to further decompose a relation that is already in BCNF.
– We might denormalize.
– Partitioning.
32. Tuning Queries and Views
• If a query runs slower than expected, check if an index needs to be rebuilt, or if statistics are too old and need to be refreshed.
• Sometimes the DBMS may not be executing the plan you had in mind. Common areas of optimizer weakness:
– Selections involving null values (bad selectivity estimates).
– Selections involving arithmetic or string expressions (ditto).
– Selections involving OR conditions (ditto).
– Complex, correlated subqueries.
– Lack of evaluation features like index-only strategies or certain join methods, or poor size estimation.
• Check the plan that is being used! Then adjust the choice of indexes or rewrite the query/view.
– E.g., check via the POSTGRES EXPLAIN command.
– Some systems rewrite for you under the covers (e.g., DB2).
• This can be confusing and/or helpful!
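
A minimal sketch of plan inspection with EXPLAIN (PostgreSQL syntax; most systems offer an equivalent, with varying output formats):

-- Prints the chosen plan tree with cost estimates, without running the query.
EXPLAIN
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname = 'Toy' AND E.dno = D.dno;

-- EXPLAIN ANALYZE also executes the query and reports actual row counts and times.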
33. More Guidelines for Query Tuning
• Minimize the use of DISTINCT: it is not needed if duplicates are acceptable, or if the answer contains a key.
• Minimize the use of GROUP BY and HAVING:

SELECT MIN (E.age)
FROM Employee E
GROUP BY E.dno
HAVING E.dno=102

can be rewritten as

SELECT MIN (E.age)
FROM Employee E
WHERE E.dno=102

• Consider the DBMS's use of indexes when writing arithmetic expressions: E.age=2*D.age will benefit from an index on E.age, but might not benefit from an index on D.age!
34. Guidelines for Query Tuning (Contd.)
• Avoid using intermediate relations:

SELECT * INTO Temp
FROM Emp E, Dept D
WHERE E.dno=D.dno
AND D.mgrname='Joe'

and

SELECT T.dno, AVG(T.sal)
FROM Temp T
GROUP BY T.dno

vs.

SELECT E.dno, AVG(E.sal)
FROM Emp E, Dept D
WHERE E.dno=D.dno
AND D.mgrname='Joe'
GROUP BY E.dno

The second version does not materialize the intermediate relation Temp.
36. Choices in Tuning the Conceptual Schema (contd.)
• Consider the relation Contracts, denoted CSJDPQV.
– The meaning of a tuple in this relation is that the contract with cid C is an agreement that supplier S (with sid equal to supplierid) will supply Q items of part P (with pid equal to partid) to project J (with jid equal to projectid) associated with department D (with deptid equal to did), and that the value V of this contract is equal to value.
37. Choices in Tuning the Conceptual Schema (contd.)
• There are two known integrity constraints with respect to Contracts:
1. A project purchases a given part using a single contract: JP → C
2. A department purchases at most one part from any given supplier: SD → P
38. Settling for a Weaker Normal Form
• Consider the Contracts relation.
• Let us see what normal form it is in.
• The candidate keys for this relation are C and JP.
• The only nonkey dependency is SD → P, and P is a prime attribute because it is part of the candidate key JP.
• So it is in 3NF.
• We can decompose it to convert it into BCNF:
• we obtain a lossless-join and dependency-preserving decomposition into BCNF by decomposing the schema into CJP, SDP, and CSJDQV.
39. Horizontal Decompositions
• Usual definition of decomposition: a relation is replaced by a collection of relations that are projections. This is the most important case.
– We will talk about this at length as part of conceptual DB design.
• Sometimes we might want to replace a relation by a collection of relations that are selections.
– Each new relation has the same schema as the original, but a subset of its rows.
– Collectively, the new relations contain all rows of the original.
– Typically, the new relations are disjoint.
40. Horizontal Decompositions (Contd.)
• Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)
• Suppose that contracts with value > 10000 are subject to different rules.
– So queries on Contracts will often say WHERE val > 10000.
• One approach: a clustered B+ tree index on the val field.
• Second approach: replace Contracts by two new relations, LargeContracts and SmallContracts, with the same attributes (CSJDPQV).
– Performs like the index on such queries, but with no index overhead.
– Can build clustered indexes on other attributes, in addition!
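
A minimal DDL sketch of the second approach (column types are assumptions; the CHECK constraints encode the val boundary):

CREATE TABLE LargeContracts (
    cid INTEGER PRIMARY KEY,
    sid INTEGER, jid INTEGER, did INTEGER, pid INTEGER,
    qty INTEGER,
    val REAL CHECK (val > 10000)
);

CREATE TABLE SmallContracts (
    cid INTEGER PRIMARY KEY,
    sid INTEGER, jid INTEGER, did INTEGER, pid INTEGER,
    qty INTEGER,
    val REAL CHECK (val <= 10000)
);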
41. Masking Conceptual Schema Changes
• The horizontal decomposition from above can be masked by a view.
– NOTE: queries with the condition val > 10000 must be asked with respect to LargeContracts for efficiency, so some users may have to be aware of the change.
• I.e., the users who were having performance problems.
• Arguably that's OK: they wanted a solution!
CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val)
AS SELECT *
FROM LargeContracts
UNION
SELECT *
FROM SmallContracts
42. Impact of Concurrency
• In a system with many concurrent users, several additional points must be considered.
• A transaction obtains locks on the pages that it reads or writes, and other transactions may be blocked.
• Two specific ways to reduce blocking:
– Reduce the time that transactions hold locks.
– Reduce hot spots.
43. Reducing Lock Durations
• Delay lock requests
– Tune the transaction by writing to local program variables and deferring changes to the database until the end of the transaction.
• Make transactions faster
– Tune indexing and rewrite queries.
– Carefully partition the tuples in a relation and its associated indexes across a collection of disks.
• Replace long transactions by short ones
– Rewrite into two or more smaller transactions.
44. Reducing Lock Durations (contd.)
• Build a warehouse
– Complex queries, such as statistical analyses of business trends, can hold shared locks for a long time.
– They can run on a copy of the data that is a little out of date.
• Consider a lower isolation level
– In many situations, such as queries generating aggregate information or statistical summaries,
– use a lower SQL isolation level such as REPEATABLE READ or READ COMMITTED.
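
A minimal sketch in standard SQL (exactly where the SET TRANSACTION statement may appear relative to BEGIN varies by DBMS):

-- Request a weaker isolation level before the transaction's queries run.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT dno, AVG(sal) FROM Emp GROUP BY dno;
COMMIT;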
45. Reducing Hot Spots
• Delay operations on hot spots
– Requests using frequently accessed objects.
• Optimize access patterns
– The pattern of updates.
• Partition operations on hot spots
– Batch appends.
• Choice of index
– In a frequently updated relation, B+ tree indexes can become a bottleneck: the root and upper index pages become hot spots.
– Specialized locking protocols help (fine-granularity locks).
– This favors ISAM indexes (only leaf pages get locked).
46. DBMS Benchmarking
• Includes benchmarks for measuring the performance of a certain class of applications (e.g., the TPC benchmarks) and
• benchmarks for measuring how well a DBMS performs various operations (e.g., the Wisconsin benchmark).
– Benchmarks should be portable, easy to understand, and scale naturally to larger problem instances. They should measure peak performance (e.g., transactions per second, or tps) as well as price/performance ratios (e.g., $/tps) for typical workloads in a given application domain.
47. DBMS Benchmarking (contd.)
• The Transaction Processing Council (TPC) was created to define benchmarks for transaction processing and database systems.
• Well-known DBMS benchmarks:
– The TPC-A and TPC-B benchmarks constitute the standard definitions of the tps and $/tps measures.
– TPC-A measures the performance and price of a computer network in addition to the DBMS,
– whereas the TPC-B benchmark considers the DBMS by itself.
48. DBMS Benchmarking (contd.)
– The TPC-C benchmark is a more complex suite of transactional tasks than TPC-A and TPC-B.
– It models a warehouse that tracks items supplied to customers and involves five types of transactions.
– It is much more expensive than TPC-A and TPC-B and exercises a much wider range of system capabilities.
– TPC-D represents a broad range of decision support (DS) applications that require complex, long-running queries against large, complex data structures.
49. DBMS Benchmarking (contd.)
• The TPC Benchmark™ H (TPC-H) is a decision support benchmark.
• It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications.
• The queries and the data populating the database have been chosen to have broad industry-wide relevance.
• This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.
50. Points to Remember
• Indexes must be chosen to speed up important queries (and perhaps some updates!).
– Index maintenance adds overhead on updates to key fields.
– Choose indexes that can help many queries, if possible.
– Build indexes to support index-only strategies.
– Clustering is an important decision; only one index on a given relation can be clustered!
– The order of fields in a composite index key can be important.
• Static indexes may have to be periodically rebuilt.
• Statistics have to be periodically updated.
51. Points to Remember (Contd.)
• Over time, indexes have to be fine-tuned (dropped, created, re-clustered, ...) for performance.
– Determine the plan used by the system, and adjust the choice of indexes appropriately.
• The system may still not find a good plan:
– Only left-deep plans?
– Null values, arithmetic conditions, string expressions, the use of ORs, nested queries, etc. can confuse an optimizer.
• So we may have to rewrite the query/view:
– Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY.