ADSI Notes - Diogo Lopes

Search

Search IconIcon to open search

Lecture 9 - Tuning

Last updated Mar 24, 2023

Slides

# Tuning

# “All about performance”

# What

# Why

Causes of bad performance may have to do with disk size, log system, locks…

# Tuning Principles

  1. Think globally, fix locally
    • we have to know everything about the DB to investigate what is going on
    • usually the solution is a simple change
  2. . Partitioning breaks bottlenecks
    • temporal and spatial
    • many transaction may compete for the same resources
  3. Start-up costs are high; running costs are low
    • bringing data to memory is costly (finding execution plans)
  4. Render unto server what is due unto server
    • take the most advantage of the DB system (joins, logic operations…)
  5. Be prepared for trade-offs
    • indexes are trade-offs (inserting data means updating indexes)

# Techniques (in this lecture)

# Schema Tuning

consists of changinf the tables of the database to get better perfiormance

# Normalization and Denormalization

A relation R is normalized if every interesting functional dependency X -> A has the property that X is a key of R

Normalization is not always better

Denormalizing means sacrificing normalization for the sake of performance:

See Slides for Benchmark

# Queries

The query: “find all line items whose supplier is in Europe”

In a Normalized schema 600 000 line items, 500 suppliers, 25 nations, 5 regions In a Denormalized one 600 000 line items

# Partitioning

# Vertical Partitioning (columns)

Three attributes: account_ID , balance , address

Q: Which design is better? R: It depends on the query pattern .

The second schema might be better because the relation ( account_ID , balance ) can be made smaller

A single normalized relation XYZ is better than two normalized relations XY and XZ for queries accessing X, Y, Z together

The two relation design is better if:

Benchmarking:

# Vertical Partitioning vs Vertical Antipartitioning

Breaking the rules in the name of performance

# Horizontal Partitioning (rows)

The accounting department of a convenience store chain issues queries every 20 minutes to obtain:

Solution: Aggregation Maintenance

Add the following materialized views:

Benchmark:

# Query Tuning

# Index Usage

# Eliminate Unneeded DISTINCTs

Ways to eliminate DICTINCT

PRIMARY KEYs do not repeat

In general, DISTINCT is required when:

# Reaching

There will be no duplicates among the records returned by a selection, if one of the two following conditions hold:

See Slides for examples (56~58)

# Types of Nested Queries

When you use a row from the FROM, you have a nested query See Slides for examples (59~60)

# Rewriting Subqueries

# Rewriting of Uncorrelated Subqueries

uncorrelated nested queries -> flat query

  1. Retain the SELECT clause from the outer block
  2. Combine the arguments of the two FROM clauses
  3. AND together all the WHERE clauses, replacing IN by =
1
2
3
4
SELECT ssnum
FROM Employee
WHERE dept IN (SELECT dept
FROM Techdept

becomes

1
2
3
SELECT ssnum
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept

# Rewriting of Correlated Subqueries

correlated nested queries -> temporary table Query: find the employees who earn more than the average salary in their tech department

1
2
3
4
5
6
SELECT ssnum
FROM Employee e1
WHERE salary > (SELECT avg(e2.salary)
FROM Employee e2, Techdept
WHERE e2.dept = Techdept.dept
AND e2.dept = e1.dept);

This could be inefficient; same average salary computed multiple times

Solution

1
2
3
4
5
INSERT INTO Temp
SELECT avg(salary) as avsalary , Employee.dept
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
GROUP BY Employee.dept

Returns the average of salaries per tech department

1
2
3
4
SELECT ssnum
FROM Employee, Temp
WHERE salary > avsalary
AND Employee.dept = Temp.dept

A better solution would be to use a materialized view (automatically created when creating indexes in SQLServer)

# (Ab)use of Temporaries

Query: Find all employees in the information systems department who earn more than $40000

1
2
3
4
5
6
7
8
INSERT INTO Temp
SELECT *
FROM Employee
WHERE salary > 40000;

SELECT ssnum
FROM Temp
WHERE Temp.dept = 'Information

Optimizer would miss the opportunity to use the index on dept

More efficient solution:

1
2
3
4
SELECT ssnum
FROM Employee
WHERE dept = 'Information Systems'
AND salary > 40000;

# Join Conditions

Example: Find all students who are also employees

1
2
3
SELECT *
FROM Employee, Student
WHERE Employee.name = Student.name;

Both tables have index on name , but it is a non clustered index;

The following join would be much more efficient:

1
2
3
SELECT *
FROM Employee, Student
WHERE Employee.ssnum = Student.ssnum

Here we can have a MERGE as both tables are sorted on the clustered index on the PK

# Use of HAVING

Do not use HAVING when WHERE is enough

1
2
3
4
5
SELECT avg(salary) as avgsalary , dept
FROM Employee
GROUP BY dept
HAVING dept = 'Information Systems';
SELECT avg(

Here we are creating a GROUP for every dept (bad performance)

1
2
3
4
SELECT avg(salary) as avgsalary , dept
FROM Employee
WHERE dept = 'Information Systems'
GROUP BY dept;

only 1 group!

# Use of VIEWS

Views may cause queries to execute inefficiently

1
2
3
4
CREATE VIEW Techlocation AS
SELECT ssnum , Techdept.dept , location
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
1
2
3
SELECT dept
FROM Techlocation
WHERE ssnum = 43253265;

The query below will be slower because of the expansion of the view (also price of JOIN) The system might not use the view

# Performance Impact of Query Rewritings

Lecture 8 Database Recovery | Lecture 10 Tuning (continued)