ADSI Notes - Diogo Lopes

Search

Search IconIcon to open search

Lecture 3 - Indexing

Last updated Mar 17, 2023

Slides

# Indexes

# “Indexes will be our friends”

Main Indexes:

Example: The book has the table of contents - shoes topics in order (page numbe, you are advancing in the book sequencially); and the index at the end of the book (the page numbers is not order, in this case the index entries are sorted alfphabetically)

# Ordered Indexes

# Dense Index

Even without a pointer to some record, we can assume some “categories” are in between some other indexed categories

# Sparse Index

# Dense vs Sparse

Sparse compared to Dense Indexes:

# Multilevel Index

  1. Outer Indexes Sparse and inner Index Dense (for example).
  2. If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.
  3. Allows us to jump directly to some part of the table
  4. Indexes at all levels must be updated on insertion or deletion from the file

# Clustered Index

All records regarding to a column are sequencial, so its faster to retrieve them

If you have a pointer pointing to the first record in a table we only need that one, because the other ones follow it.

# Non-Clustered Index

The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.

More info on clustered and non-clustered Indexes

# B+ -Tree

# Searching for “Katz”

  1. Decide if Katz comes before/after Mozart -> follows pointer to the left (K<M)
  2. Theres no Katz here (sparse index) -> follow pointer to the right (K>E && K<G)
  3. This is a dense index: If value is there -> follow pointer

# Properties

# B+ -Tree Insertion

Inserting Adams:

Adams < Mozart Adams < Einstein Adams < Brandt

We need to put it in the front But Adams doesnt fit in the node

# Splitting Procedure

Inserting Lamport:

It should be in-between Kim and Mozart

Add more nodes -> need more pointers Problem: Previous level has no space Solution: Split nodes recursively

Add another level -> Add another leaf -> Choose adequate value in level for pointer -> Correct Root

Left has to have values < Right has to have values >=

The root value has to be the smallest value in the leaf section

# Deleting Entries

Merging and Redistribution Removing a Leaf, Srinivasan:

Removing 2 non-Leaves Singh and Wu:

Removing a root, Gold:

  1. Check where Gold is (at Root -> Less than Kim -> found leaf)
  2. No need for this many nodes -> Merge Nodes
  3. Causes the existing os a single Pointer -> Go get a new pointer (Left)
  4. No need for root -> Delete has it only has 1 child
  5. Assess pointer values

More info on B+-Tree

# Indexes Notes:

# Hash File organization

# Hash Indexes

Worst hash function maps all seach-keys to the same bucket An ideal hash function is uniform and random

The same principle of B+-Tree index Node size applies to containers: A Bucket is the same size of a disk block.

As we grow the number of entries and therefore the number of bucket, we might need to change the hash function

# Static Hashing Deficiencies

In static hashing, function ℎ maps search-key values to a fixed set of 𝐵 of bucket addresses. Databases grow or shrink with time.

One solution: periodic re-organization of the file with a new hash function

Better solution:

# Extendable Hash Structure

Context: We have a single bucket -> We have a hash function that with any value indicates that bucket

1-bit prefix

Suppose we consider the first bit from the value; Not all values fit into the bucket: 3 entries->2 values with 0’s are inserted but another one is not

Need more buckets… need more bits

2-bit prefix

00, 01, 10, 11

01 hash value is not being used -> using 1-bit prefix in the first bucket

3-bit prefix

last bucket: first 2 bits are 1 so this bucket is still 2 bit we need 3 bits for some buckets (physics and finance)

Now inserting another record and we dont have space, we just start to consider another bit in the bucket

Once have buckets filled with the same value, overflow buckets are bound to happen, we dont need to consider anymore bits to those buckets

Collisions happens usually due to same value collision

Theres always 2 solution, consider more bits in a bucket or double the bucket address table (2x)

Using another bit in the prefix does not mean we use more buckets Pointers grow exponentially, buckets not

# General Extendable Hash Structure Use

# Hash Notes

# Bitmap Index

Bitmaps for gender: m/f = 1 for each reacord same for income_level

Use: “find all records for gender “f” and income_level “L3"” Do an AND operation and we get the last record

same for other operations

Lecture 2 Storage and file organization | Lecture 4 Query Processing