Database Storage and Indices

Pooria Azimi

Project 1
Questions?

Abstraction Layers
DBMS Architecture Levels:

External (View) Level
Logical (Conceptual) Level
Physical (Internal) Level

Outline
How to store a table on disk?
    Sequential file, Hash map, B tree, ...
Indexing, for faster retrieval
    Single and multiple index, Bitmap index, Clustered vs. non-clustered, ...
Tradeoffs in using (multiple) indices

Users table:

UID     Name     Email            Gender  Age Group  City       Salary
1       Alice    alice@aut.ac.ir  Female  30-40      London     1260
2       Bob      bob@gmail.com    Male    20-30      Paris      907
3       Carol    carol@gmail.com  Female  20-30      Shanghai   1400
4       Dave     dave@yahoo.com   Male    10-20      Sydney     2577
5       Eve      eve@hotmail.com  Female  50-60      Toronto    3790
6       Mallory  mallory@me.com   Female  30-40      Shanghai   1968
7       Peggy    peggy@aut.ac.ir  Female  40-50      Cairo      3731
...
438532  Steve    steve@me.com     Male    50-60      Cupertino  7110

Sequential Files
CSV (comma separated values)
Fixed-length fields
Variable-length fields
XML
...

CSV (Variable-length Fields)

/home/pooria/db/users.table
1,Alice,alice@aut.ac.ir,Female,30-40,London,1260\n
2,Bob,bob@gmail.com,Male,20-30,Paris,907\n
3,Carol,carol@gmail.com,Female,20-30,Shanghai,1400\n
4,Dave,dave@yahoo.com,Male,10-20,Sydney,2577\n
5,Eve,eve@hotmail.com,Female,50-60,Toronto,3790\n
6,Mallory,mallory@me.com,Female,30-40,Shanghai,1968\n
7,Peggy,peggy@aut.ac.ir,Female,40-50,Cairo,3731\n
......
438532,Steve,steve@me.com,Male,50-60,Cupertino,7110\n

CSV (Variable-length Fields)

Search: Slow
Update: Slow
Delete: Slow
Space: Very efficient
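A minimal sketch of why search is slow on a variable-length CSV file: a record's position is only known after reading everything before it, so a lookup has to scan from the top. The function name `find_user` and the in-memory sample are assumptions for illustration, not part of the lecture.

```python
import csv
import io

# A few rows from the users table, in the CSV layout shown earlier.
USERS_CSV = (
    "1,Alice,alice@aut.ac.ir,Female,30-40,London,1260\n"
    "2,Bob,bob@gmail.com,Male,20-30,Paris,907\n"
    "3,Carol,carol@gmail.com,Female,20-30,Shanghai,1400\n"
    "4,Dave,dave@yahoo.com,Male,10-20,Sydney,2577\n"
)


def find_user(csv_text, uid):
    """Scan records top to bottom; return (row, records_read)."""
    records_read = 0
    for row in csv.reader(io.StringIO(csv_text)):
        records_read += 1
        if int(row[0]) == uid:
            return row, records_read
    return None, records_read


row, cost = find_user(USERS_CSV, 4)
print(row[1], cost)  # Dave is the 4th record, so 4 records are read
```

The cost grows with the position of the record; a miss always reads the whole file.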

CSV (Variable-length Fields)

Suitable for:
Small tables and short logs
Tables that don't change very often
Tables that must be read sequentially (e.g. moving data between two applications)
Tables with variable-length fields

CSV (Fixed-length Fields)

/home/pooria/db/users.table
     1,  Alice,alice@aut.ac.ir,Female,30-40,   London,1260\n
     2,    Bob,  bob@gmail.com,  Male,20-30,    Paris, 907\n
     3,  Carol,carol@gmail.com,Female,20-30, Shanghai,1400\n
     4,   Dave, dave@yahoo.com,  Male,10-20,   Sydney,2577\n
     5,    Eve,eve@hotmail.com,Female,50-60,  Toronto,3790\n
     6,Mallory, mallory@me.com,Female,30-40, Shanghai,1968\n
     7,  Peggy,peggy@aut.ac.ir,Female,40-50,    Cairo,3731\n
......
438532,  Steve,   steve@me.com,  Male,50-60,Cupertino,7110\n

CSV (Fixed-length Fields)

/home/pooria/db/users.table
     1  Alicealice@aut.ac.irFemale30-40   London1260
     2    Bob  bob@gmail.com  Male20-30    Paris 907
     3  Carolcarol@gmail.comFemale20-30 Shanghai1400
     4   Dave dave@yahoo.com  Male10-20   Sydney2577
     5    Eveeve@hotmail.comFemale50-60  Toronto3790
     6Mallory mallory@me.comFemale30-40 Shanghai1968
     7  Peggypeggy@aut.ac.irFemale40-50    Cairo3731
......
438532  Steve   steve@me.com  Male50-60Cupertino7110

CSV (Fixed-length Fields)

Search: A bit faster (than variable-length)
Update: Very fast (however, we must find the tuple first)
Delete: Slow
Space: Not efficient
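With fixed-length fields, the n-th record starts at byte n times the record size, so it can be read or overwritten in place. The sketch below assumes field widths of 6, 7, 15, 6, 5, 9 and 4 bytes (52 bytes per record, matching the widest values in the users table); the widths, the helper names and the use of `io.BytesIO` in place of a real file are all illustrative assumptions.

```python
import io
import struct

# Assumed field widths: UID 6, name 7, email 15, gender 6, age group 5, city 9, salary 4.
RECORD = struct.Struct("6s7s15s6s5s9s4s")  # 52 bytes per record


def pack(uid, name, email, gender, age, city, salary):
    """Right-align every field to its fixed width."""
    fields = [(uid, 6), (name, 7), (email, 15), (gender, 6), (age, 5), (city, 9), (salary, 4)]
    return RECORD.pack(*(str(v).rjust(w).encode("ascii") for v, w in fields))


def read_record(f, n):
    """Jump straight to the n-th record (0-based) without reading the ones before it."""
    f.seek(n * RECORD.size)
    return [field.decode("ascii").strip() for field in RECORD.unpack(f.read(RECORD.size))]


def update_salary(f, n, salary):
    """Overwrite only the 4 salary bytes of record n, in place."""
    f.seek(n * RECORD.size + RECORD.size - 4)
    f.write(str(salary).rjust(4).encode("ascii"))


table = io.BytesIO()  # stands in for a file opened in binary mode
table.write(pack(1, "Alice", "alice@aut.ac.ir", "Female", "30-40", "London", 1260))
table.write(pack(2, "Bob", "bob@gmail.com", "Male", "20-30", "Paris", 907))
update_salary(table, 1, 950)
print(read_record(table, 1))
```

Note that this addresses records by position, not by UID. Finding the position of a given UID still requires a search, which is why the update is only fast once the tuple has been found.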

CSV (Fixed-length Fields)

Suitable for:
Logs
Tables that can be read sequentially
Tables with short, fixed-length fields

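Hash Map

[Figure: a hash function applied to the lookup field (UID) maps a key to the location of its record. The lookup UID=7 goes through the hash function; the lookup City=Cupertino is shown next to the records instead.]

Records, in stored order:
2 | Bob | 26 | bob@gmail.com
14 | Jane | 20 | clark@gmail.com
5 | Smith | 14 | smith@me.com
13 | Adams | 35 | adams@yahoo.com
6 | Jones | 19 | jones@gmail.com
7 | Jane | 48 | jane@hotmail.com
1 | Blake | 23 | blake@gmail.com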

Hash Map

Good:
Fast (for lookup field)
Easy to implement
Relatively space-efficient (no pointers)

Bad:
Naïve implementations only allow one index (lookup key) per table
Not scalable (for non-memory resident data)
Not suitable for range queries
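A small sketch of these trade-offs, using Python's built-in `dict` (itself a hash table) as a stand-in for hash-based table storage. The function names are hypothetical; the records are the ones from the hash map figure.

```python
# Records keyed by UID: (name, age, email).
users_by_uid = {
    2: ("Bob", 26, "bob@gmail.com"),
    14: ("Jane", 20, "clark@gmail.com"),
    5: ("Smith", 14, "smith@me.com"),
    13: ("Adams", 35, "adams@yahoo.com"),
    6: ("Jones", 19, "jones@gmail.com"),
    7: ("Jane", 48, "jane@hotmail.com"),
    1: ("Blake", 23, "blake@gmail.com"),
}


def lookup(uid):
    """Point lookup on the hashed field: one probe, no scan."""
    return users_by_uid.get(uid)


def uid_range(low, high):
    """Range query: hashing scatters neighbouring keys, so every key is visited."""
    return sorted(uid for uid in users_by_uid if low <= uid <= high)


def by_age(age):
    """Lookup on a field that is not the hash key also visits every record."""
    return [uid for uid, (_, a, _) in users_by_uid.items() if a == age]


print(lookup(7))
print(uid_range(5, 7))
print(by_age(35))
```

Only `lookup` benefits from the hash function; the other two queries cost the same as a full scan.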

B tree

[Figure: part of a B tree whose nodes hold records keyed by UID; indentation shows the levels.]

... | 9 | Joh- | ...
    12 | Jon-   15 | Ada-
        10 | Bob    11 | Smi-
        13 | Bla-   14 | Cla-
        16 | Jan-   17 | Ste-

B tree

They were specifically built for database storage
They are I/O-friendly
B trees are fast and scalable
They can be modified to allow more than one index per table
They work well with range queries

B tree
Demo

B tree

Good:
Fast (for indexed fields)
Scalable
Suitable for range queries (on indexed fields)

Bad:
Space overhead
Negative impact on write performance (inserts and deletes must keep the tree balanced)
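The range-query advantage comes from keeping keys in sorted order. The sketch below is not a B tree: it uses a sorted Python list with binary search (`bisect`) as a simplified stand-in for that one property. The keys and truncated names are the ones in the B tree figure; `range_query` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right

# Keys kept in sorted order, as a B tree keeps them.
uids = [9, 10, 11, 12, 13, 14, 15, 16, 17]
names = ["Joh-", "Bob", "Smi-", "Jon-", "Bla-", "Cla-", "Ada-", "Jan-", "Ste-"]


def range_query(low, high):
    """Return (uid, name) for low <= uid <= high without scanning the whole list."""
    start = bisect_left(uids, low)    # first position with uid >= low
    stop = bisect_right(uids, high)   # first position with uid > high
    return list(zip(uids[start:stop], names[start:stop]))


print(range_query(12, 14))
```

Two binary searches locate the ends of the range and everything in between is contiguous. A real B tree gets the same effect with wide, page-sized nodes so that each step costs one disk read.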

Index

Why use indices?
    Faster retrieval
    Most efficient in range queries
Tradeoffs
    Slower writes/updates vs. faster retrieval
    Disk space overhead

Why Use Indices?


Without using indices, most queries require a full scan!

Why Use Indices?

Sample queries:

SELECT name, city FROM users WHERE city LIKE 'S%'
SELECT name, city FROM users WHERE uid BETWEEN 10 AND 20
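One way to see the effect of an index on the second query is to ask a database for its query plan before and after creating one. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in (the lecture itself targets MySQL and SQL Server). The table contents and the index name `ix_users_uid` are made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(i, f"user{i}", "Shanghai" if i % 2 else "Paris") for i in range(1, 101)],
)

QUERY = "SELECT name, city FROM users WHERE uid BETWEEN 10 AND 20"


def plan(sql):
    """Return SQLite's human-readable plan steps for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)   # expected to describe a scan of the table
conn.execute("CREATE INDEX ix_users_uid ON users (uid)")
plan_with_index = plan(QUERY)      # expected to mention ix_users_uid

print(plan_without_index)
print(plan_with_index)
rows = conn.execute(QUERY).fetchall()
```

The query returns the same rows either way; only the access path changes.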

Index (in memory)

Index on UID (in memory): 10 11 12 13 14 15
    8 bytes per entry (4 bytes for the key, 4 bytes for the pointer)
Data (on disk), records in stored order by UID: 7 13 11 9 12 10
    480 bytes
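A back-of-the-envelope calculation shows why an index can stay in memory while the data stays on disk. It assumes that the 480 bytes in the figure is the size of one record and that the table has 438,532 rows (the largest UID in the users table); both are assumptions, not figures stated in the lecture.

```python
ROWS = 438_532          # largest UID in the users table, taken here as the row count
INDEX_ENTRY_BYTES = 8   # 4-byte key + 4-byte pointer
RECORD_BYTES = 480      # assumed to be the size of one record on disk

index_bytes = ROWS * INDEX_ENTRY_BYTES
data_bytes = ROWS * RECORD_BYTES

print(f"index: {index_bytes / 2**20:.1f} MiB")
print(f"data:  {data_bytes / 2**20:.1f} MiB")
print(f"ratio: {data_bytes // index_bytes}x")  # 480 / 8 = 60
```

Under these assumptions the index is 60 times smaller than the data it points to.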

Index (in memory)

Index on Name (in memory): Alice Bernard Bob Carole Cindy Dave

Data (on disk), records in stored order:
UID  Name
8    Peggy
3    Carole
2    Bob
26   Trudy
32   Bernard
1    Alice

Separate trees for index & data

[Figure: two B trees; indentation shows the levels.]

Index (UID), in memory:
    12  15
        10  11
        13  14
        16  17

Data (UID), on disk:
    12 | Jon-   15 | Ada-
        10 | Bob    11 | Smi-
        13 | Bla-   14 | Cla-
        16 | Jan-   17 | Ste-

Multiple Indexes

[Figure: three index B trees in memory, all pointing into one data tree on disk; indentation shows the levels.]

Memory:

UID:
    12  15
        10  11
        13  14
        16  17

Username:
    Bo  Jon
        Ad  Bl
        Cl  Jan
        Sm  Ste

Age:
    20  35
        14  19
        23  26
        48  56

Disk:

    12 | Jon-   15 | Ada-
        10 | Bob    11 | Smi-
        13 | Bla-   14 | Cla-
        16 | Jan-   17 | Ste-

Multiple Indexes

Index trees are small compared to the data they point to, so all of them can usually fit in main memory.

Clustered vs. Non-clustered Indexes


Non-clustered Index:
The data rows may be randomly spread throughout the table. A non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page.

Clustered Index:
The ordering of the physical data rows is in accordance with the index blocks that point to them. Therefore, only one clustered index can be created on a given table.
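The two definitions can be sketched with a tiny in-memory table. Here the rows are stored sorted by UID, so an index on UID would be clustered, while the index on name is non-clustered: sorted keys, each carrying a pointer to wherever the row happens to be. The sample rows and the use of a list position as the "pointer" (instead of a page and row number) are simplifying assumptions.

```python
from bisect import bisect_left

# Data in physical order. Rows are stored sorted by UID, so an index
# on UID is clustered: index order matches physical order.
rows = [(10, "Bob"), (11, "Smith"), (12, "Jones"), (13, "Blake")]

# Non-clustered index on name: keys in sorted order, each with a pointer
# (here simply the row's position) into the data.
name_index = sorted((name, pos) for pos, (_, name) in enumerate(rows))


def find_by_name(name):
    """Binary-search the index, then follow the pointer to the row."""
    keys = [key for key, _ in name_index]
    i = bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return rows[name_index[i][1]]
    return None


print([pos for _, pos in name_index])  # pointers jump around the data
print(find_by_name("Jones"))
```

Walking the name index in key order visits the rows out of physical order, which is why only one ordering (the clustered one) can match the data layout.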

[Figure: three index B trees in memory, all pointing into one data tree on disk; indentation shows the levels.]

Memory:

UID (clustered):
    12  15
        10  11
        13  14
        16  17

Username (non-clustered):
    Bo  Jon
        Ad  Bl
        Cl  Jan
        Sm  Ste

Age (non-clustered):
    20  26
        14  19
        23  26
        35  48

Disk:

    12 | Jon-   15 | Ada-
        10 | Bob    11 | Smi-
        13 | Bla-   14 | Cla-
        16 | Jan-   17 | Ste-
Clustered vs. Non-clustered Indexes

Each table can have at most one clustered index (typically on the primary key); some engines, such as MySQL's InnoDB, always create one
Each table can have many non-clustered indexes (up to a DBMS-specific limit)

Reverse Index
Sample Query: Find all users with aut.ac.ir email account

CREATE INDEX ON users (email);
SELECT name FROM users WHERE email LIKE '%@aut.ac.ir';

(The pattern starts with %, so the sorted order of the index on email does not help.)

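Reverse Index
Sample Query: Find all users with aut.ac.ir email account

CREATE INDEX ON users (reverse(email));
SELECT name FROM users WHERE reverse(email) LIKE reverse('%@aut.ac.ir');

(Reversing turns the suffix pattern into the prefix pattern 'ri.ca.tua@%', which a sorted index can answer.)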
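The same idea in a few lines of Python, as a sketch: store the reversed emails in sorted order, and a "ends with" search becomes a "starts with" search that binary search can answer. The sorted list stands in for the index; `emails_ending_with` is a hypothetical helper.

```python
from bisect import bisect_left

emails = [
    "alice@aut.ac.ir", "bob@gmail.com", "carol@gmail.com",
    "dave@yahoo.com", "peggy@aut.ac.ir", "steve@me.com",
]

# Index on reverse(email): reversed strings kept in sorted order.
reverse_index = sorted(e[::-1] for e in emails)


def emails_ending_with(suffix):
    """Suffix search on email == prefix search on the reversed strings."""
    prefix = suffix[::-1]
    i = bisect_left(reverse_index, prefix)  # first entry >= prefix
    matches = []
    while i < len(reverse_index) and reverse_index[i].startswith(prefix):
        matches.append(reverse_index[i][::-1])
        i += 1
    return matches


print(emails_ending_with("@aut.ac.ir"))
```

All matching entries sit next to each other in the reversed order, so the loop stops at the first non-match instead of scanning every email.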

Bitmap Index
Suitable for:
Categorical fields with a small number of possible values (age group, account type)
Boolean fields (gender)

Bitmap Index
Gender (Female):     0111011000101010101
Age Group (20-30):   0110010000100000101
Age Group (30-40):   0001000111000000010

Female, Age between 20 and 40 = Female AND (20-30 OR 30-40):
                     0111010000100000101
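The combined bitmap can be reproduced with two bitwise operations. This sketch stores each bitmap from the slide as a Python integer; treating bit positions as 0-based row positions, read left to right, is an assumption about how the slide numbers its rows.

```python
WIDTH = 19  # number of rows covered by each bitmap on the slide

female = int("0111011000101010101", 2)
age_20_30 = int("0110010000100000101", 2)
age_30_40 = int("0001000111000000010", 2)

# Female AND (20-30 OR 30-40): two bitwise operations, no row is read.
result = female & (age_20_30 | age_30_40)
result_bits = format(result, f"0{WIDTH}b")

# Row positions (0-based, left to right) whose bit is set.
matching_rows = [i for i, bit in enumerate(result_bits) if bit == "1"]

print(result_bits)
print(matching_rows)
```

Only the rows whose bit is set in the result need to be fetched from the table.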

Tradeoffs In Using Indices

Good:
Much faster search (on indexed fields)
Range queries

Bad:
Slower insert/update/delete
Space overhead

Tradeoffs In Using Indices

Number of transactions per second:

No indices   100
1 index      65
2 indices    22
3 indices    15
4 indices    5

Indices in MySQL
CREATE INDEX IX_users_username ON users (username);

SELECT * FROM users USE INDEX (IX_users_username) WHERE username = 'Jane';

DROP INDEX IX_users_username ON users;

Indices in Microsoft SQL Server

CREATE [NONCLUSTERED] INDEX IX_users_username ON users (username);
GO

SELECT * FROM users WITH(INDEX(IX_users_username)) WHERE username = 'Jane'

DROP INDEX users.IX_users_username
GO

Questions
Storage, B+ tree, ...
Indices
Project 1: MySQL
Project 2: MongoDB
Optional Project
