Enyo Ahovi - s1702890

r.e.ahovi@student.utwente.nl
Juliette Hoedemakers - s1423592
j.m.hoedemakers@student.utwente.nl
Assignment - Part 1: SQL querying

The first part of the Databases assignment consists of writing some SQL queries about movies.
As preparation it is a good idea to look through the tables, attributes and data to get an idea of
how the database is structured. The names are pretty straightforward. The database is named
movies. Click on the database and click Tables to see all tables. You can view the data in a
table with Browse, and by clicking a table name you can see its attributes.

When you have familiarized yourself with the database, you can start with the exercises below.
Note: do not forget to make an SQL script containing all solutions as described in Section 2. The
exercises also indicate how many rows the solution should produce. This can be used to check
if the solution is roughly correct.

3.1 Minimal

Exercise 1 Return all movies from the year 2000 [8 movies].


DONE SELECT DISTINCT name
FROM movies.movie
WHERE year = 2000

Exercise 2 Return the name and year of all movies with the genre Drama (143 movies).
DONE SELECT m.name, m.year
FROM movies.movie m, movies.genre g
WHERE g.mid = m.mid
AND g.genre = 'Drama'

Exercise 3 Give the number of movies per year [73 rows]


DONE SELECT count(*), m.year
FROM movies.movie m
GROUP BY m.year

Exercise 4 Give all roles that actor Bruce Willis played [4 rows]
DONE SELECT a.role
FROM movies.person p, movies.acts a
WHERE p.pid = a.pid
AND p.name = 'Bruce Willis'

Exercise 5 Give the number of persons in the database [1 row with the number 3279]
DONE SELECT count(*)
FROM movies.person p
Exercise 6 Give name and year of recording for every movie with a rating between 8.7
and 9.0 [19 rows]
DONE SELECT DISTINCT year, rating, name
FROM movies.movie
WHERE rating > 8.6
AND rating < 9.1

Exercise 7 Give per movie, its name and how many actors play in that movie (251 rows)
DONE SELECT DISTINCT count(*), m.name, m.mid
FROM movies.person p, movies.acts a, movies.movie m
WHERE p.pid = a.pid
AND a.mid = m.mid
GROUP BY m.name, m.mid

Exercise 8 Give the names of all movies which have a role name starting with Dr. [39
rows] (Hint: If you formulate the query like SELECT m.name FROM ..., it could happen that the
same movie title appears multiple times in the answer table (why?). SQL offers the
following construct to eliminate duplicate rows in a table: SELECT DISTINCT ... FROM ...)
DONE SELECT DISTINCT m.name
FROM movies.movie m, movies.acts a
WHERE m.mid = a.mid
AND a.role LIKE 'Dr.%'
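
The "why?" in the hint can be illustrated with a small, self-contained sketch. It assumes a hypothetical miniature of the movie and acts tables in an in-memory SQLite database (the sample rows are invented, and SQLite stands in for the PostgreSQL server used in the assignment). The join produces one row per matching role, so a movie with two "Dr." roles appears twice unless DISTINCT is used.

```python
import sqlite3

# Hypothetical miniature of the movie and acts tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE acts  (mid INTEGER, pid INTEGER, role TEXT);
    INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO acts  VALUES (1, 10, 'Dr. Jones'),
                             (1, 11, 'Dr. Smith'),
                             (2, 12, 'Nurse Lee');
""")

# Same join as in the exercise, with an optional DISTINCT keyword.
join_sql = """
    SELECT {distinct} m.name
    FROM movie m, acts a
    WHERE m.mid = a.mid
      AND a.role LIKE 'Dr.%'
"""

# One output row per matching (movie, role) pair.
with_duplicates = conn.execute(join_sql.format(distinct="")).fetchall()
# DISTINCT collapses identical output rows into one.
deduplicated = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(with_duplicates)  # [('Movie A',), ('Movie A',)]
print(deduplicated)     # [('Movie A',)]
```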

Exercise 9 Give the number of actors [1 row with value 2867]


DONE CREATE TABLE movies.actors (
pid integer PRIMARY KEY);

INSERT INTO movies.actors
SELECT a.pid
FROM movies.acts a
INTERSECT
SELECT p.pid
FROM movies.person p;

SELECT count(*)
FROM movies.actors;
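
As an aside, the same number can probably be obtained without a helper table by counting distinct person ids that occur in acts. The sketch below is only an illustration under assumptions: it uses an in-memory SQLite database with invented rows instead of the real movies schema, and the table and column names simply mirror the ones used above.

```python
import sqlite3

# Hypothetical miniature of the person and acts tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE acts   (mid INTEGER, pid INTEGER, role TEXT);
    INSERT INTO person VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
    INSERT INTO acts   VALUES (100, 1, 'r1'), (101, 1, 'r2'), (100, 2, 'r3');
""")

# count(DISTINCT ...) counts each acting person once, however many roles they have.
actor_count = conn.execute("""
    SELECT count(DISTINCT a.pid)
    FROM acts a
    JOIN person p ON p.pid = a.pid
""").fetchone()[0]

print(actor_count)  # 2 (person 3 has no role and is not counted)
```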

Exercise 10 Give in a sorted list of languages, per language how many movies are there
in that language [20 rows]
DONE SELECT DISTINCT count(*), l.language
FROM movies.language l, movies.movie m
WHERE l.mid = m.mid
GROUP BY l.language
ORDER BY l.language

Assignment - Part 2: Cube modeling, database creation, and data manipulation

This part of the assignment is about modeling cubes and realizing them in the database. We
follow the steps below:

1. Model star schema of cube
2. Design table structure for the star schema
3. Create the (empty) tables in the database
4. Fill the tables with transformed data from the sources

The minimal exercises concentrate on the last two steps; the regular exercises also include the
second step; and the advanced exercises also include modeling of a star schema from a case
description.

4.1 Minimal

The goal we set for the minimal exercises is to obtain a visualization of teachers and
courses: how many students each teacher teaches and how many students follow each course.
We are also interested in trends over time, at the granularity of a quarter.

For each of the dimensions, we determine a proper primary key (keys in bold). For Teacher it is
teacher id; for Course it is course code; and for Quarter it is the combination of year and
quarter.
Additionally, we include the names of the teachers and courses as informational attributes as
well as the start and end months of the quarters. The fact is nrofstudents and we need foreign keys
to each of the dimension tables in the fact table. The meaning of one row in the fact table is the
number of students per teacher, per course, and per quarter (step 2).

Exercise 19 Step 3: In your database, there is already an empty schema with the same
name as your database (i.e., ddann). Create empty tables for this table structure in this
schema using SQL, i.e., give a CREATE TABLE statement for each table.

Note: The Paginate results option of PhpPgAdmin does not work together with CREATE
TABLE statements. Simply switch it off.

Teacher:
DONE CREATE TABLE dda06.teacher (
teacher_id integer PRIMARY KEY,
teacher CHARACTER VARYING
);

Course:
DONE CREATE TABLE dda06.course (
course_code integer PRIMARY KEY,
course CHARACTER VARYING,
description CHARACTER VARYING
);

Quarter:
DONE CREATE TABLE dda06.quarter (
quarter INTEGER,
year INTEGER,
PRIMARY KEY (quarter, year),
start_month CHARACTER VARYING,
end_month CHARACTER VARYING
);

Students:
DONE ??
CREATE TABLE dda06.students (
teacher_id INTEGER,
course_code INTEGER,
quarter INTEGER,
year INTEGER,
nr_of_students INTEGER,
PRIMARY KEY (teacher_id, course_code, quarter, year),
FOREIGN KEY (teacher_id)
REFERENCES dda06.teacher(teacher_id),
FOREIGN KEY (course_code)
REFERENCES dda06.course(course_code),
FOREIGN KEY (quarter, year)
REFERENCES dda06.quarter(quarter, year)
);
??

//
CREATE TABLE dda06.studentsdump (
teacher_id INTEGER,
course_code INTEGER,
quarter INTEGER,
year INTEGER,
nr_of_students INTEGER,
PRIMARY KEY (course_code, quarter, year)
);
//

Exercise 20 Step 4: The source data for your cube resides in the three tables of the srs
schema. As always, it is not in the shape you want (the table structure you've just
created). Write SQL statements for filling your empty tables with data from the srs
schema. These SQL statements in fact do the transformation and store the transformed
data into the newly created tables in one go.

Note: the start and end months of the quarters are not in the source data. Just take
September for quarter 1, November for quarter 2, etc.

Teacher:
DONE INSERT INTO dda06.teacher
SELECT DISTINCT teacher_id, teacher
FROM srs.education

Courses:
DONE INSERT INTO dda06.course
SELECT course_code, course, description
FROM srs.courses

Quarter:
DONE INSERT INTO dda06.quarter (quarter, year)
SELECT DISTINCT quarter, year
FROM srs.education;

UPDATE dda06.quarter
SET start_month = 'September'
WHERE quarter = 1;

UPDATE dda06.quarter
SET start_month = 'November'
WHERE quarter = 2;

UPDATE dda06.quarter
SET start_month = 'February'
WHERE quarter = 3;

UPDATE dda06.quarter
SET start_month = 'April'
WHERE quarter = 4;

UPDATE dda06.quarter
SET end_month = 'November'
WHERE quarter = 1;

UPDATE dda06.quarter
SET end_month = 'February'
WHERE quarter = 2;

UPDATE dda06.quarter
SET end_month = 'April'
WHERE quarter = 3;

UPDATE dda06.quarter
SET end_month = 'July'
WHERE quarter = 4;
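
The quarter-to-month mapping could also be applied during the insert itself with a CASE expression, so that no follow-up updates are needed. The following is a minimal sketch under assumptions: it runs against an in-memory SQLite database, the education rows are invented, and only the column names quarter and year are taken from the statements above.

```python
import sqlite3

# Hypothetical stand-ins for srs.education and the quarter dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE education (teacher_id INTEGER, course_code INTEGER,
                            quarter INTEGER, year INTEGER);
    CREATE TABLE quarter (quarter INTEGER, year INTEGER,
                          start_month TEXT, end_month TEXT,
                          PRIMARY KEY (quarter, year));
    INSERT INTO education VALUES (1, 10, 1, 2016), (2, 11, 1, 2016), (1, 12, 4, 2016);

    -- DISTINCT keeps one row per (quarter, year); CASE derives the months.
    INSERT INTO quarter (quarter, year, start_month, end_month)
    SELECT DISTINCT quarter, year,
           CASE quarter WHEN 1 THEN 'September' WHEN 2 THEN 'November'
                        WHEN 3 THEN 'February'  WHEN 4 THEN 'April' END,
           CASE quarter WHEN 1 THEN 'November'  WHEN 2 THEN 'February'
                        WHEN 3 THEN 'April'     WHEN 4 THEN 'July' END
    FROM education;
""")

rows = conn.execute("""
    SELECT quarter, year, start_month, end_month
    FROM quarter
    ORDER BY year, quarter
""").fetchall()

print(rows)  # [(1, 2016, 'September', 'November'), (4, 2016, 'April', 'July')]
```

The DISTINCT matters here: two education rows share quarter 1 of 2016, and inserting both would violate the primary key on (quarter, year).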

Students: we used a new table, studentsdump, for this exercise, in which teacher_id is
not part of the primary key.
DONE

//
INSERT INTO dda06.studentsdump (nr_of_students, course_code, quarter, year)
SELECT count(g.student), g.course_code, g.quarter, g.year
FROM srs.grades g
GROUP BY g.course_code, g.quarter, g.year
ORDER BY g.course_code;

UPDATE dda06.studentsdump
SET teacher_id = e.teacher_id
FROM srs.education e
WHERE e.course_code = studentsdump.course_code;
//
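
For comparison, a fact table keyed on teacher, course, quarter and year might be filled in one statement by joining the grades to the education rows before aggregating. This is only a sketch under assumptions, not the submitted solution: it assumes that education carries (teacher_id, course_code, quarter, year) and grades carries (student, course_code, quarter, year), and it uses an in-memory SQLite database with invented rows.

```python
import sqlite3

# Hypothetical stand-ins for srs.education, srs.grades and the fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE education (teacher_id INTEGER, course_code INTEGER,
                            quarter INTEGER, year INTEGER);
    CREATE TABLE grades (student INTEGER, course_code INTEGER,
                         quarter INTEGER, year INTEGER);
    CREATE TABLE students (teacher_id INTEGER, course_code INTEGER,
                           quarter INTEGER, year INTEGER, nr_of_students INTEGER,
                           PRIMARY KEY (teacher_id, course_code, quarter, year));
    INSERT INTO education VALUES (1, 10, 1, 2016), (2, 10, 1, 2017);
    INSERT INTO grades VALUES (501, 10, 1, 2016), (502, 10, 1, 2016),
                              (503, 10, 1, 2017);

    -- Match each grade to the teacher of that course in that quarter and year,
    -- then count the distinct students per teacher, course, quarter and year.
    INSERT INTO students (teacher_id, course_code, quarter, year, nr_of_students)
    SELECT e.teacher_id, g.course_code, g.quarter, g.year, count(DISTINCT g.student)
    FROM grades g
    JOIN education e
      ON e.course_code = g.course_code
     AND e.quarter = g.quarter
     AND e.year = g.year
    GROUP BY e.teacher_id, g.course_code, g.quarter, g.year;
""")

fact_rows = conn.execute("""
    SELECT teacher_id, course_code, quarter, year, nr_of_students
    FROM students
    ORDER BY year
""").fetchall()

print(fact_rows)  # [(1, 10, 1, 2016, 2), (2, 10, 1, 2017, 1)]
```

Joining on quarter and year as well as course_code keeps a course that changes teacher between years from being attributed to a single teacher.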
Figure 1: Overview folders
Figure 2: Course folder
Figure 3: Quarter folder
Figure 4: Studentsdump folder
Figure 5: Teacher folder
