A set of programs to access these data
  Read, add, delete, update records or parts of records
  Often invisible to end users of a database
Airline:
  List and status of airplanes, pilots, support personnel
  List and status of flights
Hospital:
  List of doctors, their specialization, their availability
  List of patients and their health history
of movies/TV shows that are available
Netflix recommends movies/TV shows based on a person's usage history
TASK: Let's come up with a list of information that needs to be stored
  Ignore the HR department and personnel working for Netflix
Advantages of DB
Can store a large amount of information in a very organized way.
Data redundancy and inconsistency can be removed.
Easy to access data.
All data is in the same place.
Integrity (i.e., consistency constraints can be imposed).
Atomicity (i.e., multiple operations are considered as one).
Concurrent access is not a problem.
No security problems (i.e., different types of users can be given access to different data).
DB is a set of tables
User table:
  Username | Password | Credit card | Active account | History | Recommendations
  Jsmith3  |          | xxxxx       | Y              |         | Law and Order
  Mrayn    | Pass2    | xxxxx       |                |         |
  gomezs   | Pass3    | xxxxx       |                |         |
Any issues/concerns/difficulties?
account? -> Different fields for bank account -> new table.
DB is a set of tables
User table:
  Username | Password | Payment | Active account | History | Recommendations
  Jsmith3  |          |         | Y              |         | Law and Order
  Mrayn    | Pass2    |         |                |         |
  gomezs   | Pass3    |         |                |         |
Payment tables
Credit card:
  Credit card number | Expiration date | Security number
  9867432554321223   | 12/2017         | 543
Bank account:
  Routing number | Account number
  1231231231     | 9765432134
  4343434343     | 7543009990
Any issues???
Payment tables
Credit card:
  Username | Credit card number | Expiration date | Security number
  Jsmith3  | 9867432554321223   | 12/2017         | 543
Bank account:
  Username | Routing number | Account number
  Mrayn    | 1231231231     | 9765432134
  gomezs   | 4343434343     | 7543009990
History table
  Username | Show
  jsmith3  | 12 Years a Slave
  jsmith3  | NCIS
  Mrayn    | Hobbit
  Mrayn    | Hunger Games
Advantages of DB
Can store a large amount of information in a very organized way.
  There is no practical limit on the number of tables in a DB, or on the number of columns and rows in a table, beyond what the DB software and storage allow.
  We can store a large amount of data.
Advantages of DB
Data redundancy and inconsistency can be removed.
Data redundancy = storing the same information in multiple places.
  E.g., we have a table for movies and a table for TV shows, and for each of them we keep a separate list of customers (username, password, first name, last name, etc.). This is unnecessary redundancy and bad DB design.
Inconsistency = pieces of data do not agree.
  What if a person changes her name? Updates would have to be made in both lists of customers, but the customer might remember to do it in only one place. This creates inconsistency.
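The name-change scenario above can be sketched in a few lines. This is only an illustration, assuming Python's standard sqlite3 module as the DB engine (the slides do not name one); the table names movie_customers and tv_customers and the last names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Redundant design (hypothetical table names): the same customer is
# stored once in each of the two customer lists.
cur.execute("CREATE TABLE movie_customers (username TEXT, last_name TEXT)")
cur.execute("CREATE TABLE tv_customers (username TEXT, last_name TEXT)")
cur.execute("INSERT INTO movie_customers VALUES ('gomezs', 'Gomez')")
cur.execute("INSERT INTO tv_customers VALUES ('gomezs', 'Gomez')")

# The name change is applied in only one of the two places.
cur.execute(
    "UPDATE movie_customers SET last_name = 'Ray' WHERE username = 'gomezs'"
)

movie_name = cur.execute(
    "SELECT last_name FROM movie_customers WHERE username = 'gomezs'"
).fetchone()[0]
tv_name = cur.execute(
    "SELECT last_name FROM tv_customers WHERE username = 'gomezs'"
).fetchone()[0]

# The two copies of the same fact now disagree.
inconsistent = movie_name != tv_name
print(movie_name, tv_name, inconsistent)
conn.close()
```

With a single customer table there would be only one row to update, so the two values could not drift apart.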
Advantages of DB
Easy to access data.
It is easy to search data by any value in any column in any table (if the DB is designed well).
  E.g., a person forgot their username and password. It is easy to search by the first name and last name columns (at the same time), and then ask for additional info (address, email) to verify that the correct person was found, or to filter out wrong records if multiple records have the same first and last names.
  We can search for all customers who saw a particular movie (this is another reason why we might want to have history as a separate table, with each movie seen by a person on a separate line).
  We can search for all customers who have had a membership for more than one year. Or can we?
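The three searches above might look as follows. This is a sketch, assuming Python's sqlite3 module and a simplified schema; the first and last names, the enrollment dates, and the fixed reference date are hypothetical. The last query only works because the table has a date_enrolled column, which may be what "Or can we?" is hinting at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE user (username TEXT PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, date_enrolled TEXT)"
)
cur.execute("CREATE TABLE history (username TEXT, show TEXT)")

# Names and dates are made up for illustration; dates are ISO strings.
cur.executemany(
    "INSERT INTO user VALUES (?, ?, ?, ?)",
    [("jsmith3", "John", "Smith", "2013-01-15"),
     ("Mrayn", "Mary", "Rayn", "2014-06-01")],
)
cur.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [("jsmith3", "12 Years a Slave"), ("jsmith3", "NCIS"),
     ("Mrayn", "Hobbit"), ("Mrayn", "Hunger Games")],
)

# 1. Search by first name and last name at the same time.
by_name = cur.execute(
    "SELECT username FROM user WHERE first_name = ? AND last_name = ?",
    ("John", "Smith"),
).fetchall()

# 2. All customers who saw a particular movie (one history row per viewing).
saw_hobbit = cur.execute(
    "SELECT username FROM history WHERE show = ?", ("Hobbit",)
).fetchall()

# 3. Members enrolled for more than one year as of a fixed reference date.
long_members = cur.execute(
    "SELECT username FROM user WHERE date_enrolled <= date(?, '-1 year')",
    ("2015-01-01",),
).fetchall()

print(by_name, saw_hobbit, long_members)
conn.close()
```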
Advantages of DB
All data is in the same place. There are no different files or folders to search. It appears as if everything is stored together.
Advantages of DB
Integrity (i.e., consistency constraints can be imposed).
  E.g., a username should satisfy certain constraints (at least 6 characters long, must be unique).
  It is easy to impose these constraints in a DB. The constraint is programmed once even though the username might be used in multiple tables.
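The two username constraints mentioned above can be declared once in the table definition. This is a minimal sketch, assuming Python's sqlite3 module; try_insert is a hypothetical helper and the first names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user ("
    " username TEXT PRIMARY KEY CHECK (length(username) >= 6),"
    " first_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO user VALUES ('jsmith3', 'John')")


def try_insert(username, first_name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO user VALUES (?, ?)", (username, first_name))
        return True
    except sqlite3.IntegrityError:
        return False


too_short = try_insert("bob", "Bob")       # fewer than 6 characters
duplicate = try_insert("jsmith3", "Jane")  # username already exists
accepted = try_insert("gomezs", "Sam")     # 6 characters and unique
print(too_short, duplicate, accepted)
```

The application code never checks the rules itself; the DB rejects the bad rows.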
Advantages of DB
Atomicity = multiple operations are considered as one.
  When a credit card is charged, the customer must be given an active status.
  What if something happens (e.g., a system failure) between charging the credit card and setting the customer's status to active?
  Either both operations take effect or neither does.
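The charge-then-activate scenario above can be sketched as one transaction. This assumes Python's sqlite3 module; the charges table, the charge_and_activate helper, the amount, and the simulated failure are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (username TEXT PRIMARY KEY, account_active TEXT)")
conn.execute("CREATE TABLE charges (username TEXT, amount REAL)")
conn.execute("INSERT INTO user VALUES ('jsmith3', 'N')")
conn.commit()


def charge_and_activate(username, amount, fail_midway=False):
    """Record the charge and activate the account as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO charges VALUES (?, ?)", (username, amount))
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE user SET account_active = 'Y' WHERE username = ?",
                (username,),
            )
        return True
    except RuntimeError:
        return False


def snapshot():
    """Return (number of charges, active flag for jsmith3)."""
    n = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
    flag = conn.execute(
        "SELECT account_active FROM user WHERE username = 'jsmith3'"
    ).fetchone()[0]
    return n, flag


failed = charge_and_activate("jsmith3", 7.99, fail_midway=True)
after_failure = snapshot()  # the charge was rolled back together with the update
succeeded = charge_and_activate("jsmith3", 7.99)
after_success = snapshot()
print(failed, after_failure, succeeded, after_success)
```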
Advantages of DB
Concurrent access is not a problem. Multiple users may access DB at the same time.
Advantages of DB
No security problems (i.e., different types of users can be given access to different data).
  E.g., ... but they can search this table. On the other hand, Netflix workers can update the movies table.
DB terminology
DB schema = DB design
DB instance = the collection of information stored in the DB at a given moment
Data-definition language (DDL) = language used to specify the DB schema
Data-manipulation language (DML) = language used to manipulate (add, delete, update) information in the DB
  It changes the DB instance
DB schema
How many tables?
  Each table has a name. It represents an object.
How many columns in each table?
  Each column is an attribute of the object the table represents.
  Each column stores data of a particular type: char(50), bigint(30), date
How do we differentiate each record in a table?
  By a unique key. The DB schema defines a key for each table.
  There might be foreign keys present in some tables.
Data-definition language
E.g.,
CREATE TABLE user (
    username VARCHAR(20) NOT NULL,
    first_name VARCHAR(15) NOT NULL,
    last_name VARCHAR(15) NOT NULL,
    account_active CHAR(1),
    date_enrolled DATE,
    number_movies_watched BIGINT(30),
    PRIMARY KEY (username)
);
SQL (Structured Query Language) for DDL
Data-definition language
Defines/ensures:
Domain constraints
  Must exist for each attribute
  E.g., bigint, varchar
Referential integrity
  E.g., each movie that shows up in a user's history must exist in the movies table.
Assertions
  Any domain constraint is an assertion
  Any referential integrity constraint is an assertion
  Other assertions exist
  E.g., a password must satisfy some criteria (at least 6 characters long, containing at least one letter and one digit)
Authorization
  Read authorization on the movies table for users
  Update authorization on the movies table for administrators
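The referential-integrity rule above (a history entry must refer to an existing movie) can be expressed as a foreign key. This is a sketch, assuming Python's sqlite3 module; note that SQLite only enforces foreign keys after the PRAGMA shown below, and the title 'Unknown Show' is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE movies (title TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE history ("
    " username TEXT NOT NULL,"
    " title TEXT NOT NULL REFERENCES movies(title))"
)
conn.execute("INSERT INTO movies VALUES ('NCIS')")

# A history row for an existing movie is accepted.
conn.execute("INSERT INTO history VALUES ('jsmith3', 'NCIS')")

# A history row for a title missing from the movies table is rejected.
try:
    conn.execute("INSERT INTO history VALUES ('jsmith3', 'Unknown Show')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

history_rows = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(rejected, history_rows)
```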
Data-manipulation language
SQL
Used to:
  Retrieve data from the DB
  Insert new info into the DB
  Delete info from the DB
  Modify a piece of info in the DB
Eg.,
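A minimal sketch of the four DML operations, assuming Python's sqlite3 module and a reduced two-column user table (both are assumptions for illustration, not taken from the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE user (username TEXT PRIMARY KEY, account_active TEXT)")

# Insert new info into the DB.
cur.execute("INSERT INTO user VALUES ('jsmith3', 'Y')")
cur.execute("INSERT INTO user VALUES ('Mrayn', 'Y')")
cur.execute("INSERT INTO user VALUES ('gomezs', 'Y')")

# Modify a piece of info in the DB.
cur.execute("UPDATE user SET account_active = 'N' WHERE username = 'Mrayn'")

# Delete info from the DB.
cur.execute("DELETE FROM user WHERE username = 'gomezs'")

# Retrieve data from the DB.
rows = cur.execute("SELECT username, account_active FROM user").fetchall()
remaining = sorted(rows)
print(remaining)
conn.close()
```

Each statement changes the DB instance (the rows), while the schema created by the DDL statement stays the same.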
DB design
A DB has lots of advantages, but it has to be designed well to use them
Conceptual design
  Describe data and relationships among pieces of information.
  Types of data (char, int, date)
  Can one user have multiple movies listed in history? Should history be a table on its own?
Physical design
  Translate the conceptual design into a DB
Understanding requirements
Lots of conversations between DB developers (i.e., tech people) and business people of the company or end users.
Usually several iterations happen between all parties involved to clarify details.
Some of them happen after conceptual design has already started.
Conceptual design
The most important and hardest part of DB design.
E-R model
Graphical representation of the DB model.
Unified Modeling Language (UML) is most commonly used for this representation.
Entity = an object.
  Represented by a rectangle.
  It has an entity name (on top) and a set of attributes.
Relationship = relationship between two entities.
  Represented by a diamond. The relationship name is written inside the diamond.
  It connects two entities.
  It might contain cardinalities.
E-R model
user (1) Belongs to (N) History
  user: Username, Password, First name, Last name, Address, Email, Active account
  History: Movie title, Date watched
History (N) Used by
Normalization
A set of algorithms to make sure that data are represented
a separate table for TV show
  username | title | actors   | director   | genre
  Jsmith3  | NCIS  | MH,PP,SS | Bellisario | drama
but we would repeat all this info for every person who watches NCIS. What if the director changes at some moment or a new actor is added?
Normalization
Normal form (cont): No inability to represent information
  In the previous example (if we didn't have a separate table for TV shows), what would happen if there is a TV show that no one has watched yet (e.g., a new TV show)? There would be nowhere to store its info.
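Both problems described above (repeated show details and no place for an unwatched show) go away once show details live in their own table. This is a sketch, assuming Python's sqlite3 module; the tv_show table name, the second show, the replacement director, and Mrayn's NCIS viewing are hypothetical illustration data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Show details are stored once; history rows only refer to the title.
cur.execute(
    "CREATE TABLE tv_show (title TEXT PRIMARY KEY, director TEXT, genre TEXT)"
)
cur.execute(
    "CREATE TABLE history (username TEXT, title TEXT REFERENCES tv_show(title))"
)
cur.execute("INSERT INTO tv_show VALUES ('NCIS', 'Bellisario', 'drama')")
# A new show that no one has watched yet can still be stored.
cur.execute("INSERT INTO tv_show VALUES ('New Show', 'TBD', 'drama')")
cur.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [("jsmith3", "NCIS"), ("Mrayn", "NCIS")],
)

# A director change touches one row, however many people watched the show.
cur.execute("UPDATE tv_show SET director = 'New Director' WHERE title = 'NCIS'")
updated_rows = cur.rowcount

# A join rebuilds the combined view whenever it is needed.
directors_seen = cur.execute(
    "SELECT DISTINCT s.director FROM history h "
    "JOIN tv_show s ON s.title = h.title"
).fetchall()

unwatched = cur.execute(
    "SELECT title FROM tv_show "
    "WHERE title NOT IN (SELECT title FROM history)"
).fetchall()
print(updated_rows, directors_seen, unwatched)
conn.close()
```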
Physical design
Pretty simple for a DB expert.
DB language.
Naïve (non-tech) end users (e.g., Netflix customers)
  They search the DB
  They use an application programming interface to interact with the DB
  Often via the web
Application programmers
  They create/program the application programming interface for naïve end users
Types of DBs
Relational DB
Object-based data models
  Developed to better suit object-oriented programming languages.
  Extension of the E-R model to allow for structured and collection types, encapsulation, inheritance
Semistructured data models
  Allow specification of data where a data item might have a different set of attributes for different records.
  XML language