
A Relational Database Primer

Robert G. Brookshire

For most social scientists, data come in a rectangular form similar to a spreadsheet,
where columns represent variables and rows are observations. For a few years now,
however, a more complex view of data has been evolving in the fields of computer
science and management information systems. The purposes of this paper are to
introduce this relational view of data to social scientists and to argue that this way of
looking at data can be much more powerful than the traditional view. The first part
of the paper introduces the terminology and concepts of the relational model. This
is followed by a discussion of relational operators, normalization, and the
entity-relationship diagram, a technique used to visualize a relational database. The paper
then illustrates these concepts with crime and justice data from the Bureau of Justice
Statistics.

Keywords: database, data management, relational database.

Social scientists have traditionally conceived of data as a rectangular
matrix of values, with each row corresponding to a distinct
object of study and the columns containing the characteristics of
the objects. This view of data grew quite naturally from the historic
card file images of data imposed by unit record equipment, and later
by computer hardware and statistical packages. In the fields of com-
puter science and management information systems, however,
more complex possibilities for the representation and storage of
data have been developed since the 1970s. One of these approaches,
the relational model, can offer social scientists a much more pow-
erful way of approaching data than the traditional, spreadsheet-like
view.
There are several reasons why social scientists should be inter-
ested in learning about relational database concepts. First, it is im-
portant that data analysts be familiar with modern database termi-
nology and ideas, just as they should stay abreast of developments
in statistical methods. Database methods are part of the suite of
tools that researchers should maintain. Second, users of data ought
to be able to communicate with the designers and managers of databases,
the people who are custodians of the data files. Having a common
set of concepts and language promotes the good working relationships
that are necessary for ongoing research. Third, for many
kinds of data that social scientists must analyze, the traditional flat
file is inadequate. Data that encompass several different levels or
units of analysis, or contain time-dependent measures of varying
length, can be represented only very clumsily with rectangular data
structures. The relational model can simplify the storage and retrieval
of these kinds of data. Finally, the relational model is a
somewhat different way of looking at data, one that can lead to
more comprehensive, flexible, and insightful kinds of data analysis.

Social Science Computer Review 11:2, Summer 1993. Copyright © 1993 by Duke University Press. CCC 0894-4393/93/$1.50

Relational Database Concepts


The relational model for data was devised by E. F. Codd (Codd,
1970), and further refined by him (Codd, 1979, 1982, 1990) and others,
especially Codd's colleague C. J. Date (Date, 1986). Although initially
received with some resistance, the relational model has come
to be the ideal for database management systems. Many leading
commercial software packages for data management claim to fol-
low this model, but most fail to achieve complete adherence
(Vaughan-Nichols, 1990).
One of the distinguishing features of the relational model is its
theoretical basis. Grounded in set theory, the relational model uses
first-order predicate logic, which Codd claims is more powerful
than traditional propositional logic (Codd, 1990, p. 20). The model is
thus amenable to formal specification (see Codd, 1970, 1979). An example
of the mathematical basis of the relational model is the definition
of the relation itself. Codd states that a relation in mathematics
is defined as follows: Given sets S1, S2, ..., Sn (not
necessarily distinct), R is a relation on these n sets if it is a set of
n-tuples, the first component of which is drawn from S1, the second
component from S2, and so on (Codd, 1990, p. 1).
A table of data in the relational model corresponds to the mathe-
matical definition in that each row of data is a tuple of R, there is
no necessary order to the rows, and all rows are distinct from each
other (Codd, 1990, p. 2). The columns in the table correspond to the
components of R. A relation is of degree n, corresponding to the
number of columns in the table. The set of possible values of each
column is called the domain.
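As a minimal sketch of this definition, a relation can be modeled in Python as a set of tuples drawn from the Cartesian product of its domains. The sample rows come from the senator examples used later in this paper; the variable and function names are illustrative assumptions, not Codd's notation.

```python
from itertools import product

# Domains: the sets S1, S2, S3 from which each component is drawn.
names = {"Kennedy", "Nunn", "Warner"}
parties = {"D", "R"}
states = {"Massachusetts", "Georgia", "Virginia"}

# The relation is a set of 3-tuples, one tuple per row of the table.
senators = {
    ("Kennedy", "D", "Massachusetts"),
    ("Nunn", "D", "Georgia"),
    ("Warner", "R", "Virginia"),
}

def degree(relation):
    """Return the number of components (columns) in each tuple."""
    return len(next(iter(relation)))

# Every tuple must be an element of S1 x S2 x S3.
assert senators <= set(product(names, parties, states))

# Adding a row that is already present changes nothing:
# the rows of a relation are distinct and unordered.
senators.add(("Nunn", "D", "Georgia"))
print(degree(senators), len(senators))  # 3 3
```

Using a set rather than a list is the design choice that matters here: it gives the "no duplicate rows, no row order" properties without any extra code.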
As an example, a relational database may contain a table of infor-
mation about members of Congress. Each row in the table, or tuple,
would correspond to the set of data about that member, and each
column would contain the attributes (name, party affiliation, constituency,
and so on) that apply to that member. If 15 different attributes
were stored in the table for each member, the table would
be of degree 15.
This example does not depart substantially from the traditional
record-oriented view of data familiar to most social scientists. This
example, however, shows only the first facet of the relational
model. A relational database will almost always contain many ta-
bles, not just one. These tables of stored data, called base relations,
can then be combined in various ways to form derived relations. It
is this ability to form derived relations, or views, that is the power
of the relational model.
The congressional database described above might, for instance,
have an additional table, or base relation, with information about
congressional committees. Each row in the table might correspond
to a committee, with the attributes consisting of the name of the
committee, the chair's name, and other committee characteristics.
The database could contain a third table with information about
committee staff members, including the attributes of names, the
committee on which each serves, and other characteristics of the
staff.
From these three base relations, derived relations could then be
formed. For example, a member-committee relation could be
formed that gave a list of members and the committees on which
they serve. Another derived relation, the committee-staff relation,
might give the name, phone number, and office number of each
staff member for each committee; and a third table, the member-
staff relation, would show the members of Congress that each staff
member serves through the committee linkage.
The ability to form derived relations is achieved through the
maintenance of keys in each base relation. Keys are of two main
types, primary and foreign. A primary key is a column or set of col-
umns in a table whose values serve to identify uniquely each row
of the relation. In a survey data table, a respondent ID number may
serve as the primary key. In the relation containing information
about members of Congress, the primary key might be composed of
several columns: first and last names, for example; or first name,
last name, district number, and state if two members have the same
first and last names. In a relational database, every table must
have a primary key, so that every row can be identified uniquely.
A foreign key is a column or set of columns contained in one ta-
ble that is the primary key in a different table. If one of the attri-
butes stored in a table containing information about members of
Congress is the name of a committee on which the member serves,
and the committee name is used as the primary key in the table that
contained data on committees, then in the member relation, com-
mittee name is a foreign key. A foreign key is always found as a
primary key in some table in the database, but a primary key could
exist that is not also a foreign key.
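The behavior of the two kinds of keys can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: the table and column names (member, committee, m_name, c_name) are hypothetical, the committee 'Agriculture' is invented to trigger a violation, and SQLite is one SQL product, not the relational model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE committee (
        c_name TEXT PRIMARY KEY,
        chair  TEXT
    );
    CREATE TABLE member (
        m_name TEXT PRIMARY KEY,
        party  TEXT,
        c_name TEXT REFERENCES committee(c_name)  -- foreign key
    );
""")
conn.execute("INSERT INTO committee VALUES ('Judiciary', 'Biden')")
conn.execute("INSERT INTO member VALUES ('Kennedy', 'D', 'Judiciary')")

def try_insert(sql):
    """Run an INSERT and report whether the key constraints accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Rejected: a second row with the same primary key value.
dup_ok = try_insert("INSERT INTO member VALUES ('Kennedy', 'D', 'Judiciary')")
# Rejected: the foreign key value is not a primary key in committee.
orphan_ok = try_insert("INSERT INTO member VALUES ('Nunn', 'D', 'Agriculture')")
print(dup_ok, orphan_ok)  # False False
```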
The first-order predicate logic the relational model employs has
four truth values, TRUE, FALSE, MAYBE-APPLICABLE and MAYBE-
INAPPLICABLE. The MAYBE values are used to distinguish between
two kinds of missing values. A value that is MAYBE-APPLICABLE is
missing simply because it is unknown (e.g., the salaries of survey
respondents who refused to report their salaries). A value that is
MAYBE-INAPPLICABLE is missing because the value is not applicable
(e.g., the salary of an unemployed respondent).

Relational Operators
Tables in a relational database are manipulated through the use of
operators. Codd (1990) specifies several dozen operators and deliberately
leaves the list open-ended. He intends that others define
new operators as the need arises. Codd divides the operators into
two sets, basic and advanced. Only the basic operators are discussed
here, as examples of the types of operators included in the relational
model.
Project. The project operator subsets the columns of a table. Consider
the following relation, called President:

F_Name    L_Name     State       B_Term
Theodore  Roosevelt  New York    1901
Woodrow   Wilson     New Jersey  1913
Herbert   Hoover     California  1929
Franklin  Roosevelt  New York    1933
Millard   Fillmore   New York    1850

The operation project President(L_Name, State) yields the relation:

L_Name     State
Roosevelt  New York
Wilson     New Jersey
Hoover     California
Fillmore   New York

which has one less row than President because a relation may not
have duplicate rows. Similarly, the operation project President(State)
would yield:

State
New York
New Jersey
California
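The project operator can be sketched in a few lines of Python, treating the President relation as a set of tuples. The function name and its signature are assumptions made for illustration; the data are those of the table above.

```python
PRESIDENT_COLUMNS = ("F_Name", "L_Name", "State", "B_Term")
president = {
    ("Theodore", "Roosevelt", "New York", 1901),
    ("Woodrow", "Wilson", "New Jersey", 1913),
    ("Herbert", "Hoover", "California", 1929),
    ("Franklin", "Roosevelt", "New York", 1933),
    ("Millard", "Fillmore", "New York", 1850),
}

def project(relation, columns, keep):
    """Keep only the named columns; building a set drops duplicate rows."""
    positions = [columns.index(name) for name in keep]
    return {tuple(row[i] for i in positions) for row in relation}

by_name_state = project(president, PRESIDENT_COLUMNS, ["L_Name", "State"])
by_state = project(president, PRESIDENT_COLUMNS, ["State"])
print(len(by_name_state))  # 4: the two Roosevelt rows collapse into one
print(sorted(by_state))    # [('California',), ('New Jersey',), ('New York',)]
```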
Select. The select operator, also called theta-select, is used to subset
the rows of a table according to an equality or inequality condition.
The operation select President(B_Term < 1920) yields:

F_Name    L_Name     State       B_Term
Theodore  Roosevelt  New York    1901
Woodrow   Wilson     New Jersey  1913
Millard   Fillmore   New York    1850
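A comparable sketch of select, again with an assumed function name and signature, passes the condition as a Python function applied to one column of each row.

```python
PRESIDENT_COLUMNS = ("F_Name", "L_Name", "State", "B_Term")
president = {
    ("Theodore", "Roosevelt", "New York", 1901),
    ("Woodrow", "Wilson", "New Jersey", 1913),
    ("Herbert", "Hoover", "California", 1929),
    ("Franklin", "Roosevelt", "New York", 1933),
    ("Millard", "Fillmore", "New York", 1850),
}

def select(relation, columns, column, condition):
    """Keep the rows whose value in `column` satisfies `condition`."""
    position = columns.index(column)
    return {row for row in relation if condition(row[position])}

# select President(B_Term < 1920)
early = select(president, PRESIDENT_COLUMNS, "B_Term", lambda term: term < 1920)
print(sorted(row[1] for row in early))  # ['Fillmore', 'Roosevelt', 'Wilson']
```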
Join. The join operator, also called theta-join, is used to join the
rows of two relations according to an equality or inequality condition.
If we have a relation called States, as in the following:

S_Name      Admitted
New York    1788
New Jersey  1787
Arizona     1912
Hawaii      1959

the operation join President(State = S_Name)States would result in:

F_Name    L_Name     State       B_Term  S_Name      Admitted
Theodore  Roosevelt  New York    1901    New York    1788
Woodrow   Wilson     New Jersey  1913    New Jersey  1787
Franklin  Roosevelt  New York    1933    New York    1788
Millard   Fillmore   New York    1850    New York    1788

A special case of join, called natural join, achieves the same result
as the above but drops the redundant column S_Name.
The operation join President(B_Term > Admitted)States yields
the list of states admitted before the beginning of each president's
term:

F_Name    L_Name     State       B_Term  S_Name      Admitted
Theodore  Roosevelt  New York    1901    New York    1788
Theodore  Roosevelt  New York    1901    New Jersey  1787
Woodrow   Wilson     New Jersey  1913    New York    1788
Woodrow   Wilson     New Jersey  1913    New Jersey  1787
Woodrow   Wilson     New Jersey  1913    Arizona     1912
Herbert   Hoover     California  1929    New York    1788
Herbert   Hoover     California  1929    New Jersey  1787
Herbert   Hoover     California  1929    Arizona     1912
Franklin  Roosevelt  New York    1933    New York    1788
Franklin  Roosevelt  New York    1933    New Jersey  1787
Franklin  Roosevelt  New York    1933    Arizona     1912
Millard   Fillmore   New York    1850    New York    1788
Millard   Fillmore   New York    1850    New Jersey  1787
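Both joins above can be reproduced with one small function that pairs every row of one relation with every row of the other and keeps the pairs that satisfy the condition. The function name is an assumption, and this nested-loop approach is only a sketch; database products use far more efficient strategies.

```python
president = {
    ("Theodore", "Roosevelt", "New York", 1901),
    ("Woodrow", "Wilson", "New Jersey", 1913),
    ("Herbert", "Hoover", "California", 1929),
    ("Franklin", "Roosevelt", "New York", 1933),
    ("Millard", "Fillmore", "New York", 1850),
}
states = {
    ("New York", 1788),
    ("New Jersey", 1787),
    ("Arizona", 1912),
    ("Hawaii", 1959),
}

def theta_join(left, right, condition):
    """Concatenate each pair of rows for which condition(l, r) holds."""
    return {l + r for l in left for r in right if condition(l, r)}

# join President(State = S_Name)States
equi = theta_join(president, states, lambda p, s: p[2] == s[0])
# Natural join: same rows without the redundant S_Name column (position 4).
natural = {row[:4] + row[5:] for row in equi}
# join President(B_Term > Admitted)States
before = theta_join(president, states, lambda p, s: p[3] > s[1])

print(len(equi), len(natural), len(before))  # 4 4 13
```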
Union. The union operator concatenates two relations so that the
result contains the rows of both relations. The two relations must
have the same columns. The result will have any duplicate rows
removed, as no relation may contain duplicate rows.
Intersection. The intersection operator takes as its operands two
relations with the same columns and yields a relation that contains
only those rows that are common to both.
Difference. The difference operator takes as its operands two re-
lations with the same columns and yields a relation that contains
only the rows of the first relation that are not found in the second
relation.
For an illustration of these three operators, consider the following
two relations:

Relation P

F_Name  L_Name      Born
George  Washington  1732
John    Adams       1735
James   Madison     1751
James   Monroe      1758

Relation VP

F_Name  L_Name     Born
John    Adams      1735
Thomas  Jefferson  1743
Aaron   Burr       1756
The union of P and VP would give the relation:
F_Name L_Name Born
George Washington 1732
John Adams 1735
James Madison 1751
James Monroe 1758
Thomas Jefferson 1743
Aaron Burr 1756
Their intersection yields the relation:
F_Name L_Name Born
John Adams 1735
Their difference produces the relation:
F_Name L_Name Born
George Washington 1732
James Madison 1751
James Monroe 1758
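Because a relation is a set of rows, these three operators correspond directly to Python's built-in set operations. The short sketch below uses the two relations P and VP; the variable names are illustrative.

```python
p = {
    ("George", "Washington", 1732),
    ("John", "Adams", 1735),
    ("James", "Madison", 1751),
    ("James", "Monroe", 1758),
}
vp = {
    ("John", "Adams", 1735),
    ("Thomas", "Jefferson", 1743),
    ("Aaron", "Burr", 1756),
}

union = p | vp          # rows in either relation, duplicates removed
intersection = p & vp   # rows common to both relations
difference = p - vp     # rows of P that are not found in VP

print(len(union), intersection, len(difference))
# 6 {('John', 'Adams', 1735)} 3
```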
Division. Relational division is rather complicated and is best ex-
plained through the use of an example. Consider the following re-
lation, Committee, which shows Senate committees and their
members:
C_Name             M_Name     M_Rank  Party
Judiciary          Kennedy    2       D
Armed Services     Kennedy    4       D
Small Business     Nunn       2       D
Armed Services     Nunn       1       D
Armed Services     Warner     1       R
Foreign Relations  Kassebaum  3       R
We have a second relation, Senators, which has only some names of
senators:

M_Name
Kennedy
Nunn
Warner

The division operation is equivalent to asking the question, "What
committee contains all the members listed in the relation Senators?"
The divisor is Senators, the dividend is Committee, and the
result is the quotient:

C_Name          M_Rank  Party
Armed Services  4       D
Armed Services  1       D
Armed Services  1       R

The remainder of the division operation is what is left of Committee:

C_Name             M_Name     M_Rank  Party
Judiciary          Kennedy    2       D
Small Business     Nunn       2       D
Foreign Relations  Kassebaum  3       R
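The question that division answers can be sketched as follows. Note the assumption: this is the common textbook form of division, applied to the projection of Committee onto (C_Name, M_Name), so the quotient holds only the committee name and not the M_Rank and Party columns shown in the quotient above. The function name is hypothetical.

```python
committee = {
    ("Judiciary", "Kennedy", 2, "D"),
    ("Armed Services", "Kennedy", 4, "D"),
    ("Small Business", "Nunn", 2, "D"),
    ("Armed Services", "Nunn", 1, "D"),
    ("Armed Services", "Warner", 1, "R"),
    ("Foreign Relations", "Kassebaum", 3, "R"),
}
senators = {("Kennedy",), ("Nunn",), ("Warner",)}

def divide(dividend, divisor):
    """Divide a binary relation {(x, y)} by a unary relation {(y,)}.

    Returns each x that is paired in the dividend with every y in the divisor.
    """
    required = {y for (y,) in divisor}
    quotient = set()
    for x in {x for x, _ in dividend}:
        paired = {y for x2, y in dividend if x2 == x}
        if required <= paired:
            quotient.add((x,))
    return quotient

# Project Committee onto (C_Name, M_Name) before dividing.
membership = {(c_name, m_name) for c_name, m_name, _, _ in committee}
quotient = divide(membership, senators)
# Remainder: the Committee rows whose committee is not in the quotient.
remainder = {row for row in committee if (row[0],) not in quotient}

print(quotient)        # {('Armed Services',)}
print(len(remainder))  # 3
```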

Manipulative operators. Several operators are available to manipulate
relations. These include assignment, which creates a new relation
equivalent to another relation or the result of an operation;
update, which changes the values of data items in a relation; insert,
which inserts new rows into a relation; and delete, which deletes
rows from a relation. Special operators are also available for updating
and deleting keys, because a change to a primary key must also
be reflected in changes to any foreign keys that are equivalent to it.
There are many other operators, the description of which is beyond
the scope of this paper.
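SQL products offer rough counterparts to these manipulative operators. The sketch below uses Python's sqlite3 module; the table names and the renamed committee 'Armed Forces' are hypothetical, and the cascading foreign key is SQLite's mechanism for propagating key changes, offered here as an analogy to the special key operators rather than as Codd's definition of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE committee (c_name TEXT PRIMARY KEY);
    CREATE TABLE membership (
        m_name TEXT,
        c_name TEXT REFERENCES committee(c_name)
               ON UPDATE CASCADE ON DELETE CASCADE,
        PRIMARY KEY (m_name, c_name)
    );
""")

# Insert: add new rows to a relation.
conn.execute("INSERT INTO committee VALUES ('Armed Services')")
conn.executemany(
    "INSERT INTO membership VALUES (?, 'Armed Services')",
    [("Nunn",), ("Warner",)],
)

# Assignment: create a new relation from the result of an operation.
conn.execute("CREATE TABLE armed AS SELECT m_name FROM membership")
snapshot = conn.execute("SELECT COUNT(*) FROM armed").fetchone()[0]

# Update of a primary key: the change is propagated to the foreign keys.
conn.execute(
    "UPDATE committee SET c_name = 'Armed Forces' "
    "WHERE c_name = 'Armed Services'"
)
renamed = conn.execute("SELECT DISTINCT c_name FROM membership").fetchall()

# Delete of a primary key: rows that refer to it are deleted as well.
conn.execute("DELETE FROM committee WHERE c_name = 'Armed Forces'")
remaining = conn.execute("SELECT COUNT(*) FROM membership").fetchone()[0]

print(snapshot, renamed, remaining)  # 2 [('Armed Forces',)] 0
```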

Normalization
Normalization is the process of designing a database to eliminate
certain kinds of redundancy in the information that is maintained
in the relations. To this end, certain rules have been defined for several
"normal forms" for relations. These normal forms, from the
simplest to the most complex, are first, second, and third normal
forms; Boyce-Codd normal form; and fourth and fifth normal forms.
Each form adds additional requirements to the one that precedes it.
For example, a relation in second normal form meets all the require-
ments of first normal form, plus some others. A detailed exposition
of these topics with an annotated bibliography can be found in Date
(1986), and a good brief overview is available in Kent (1983).
First normal form. First normal form (1NF) is the basic form for
relational data. To be in 1NF, a relation must have no repeating rows
and each row must have the same number of columns. Most social
science data are in first normal form, although there are a few com-
monly used data sets that are not.
Second and third normal forms. Second and third normal forms
deal with relations in which the primary key is composed of more
than one field. Consider the following relation, Alliances:
Country Alliance Date Capital
United States OAS 1948 Washington, DC
Mexico OAS 1948 Mexico City
United States NATO 1949 Washington, DC
Canada NATO 1949 Ottawa

The primary key for this relation is the combination of the two col-
umns Country and Alliance. The column Capital, however, con-
tains redundant information in that the capital must be repeated for
each occurrence of each country. Second normal form (2NF) removes
this redundancy. To put this data into 2NF, the relation should be
decomposed into two relations:
Country Alliance Date
United States OAS 1948
Mexico OAS 1948
United States NATO 1949
Canada NATO 1949
and
Country Capital
United States Washington, DC
Mexico Mexico City
Canada Ottawa
More formally, in 2NF only columns that contain information about
the entity defined in the key should be contained in the relation.
Because Capital provides information only about the country, not
the country and its alliance, it should be stored in another relation.
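One way to spot this kind of redundancy in sample data is to test whether a column is determined by only part of the key. The helper below is a hypothetical sketch: it can refute a dependency from the rows at hand, but a result of True shows only that the sample is consistent with the dependency, not that the dependency holds in general.

```python
alliances = [
    {"Country": "United States", "Alliance": "OAS", "Date": 1948,
     "Capital": "Washington, DC"},
    {"Country": "Mexico", "Alliance": "OAS", "Date": 1948,
     "Capital": "Mexico City"},
    {"Country": "United States", "Alliance": "NATO", "Date": 1949,
     "Capital": "Washington, DC"},
    {"Country": "Canada", "Alliance": "NATO", "Date": 1949,
     "Capital": "Ottawa"},
]

def determines(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Capital depends on Country alone, which is only part of the key.
capital_from_country = determines(alliances, ["Country"], ["Capital"])
# Date does not: the United States has a different date per alliance.
date_from_country = determines(alliances, ["Country"], ["Date"])
print(capital_from_country, date_from_country)  # True False
```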
In 3NF, this requirement is extended to include columns that con-
tain information about nonkey data as well. Consider the following
relation, which contains information about senators:
Senator Party State Capital
Kennedy D Massachusetts Boston
Kerry D Massachusetts Boston
Nunn D Georgia Atlanta
Dole R Kansas Topeka
Kassebaum R Kansas Topeka
Warner R Virginia Richmond
Because the relation contains data about senators, the key is the
column Senator. The column Capital, however, does not provide in-
formation about the senators, but about the states they represent.
It is redundant to repeat the capital for each occurrence of state.
To put this relation into 3NF, it should be decomposed into two
relations:
Senator Party State
Kennedy D Massachusetts
Kerry D Massachusetts
Nunn D Georgia
Dole R Kansas
Kassebaum R Kansas
Warner R Virginia
and
State Capital
Massachusetts Boston
Georgia Atlanta
Kansas Topeka
Virginia Richmond
In order to be in 3NF, then, each column in each row must "provide
a fact about the key, the whole key, and nothing but the key"
(Kent, 1983, p. 120). The truly devout add to this definition, "so help
me, Codd." For both 2NF and 3NF, if the original relation is required
for analysis, it can be easily reconstituted by using the relational
operators on the two new relations.
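The claim that the original relation can be reconstituted is easy to check for the senator example. In this sketch the two 3NF relations are produced by projection and then recombined with a natural join on State; the variable names are illustrative.

```python
senator = {
    ("Kennedy", "D", "Massachusetts", "Boston"),
    ("Kerry", "D", "Massachusetts", "Boston"),
    ("Nunn", "D", "Georgia", "Atlanta"),
    ("Dole", "R", "Kansas", "Topeka"),
    ("Kassebaum", "R", "Kansas", "Topeka"),
    ("Warner", "R", "Virginia", "Richmond"),
}

# Decompose by projection; each capital is now stored once per state.
senator_state = {(name, party, state) for name, party, state, _ in senator}
state_capital = {(state, capital) for _, _, state, capital in senator}

# Natural join on State rebuilds the original four-column relation.
rejoined = {
    (name, party, state, capital)
    for name, party, state in senator_state
    for state2, capital in state_capital
    if state == state2
}

print(len(state_capital), rejoined == senator)  # 4 True
```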
Boyce-Codd normal form. Boyce-Codd normal form (BCNF) is an
extension of 3NF. Consider the relation:
Senator Committee Chairman
Kennedy Judiciary Biden
Kennedy Armed Services Nunn
Warner Armed Services Nunn
Kassebaum Foreign Relations Pell
This relation has two possible keys, the combination of Senator and
Committee, and the combination of Committee and Chairman. If
we choose one of these combinations as the primary key, we are
preserving redundant information in the remaining column. This
relation is in 3NF, but must be decomposed as follows to be in BCNF:
Senator Committee
Kennedy Judiciary
Kennedy Armed Services
Warner Armed Services
Kassebaum Foreign Relations
and
Committee Chairman
Judiciary Biden
Armed Services Nunn
Foreign Relations Pell
Fourth normal form. Suppose in a database about countries we
want to keep information about the alliances to which the coun-
tries belong and the official languages in those countries. Each
country may have none, one, or many alliances, and each may also
have one or more languages. Storing this data is problematic:
Country         Alliance              Language
Canada          NATO                  English
Canada          OECD                  French
Canada          British Commonwealth
Switzerland     OECD                  French
Switzerland     EFTA                  Italian
Switzerland                           German
United Kingdom  British Commonwealth  English
United Kingdom  EEC
United Kingdom  NATO
United Kingdom  OECD

These kinds of data are called multivalued dependencies, because
each column refers to (depends on) the primary key, Country, and
can have more than one value. Fourth normal form (4NF) was devised
to handle this kind of problem. It solves it by decomposing
the relation into two relations:
Country Alliance
Canada NATO
Canada OECD
Canada British Commonwealth
Switzerland OECD
Switzerland EFTA
United Kingdom British Commonwealth
United Kingdom EEC
United Kingdom NATO
United Kingdom OECD

and
Country Language
Canada English
Canada French
Switzerland French
Switzerland Italian
Switzerland German
United Kingdom English
Note that the blank entries in the relations are eliminated when the
data are transformed to 4NF. As with all the previous examples, we
can recover the original relations, if necessary, through operation on
the relations in 4NF.
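The decomposition into 4NF can be sketched as two projections that skip the blank entries. In the code below a blank cell is represented by Python's None, which is an assumption of this sketch and not a feature of the relational model.

```python
# The problematic three-column table; None marks a blank cell.
countries = [
    ("Canada", "NATO", "English"),
    ("Canada", "OECD", "French"),
    ("Canada", "British Commonwealth", None),
    ("Switzerland", "OECD", "French"),
    ("Switzerland", "EFTA", "Italian"),
    ("Switzerland", None, "German"),
    ("United Kingdom", "British Commonwealth", "English"),
    ("United Kingdom", "EEC", None),
    ("United Kingdom", "NATO", None),
    ("United Kingdom", "OECD", None),
]

# Project each independent multivalued fact into its own relation.
country_alliance = {(c, a) for c, a, _ in countries if a is not None}
country_language = {(c, lang) for c, _, lang in countries if lang is not None}

print(len(country_alliance), len(country_language))  # 9 6
```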
Fifth normal form. Consider the relationships between importers
and exporters of agricultural products. A country may export a prod-
uct to one or more countries. A country may import a product from
one or more countries. Countries may import or export more than
one product, and many countries can import or export the same
products. A relation to show this activity might be of the form:

Producer       Product  Importer
United States  wheat    Russia
United States  wheat    China
United States  corn     China
Australia      corn     -
Australia      rice     Russia
France         wheat    Russia
France         wheat    Japan
This relation is in 4NF, but obviously contains a lot of redundancy.
Unlike the previous examples in which the original must be decom-
posed into two relations, a relation in fifth normal form (5NF) must
be decomposed into three relations. First,
Producer Product
United States wheat
United States corn
Australia corn
Australia rice
France wheat

Second,
Producer Importer
United States Russia
United States China
Australia Russia
France Russia
France Japan
Third,
Importer Product
Russia wheat
China wheat
China corn
Russia rice
Japan wheat
As with the other normal form relations, the original relation can
be recovered through operations on the decomposed relations. A relation
in 5NF is also in 1NF, 2NF, 3NF, BCNF, and 4NF.
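For the trade example, recovery takes a join of all three relations; joining only two of them produces rows that never occurred. The sketch below computes the three projections from the complete rows of the original relation and adds the Australia-corn pair, which has no importer, to the first projection only. The variable names are illustrative.

```python
# Complete rows of the original relation (Producer, Product, Importer).
shipments = {
    ("United States", "wheat", "Russia"),
    ("United States", "wheat", "China"),
    ("United States", "corn", "China"),
    ("Australia", "rice", "Russia"),
    ("France", "wheat", "Russia"),
    ("France", "wheat", "Japan"),
}

producer_product = {(p, g) for p, g, _ in shipments}
producer_product.add(("Australia", "corn"))  # produced, no importer listed
producer_importer = {(p, i) for p, _, i in shipments}
importer_product = {(i, g) for _, g, i in shipments}

# Joining just two projections yields spurious rows ...
two_way = {
    (p, g, i)
    for p, g in producer_product
    for p2, i in producer_importer
    if p == p2
}
# ... which the third projection filters out again.
rejoined = {row for row in two_way if (row[2], row[1]) in importer_product}

print(len(two_way), rejoined == shipments)  # 8 True
```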

Entity-Relationship Diagrams
As can be seen from these small examples, the depiction of relational
data can quickly become quite cumbersome. Several techniques
have been developed to diagram relational data, the most
popular of which is the entity-relationship diagram (Chen, 1976).
The symbols used in the entity-relationship diagram are not stan-
dardized. This paper employs those presented by Eliason (1990),
which are similar to Chen's.
In an entity-relationship diagram (ERD), an entity is anything for
which data are stored. It is symbolized by a box. A relationship be-
tween two entities is symbolized by a diamond that is connected to
the two entities. Labels in the symbols identify the entities and
their relationships. The lines connecting the entities and relations
are labeled with 1, N, or M to indicate the degree of the relationship.
Both N and M indicate more than one and are used to indicate that
the degree of the relationship is not necessarily equal on both sides.
The way these symbols are used will become clear in the example
below.
Each entity and each relationship represents a relation or table.
Chen (1976) called these entity relations and relationship relations.
A list of the columns or attributes of the entity and relationship
relations can be presented in the diagram, with the primary keys
identified by having their names underlined. Figure 1 presents an
example of an ERD.
In Figure 1, each representative serves on more than one (M) com-
mittee, and each committee has more than one representative (N)
serving on it. On the other hand, each representative represents
only one state, whereas a state may have more than one (M) repre-
sentative.
This database will contain four tables or relations, one for each
entity and one for the relationship between Committee and Representative.
The State relation, whose primary key is S_Name, contains
in addition the columns, or attributes, Capital and Population.
The Representative relation, whose primary key is R_Name, also
contains the columns Age, Party, and District and the foreign key
S_Name. The Committee relation, whose primary key is C_Name,
also has the attributes Chair and Mtg_Rm.
The relation Serves On has three columns. Two contain the primary
key, which is composed of R_Name and C_Name. The relation
also has the column Rank. The relation Represents contains
only the key composed of R_Name and S_Name. It is not necessary
to store this relation, because we can project it from Representative.

Figure 1. Illustration of entity-relationship diagram (attribute list shown: R_Name, C_Name, Rank)

Through the use of relational operators, we could gather a lot of
material from this database. The join and project operators could
give us a list of the name, party, rank, and age of each member in
each committee. A similar operation, along with a select operator,
could give us a list of the representatives from the most populous
states. A more complicated set of operations could provide the
names of the chairs of the committees on which representatives
from New York serve, and so on.
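The four relations described for Figure 1 and the last of these queries can be sketched with Python's sqlite3 module. Everything beyond the column names given in the text is an assumption: the rows are invented placeholders ('Rep A', 'Chair X', and so on), the rank column is named m_rank for illustration, and SQL is one possible notation for the join, select, and project operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (
        s_name TEXT PRIMARY KEY, capital TEXT, population INTEGER
    );
    CREATE TABLE representative (
        r_name TEXT PRIMARY KEY, age INTEGER, party TEXT, district INTEGER,
        s_name TEXT REFERENCES state(s_name)
    );
    CREATE TABLE committee (
        c_name TEXT PRIMARY KEY, chair TEXT, mtg_rm TEXT
    );
    CREATE TABLE serves_on (
        r_name TEXT REFERENCES representative(r_name),
        c_name TEXT REFERENCES committee(c_name),
        m_rank INTEGER,
        PRIMARY KEY (r_name, c_name)
    );

    -- Placeholder rows, invented for this sketch.
    INSERT INTO state VALUES ('New York', 'Albany', NULL),
                             ('Georgia', 'Atlanta', NULL);
    INSERT INTO representative VALUES
        ('Rep A', 50, 'D', 1, 'New York'),
        ('Rep B', 45, 'R', 2, 'New York'),
        ('Rep C', 60, 'D', 1, 'Georgia');
    INSERT INTO committee VALUES
        ('Budget', 'Chair X', '100'),
        ('Rules', 'Chair Y', '200'),
        ('Agriculture', 'Chair Z', '300');
    INSERT INTO serves_on VALUES
        ('Rep A', 'Budget', 1),
        ('Rep B', 'Budget', 2),
        ('Rep B', 'Rules', 1),
        ('Rep C', 'Agriculture', 1);
""")

# Chairs of the committees on which representatives from New York serve:
# two joins, a select on the state, and a project onto chair.
rows = conn.execute("""
    SELECT DISTINCT c.chair
    FROM representative AS r
    JOIN serves_on AS s ON s.r_name = r.r_name
    JOIN committee AS c ON c.c_name = s.c_name
    WHERE r.s_name = 'New York'
    ORDER BY c.chair
""").fetchall()
chairs = [chair for (chair,) in rows]
print(chairs)  # ['Chair X', 'Chair Y']
```

No table is created for Represents, in keeping with the observation that it can be projected from Representative.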
In designing a database like this one, a process called data mod-
eling, the data analyst must keep in mind the requirements of nor-
malization. Although there is no law that mandates normalization
of a database, it often saves disk space to store data in normalized
relations. On the other hand, if it is more efficient regarding pro-
cessing to store data in less-than-optimal normal form, the analyst
can choose to store data that way.
In the construction of an ERD, some rules of thumb can be used
to give guidance in designing normalized relations. As Eliason (1990)
points out, if a relationship between two entities is one to one, a
single physical file can contain all the data. A one-to-many relation-
ship usually requires two physical files for normalized relations,
while a many-to-many relationship needs three physical files. Fol-
lowing these basic rules will usually result in relations that are in
at least 3NF.

An Example Database
The National Crime Surveys are conducted by the Bureau of the
Census for the Justice Department's Bureau of Justice Statistics and
contain information about the crimes suffered by households and
individuals (U.S. Department of Justice, Bureau of Justice Statistics,
1991). As distributed by the ICPSR, the data are composed of three
types of records: data about households, individuals, and criminal
incidents. The data are organized hierarchically. Each household
record is associated (by a common variable) with one or more indi-
vidual records that describe the persons age 12 and older who com-
pose the household. The individual records are likewise associated
with incident records, which describe criminal incidents suffered
by the individuals in the household. Not every individual has an
incident record, but some individuals have many of them.
This hierarchical structure is a reasonably efficient way to store
the data but does not correspond to the logical structure of the
data (see note 1). The logical structure of the data is shown in an entity-relationship
diagram in Figure 2. Each household is composed of one or
more individuals. Each individual may have been a victim of one or
more criminal incidents. Some incidents, such as a burglary, are
suffered by the household as a whole, however, rather than by in-
dividuals separately. How should these incidents be treated?
Under the hierarchical model used by the ICPSR, households are
composed of individuals, and individuals suffer criminal incidents.
Every incident, then, has to be tied to a single, specific individual.
For crimes against individuals, such as assault, if there is more than
one victim in a particular incident, there will be a separate incident
record for each victim. Crimes against households, however, are
represented by a single individual record, which is associated either
with the main respondent for the household or with the individual
who reports it.

Figure 2. Logical structure of National Crime Survey data

The relational model, in contrast, allows the household to relate
directly to the incident without the intervention of the individual,
thus corresponding to the logical structure of the data. This would
allow the incident record for households to have a different struc-
ture and different variables from those of the incident record for in-
dividuals, something the hierarchical structure does not permit.
Crimes against households include burglary, household larceny,
and motor vehicle theft; and crimes against individuals include
rape, robbery, assault, and personal larceny. Because these types of
crimes are so different, it might make some sense to gather different
kinds of information about them.
The arrangement of the records in the hierarchical model de-
mands that crimes against households have just one victim, the
household. The person identified on the incident record is merely
the reporter or representative for the household. Shouldn't it at least
sometimes be the case, however, that crimes against households be
considered to have more than one victim? These victims might be
some or all of the persons in the household. The hierarchical struc-
ture that ties each crime against a household to a single individual
does not permit this, but the relational structure, as shown in Fig-
ure 2, does.

Discussion
Thinking about data in a relational form has many benefits. Rather
than forcing all the attributes (variables) in a relation (data set) to be
characteristics of one type of record (case), a relational view of data
allows a database to contain many types of records, some of which
describe entities like voters, countries, senators, and so on, whereas
others describe the relationships among these entities. This concept
of keeping the data that describe the relationships among entities
separate from the data about the entities themselves can be very
liberating and cause us to look at our data in new ways.
The study of multinational corporations involves two entities,
the corporations and the countries in which the corporations do
business. The attributes of countries should obviously be stored in
one relation, and the attributes of the corporations, as multinationals,
should be stored in another. These may include year of incorporation,
country of incorporation, total assets, and others. Perhaps
the most important data, however, are the data concerning the
country-corporation relation. These data could describe the degree
of participation of the country in ownership of the company in that
country, the number of employees in that country, the value of its
assets in that country, and so on. Through the use of relational op-
erators, derived relations could be developed for analysis to examine
the effects of country and corporate characteristics on the behavior
of corporations in the host country and elsewhere. A relational
structure encourages the analyst to store a wider variety of data
about the objects of study and to consider relationships as objects
of study in and of themselves.
Another kind of problem that can be attacked easily with the relational
model is one that has several kinds of data that are interrelated.
The problem of storing data about the U.S. Supreme Court
is one example. Justices have attributes such as political party, law
school attended, and so on. Justices belong to natural courts, which
have attributes such as Chief Justice, beginning and ending dates,
and others. Justices write opinions that have their own characteris-
tics, and opinions concern cases. An opinion pertains to only one
case, whereas a case may have many opinions. Cases are decided by
natural courts, with each court deciding many cases, but a case is
decided by only one court. Cases have their own characteristics, in-
cluding facts, type, the names of the parties, and many others. A
relational design is well suited to storing these kinds of data,
whereas the typical flat file structure is not.

Conclusion
This paper has only scratched the surface of the relational model of
data. It has introduced the major concepts of the model, however.
This should enable readers to communicate clearly with database
managers and other computer professionals. It has also provided the
tools so that readers can design reasonably complex relational databases
of their own. Finally, I hope that it has suggested ways in
which the relational model can help to change the way we view
data. It is especially important that we learn to include relation-
ships in our view of data, as objects of study that deserve to have,
and can have, data all their own.

Notes
Robert G. Brookshire is assistant professor of information and decision sciences at
James Madison University, where he teaches statistics, computer applications, and
systems analysis and design. He is the editor of the newsletter of the Computer and
Multimedia Section of the American Political Science Association. He is a coauthor
of Using Microcomputers for Research (Sage Publications, 1985), and his articles have
appeared in Byte, Social Science Computer Review, Legislative Studies Quarterly,
Evaluation Review, and other journals. Address: Department of Information and Decision
Sciences, James Madison University, Harrisonburg, VA 22807, 703-568-3066; FAX
703-568-3299, Internet FAC_RGB@VAX1.ACS.JMU.EDU.


The author extends thanks to the editor and several anonymous reviewers for their
comments, which helped him improve this paper enormously. He alone is responsible
for the defects that remain. The National Crime Surveys data referred to in this
paper were made available by the Inter-University Consortium for Political and Social
Research. Portions of this paper were presented at the 1991 annual meeting of
the American Political Science Association.
1. It is efficient in the sense that there are no more records stored than are absolutely
necessary to contain all the information.

References
Chen, P. P. 1976. The entity-relationship model: Toward a unified view of data. ACM
Transactions on Database Systems 1 (1): 19-36.
Codd, E. F. 1970. The relational model of data for large shared data banks.
Communications of the ACM 13 (6): 377-87.
Codd, E. F. 1979. Extending the relational database model to capture more meaning. ACM
Transactions on Database Systems 4 (4): 457-75.
Codd, E. F. 1982. Relational database: A practical foundation for productivity.
Communications of the ACM 25 (2): 109-17.
Codd, E. F. 1990. The relational model for database management: Version 2. Reading,
MA: Addison-Wesley.
Date, C. J. 1986. An introduction to database systems. Vol. 1, 4th ed. Reading, MA:
Addison-Wesley.
Eliason, A. L. 1990. Systems development: Analysis, design and implementation. 2d
ed. Glenview, IL: Scott, Foresman.
Kent, W. 1983. A simple guide to five normal forms in relational database theory.
Communications of the ACM 26 (2): 120-25.
U.S. Dept. of Justice, Bureau of Justice Statistics. 1991. National crime surveys:
National sample, 1986-1990 (near-term data). 3d ICPSR ed. Ann Arbor: Inter-university
Consortium for Political and Social Research.
Vaughan-Nichols, S. J. 1990. Relational databases: The real story. Byte 15 (13): 321-25.
