Many of the actions you take during the day become data for organizations to use for their own profit and learning. Using an automated teller machine, filling out a form for a driver's license, ordering a book on the Internet, booking a flight on an airline - all become digitized data to be sorted, managed, and used by others. In each of these cases, someone at some time has decided how the data from these users will be received, stored, processed, and made available to others. After reading this lesson, you should be able to:
Describe the value of data to organizations. Discuss how and why organizations and individuals attempt to extract meaning from data.
Solutions Scenario
To help ensure "appropriate" customer service, some companies have initiated a grading system for their customers. Customers that make the company a lot of money, or are highly valued customers, receive an A-rating. A B-rating is given to mediocre customers and a C-rating is reserved for customers with bad credit or "bad" dispositions on previous customer service calls. This requires the company to store data on their customers and helps the customer service representatives know which customers are most highly valued and therefore deserve the best service.
Define the key terms needed to understand what a database is and how it is used. Identify the purpose and role of characters in data processing. Identify the purpose and role of fields in data processing. Identify the purpose and role of records in data processing. Identify the purpose and role of database files in data processing. Identify the purpose and role of databases in data processing. Identify the purpose and role of data management systems in data processing.
Character
A character is the most basic element of data that can be observed and manipulated. Behind it are the invisible data elements we call bits and bytes, referring to physical storage elements used by the computer hardware. A character is a single symbol such as a digit, letter, or other special character (e.g., $, #, and ?).
Field
A field contains an item of data; that is, a character, or group of characters that are related. For instance, a grouping of related text characters such as "John Smith" makes up a name in the name field. Let's look at another example. Suppose a political action group advocating gun control in Pennsylvania is compiling the names and addresses of potential supporters for their new mailing list. For each person, they must identify the name, address, city, state, zip code and telephone number. A field would be established for each type of information in the list. The name field would contain all of the letters of the first and last name. The zip code field would hold all of the digits of a person's zip code, and so on. In summary, a field may contain an attribute (e.g., employee salary) or the name of an entity (e.g., person, place, or event).
Record
A record is composed of a group of related fields. Put another way, a record contains a collection of attributes related to an entity such as a person or product. Looking at the list of potential gun control supporters, the name, address, zip code and telephone number of a single individual would constitute a record. A payroll record would contain the name, address, social security number, and title of a single employee.
Database File
As we move up the ladder, a database file is defined as a collection of related records. A database file is sometimes called a table. A file may be composed of a complete list of individuals on a mailing list, including their addresses and telephone numbers. Files are frequently categorized by the purpose or application for which they are intended. Some common examples include mailing lists, quality control files, inventory files, or document files. Files may also be classified by the degree of permanence they have. Transaction files are only temporary, while master files are much more long-lived.
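The ladder from character to field to record to file can be sketched in a few lines of Python. This is only an illustration: the class name SupporterRecord and the sample names, addresses, and telephone numbers are invented for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class SupporterRecord:
    """One record: a group of related fields describing a single person."""
    name: str
    address: str
    city: str
    state: str
    zip_code: str
    telephone: str


# A database file (table) is a collection of related records.
mailing_list = [
    SupporterRecord("John Smith", "12 Oak St", "Harrisburg", "PA", "17101", "555-0100"),
    SupporterRecord("Ann Lee", "9 Elm Ave", "Erie", "PA", "16501", "555-0101"),
]

first = mailing_list[0]
field_names = [f.name for f in fields(first)]  # the fields that make up a record
characters = list(first.zip_code)              # the characters that make up one field

print(field_names)
print(characters)
```

Each level simply groups the level below it: characters form a field, fields form a record, and records form a file.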
Database
Organizations and individuals use databases to bring independent sources of data together and store them electronically. Thus, a database is composed of related files that are consolidated, organized and stored together. One collection of related files might pertain to employee information. Another collection of related files might contain sports statistics. Organizations and individuals may have and use many different databases, depending on the nature of the work involved. For example, a library database might consist of several related, but separate, databases including book titles and author names, book description, books on order, books checked out, and similar sets of information. Most organizations have product information databases, customer databases, and human resource databases that contain information about employees, salaries, home address, stock purchase plans, and tax deduction information. In each case, the data stored in a database is independent from the application programs which use and process the data.
Key
In order to track and analyze data effectively, each record requires a unique identifier, called a key. The key must be unique to a particular record, just as each individual has a unique social security number assigned to them. In fact, social security numbers are often used as keys in large databases. You might think that the name field would be a good choice for a key in a mailing list. However, this would not be a good choice because some people might have the same name. A key must be identified or assigned to each record for computerized information processing to function correctly. An existing field may be used if the entries are entirely unique, such as a social security or telephone number. In most cases, a new field will be developed to hold a key, such as a customer number or product number.
*A character is the most basic element of data that can be observed and manipulated.
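A short Python sketch shows why a name makes a poor key and an assigned number makes a good one. The people listed and the starting customer number 1001 are hypothetical values chosen for the example.

```python
# Hypothetical sample data: two different people share the same name.
people = [
    {"name": "John Smith", "city": "Erie"},
    {"name": "John Smith", "city": "Scranton"},
    {"name": "Ann Lee", "city": "Harrisburg"},
]

# Using the name as the key: the second John Smith overwrites the first.
by_name = {}
for person in people:
    by_name[person["name"]] = person

# Using an assigned customer number as the key: every record survives.
by_number = {}
for number, person in enumerate(people, start=1001):
    by_number[number] = person

print(len(by_name))    # records kept when the name is the key
print(len(by_number))  # records kept when an assigned number is the key
```

With the name as the key, one of the two John Smith records is silently lost, which is exactly the problem a unique key is meant to prevent.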
Identify some of the common types of databases. Discuss some of the key issues associated with providing data access. Justify the importance of maintaining separate files. Justify the importance of minimizing redundancy between data files.
Types of Databases
Some databases are small enough to be created and contained on your desktop computer while others are so large that they are stored on network servers or powerful mainframe computers. Popular database management software applications such as Paradox, Access, and dBASE 5 are utilized to manage databases small enough to be stored on a desktop computer. Individuals use these programs to perform specific tasks, such as to keep track of customers and manage data for small research projects. Some databases are so large that they must be stored on a server or mainframe computer and accessed by going online. Some large, public databases can be accessed online for a fee. These are referred to as information utilities or online services. You may have heard of or used some of the more popular online services including America Online, CompuServe, and Microsoft Network. These online services provide access to a myriad of information sources concerning weather, news, travel, shopping, and a great deal more. Even specialized public databases can be accessed online. Lexis, which gives lawyers access to local, state, and federal laws, is just one example. There are many other types of large databases. Many museums have put artwork online, creating virtual art museums. Most university libraries have created electronic databases to complement or substitute for their card catalogues.
Database Access
Database access is a sticky issue, as you will see. The following example illustrates some of the difficulties that data administrators, organizations, and society in general now face. A decade ago, Congress created a medical practitioner database to keep physicians disciplined by the medical board of one state from avoiding detection if they moved to another state and applied for a medical license. Should doctor databases be opened to the public? If given access to the database, patients could look up information about a specific doctor and find out if other patients have lodged complaints against them. In one case, a woman whose obstetrician left unsightly scars on her abdomen after delivering her baby said if she had been allowed access to the database, she would have learned of other patients' complaints and chosen another doctor. On the other hand, many physicians complain that by making such data available, they are less likely to perform high-risk procedures, even when it might be beneficial to the patient. Those doctors performing high-risk procedures are more likely to receive complaints and could potentially face disciplinary action. You can begin to see the challenges associated with determining who should have access to certain types of information.
Multiple Sources
A database is more useful if there is little redundancy between the files it contains. In other words, it would be inefficient and a waste of human and computer resources to have the same information repeated over and over again in different files. Some companies maintain databases with very similar information. Sometimes there are good reasons for this; e.g., for security purposes. However, it's simply more costly to maintain accurate information in multiple locations. In addition, there would also be a need to resolve discrepancies occurring between the same information in multiple files. One of the beauties of databases is the ability to link together data from multiple sources to accomplish a specific task. For example, I might store the file containing a mailing list for Pennsylvania with similar lists compiled for individuals in the other forty-nine states. If a political action group in Pennsylvania decides to develop a campaign for the northeast region, they can extract the names of potential supporters for the states of New York, Connecticut, Maine, and other northeastern states.
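The regional campaign example can be sketched as follows. The state lists, the names in them, and the NORTHEAST grouping are all assumptions made for the illustration; a real campaign would define its own region.

```python
# Hypothetical per-state mailing lists, each kept as a separate file.
state_lists = {
    "PA": ["John Smith", "Ann Lee"],
    "NY": ["Maria Lopez"],
    "CT": ["Wei Chen"],
    "ME": ["Sam Brown"],
    "TX": ["Pat Jones"],
}

# Assumed region definition, for the example only.
NORTHEAST = ["PA", "NY", "CT", "ME"]


def regional_list(lists_by_state, states):
    """Combine the separate state files into one list for a campaign."""
    combined = []
    for state in states:
        for name in lists_by_state.get(state, []):
            combined.append((state, name))
    return combined


campaign = regional_list(state_lists, NORTHEAST)
print(campaign)
```

Because each state's list is stored once and kept separate, the regional list is built by linking the files rather than by copying names into yet another permanent file.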
*It is important to keep some database files separate, even though they contain closely related information.
Define the term database management system (DBMS). Describe the basic purpose and functions of a DBMS. Discuss the advantages and disadvantages of DBMSs.
DBMS Fundamentals
A database management system is a set of software programs that allows users to create, edit and update data in database files, and store and retrieve data from those database files. Data in a database can be added, deleted, changed, sorted or searched all using a DBMS. If you were an employee in a large organization, the information about you would likely be stored in different files that are linked together. One file about you would pertain to your skills and abilities, another file to your income tax status, another to your home and office address and telephone number, and another to your annual performance ratings. By cross-referencing these files, someone could change a person's address in one file and it would automatically be reflected in all the other files. DBMSs are commonly used to manage:
Membership and subscription mailing lists
Accounting and bookkeeping information
The data obtained from scientific research
Customer information
Inventory information
Personal records
Library information
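The basic DBMS operations of adding, changing, deleting, sorting, and searching can be tried out with SQLite, a small DBMS that ships with Python. This is a minimal sketch: the table name members and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Add records.
conn.executemany(
    "INSERT INTO members (id, name, city) VALUES (?, ?, ?)",
    [(1, "John Smith", "Erie"), (2, "Ann Lee", "Harrisburg"), (3, "Wei Chen", "Erie")],
)

# Change a record.
conn.execute("UPDATE members SET city = ? WHERE id = ?", ("Scranton", 1))

# Delete a record.
conn.execute("DELETE FROM members WHERE id = ?", (3,))

# Search and sort the remaining records.
rows = conn.execute("SELECT name, city FROM members ORDER BY name").fetchall()
print(rows)
conn.close()
```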
files. For example, a file management system might be used to store a mailing list or a personal address book. When files need to be linked, a relational database should be created using database application software such as Oracle, Microsoft Access, IBM DB2, or FileMaker Pro.
unauthorized user gets into the database, they have access to all the files, not just a few. Depending on the nature of the data involved, these breaches in security can also pose a threat to individual privacy. Steps should also be taken to regularly make backup copies of the database files and store them because of the possibility of fires and earthquakes that might destroy the system.
*An advantage of major database management systems is that the same information can be made available to different users.
Compare and contrast the structure of different database management systems. Define hierarchical databases. Define network databases. Define relational databases. Define object-oriented databases.
Hierarchical databases, commonly used on mainframe computers, have been around for a long time. The hierarchical model is one of the oldest methods of organizing and storing data, and it is still used by some organizations for making travel reservations. A hierarchical database is organized in pyramid fashion, like the branches of a tree extending downwards. Related fields or records are grouped together so that there are higher-level records and lower-level records, just like the parents in a family tree sit above the subordinated children. Based on this analogy, the parent record at the top of the pyramid is called the root record. A child record always has only one parent record to which it is linked. In contrast, a parent record may have more than one child record linked to it. Hierarchical databases work by moving from the top down. A record search is conducted by starting at the top of the pyramid and working down through the tree from parent to child until the appropriate child record is found. Furthermore, each child can also be a parent with children underneath it. The advantage of hierarchical databases is that they can be accessed and updated rapidly because the tree-like structure and the relationships between records are defined in advance. However, this feature is a two-edged sword. The disadvantage of this type of database structure is that each child in the tree may have only one parent, and relationships or linkages between children are not permitted, even if they make sense from a logical standpoint. Hierarchical databases are so rigid in their design that adding a new field or record requires that the entire database be redefined.
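The top-down search described above can be sketched with a small tree structure. The Node class, the find function, and the airline records are hypothetical, chosen only to show that each record has one parent and that a search starts at the root.

```python
class Node:
    """A record in a hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def find(node, name, path=()):
    """Search from the top down; return the path of names to the record, or None."""
    path = path + (node.name,)
    if node.name == name:
        return path
    for child in node.children:
        found = find(child, name, path)
        if found is not None:
            return found
    return None


# Hypothetical travel-reservation hierarchy.
root = Node("Airline")
flight = Node("Flight 101", parent=root)
Node("Flight 202", parent=root)
Node("Seat 14C", parent=flight)

print(find(root, "Seat 14C"))
```

The only way to reach the seat record is through its flight, and the only way to reach the flight is through the root, which is both the strength and the rigidity of this structure.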
Network databases are similar to hierarchical databases in that they also have a hierarchical structure. There are a few key differences, however. Instead of looking like an upside-down tree, a network database looks more like a cobweb or interconnected network of records. In network databases, children are called members and parents are called owners. The most important difference is that each child or member can have more than one parent (or owner). Like hierarchical databases, network databases are principally used on mainframe computers. Since more connections can be made between different types of data, network databases are considered more flexible. However, two limitations must be considered when using this kind of database. Similar to hierarchical databases, network databases must be defined in advance. There is also a limit to the number of connections that can be made between records.
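The difference from a hierarchy, that a member may have more than one owner, can be sketched like this. The link function and the department, instructor, and course records are hypothetical examples.

```python
from collections import defaultdict

# Each owner maps to its members, and each member maps back to its owners.
members_of = defaultdict(set)
owners_of = defaultdict(set)


def link(owner, member):
    """Record an owner-member connection in both directions."""
    members_of[owner].add(member)
    owners_of[member].add(owner)


# Hypothetical example: one course (member) belongs to two owners.
link("Department: Math", "Course: Statistics")
link("Instructor: Lee", "Course: Statistics")
link("Department: Math", "Course: Algebra")

print(sorted(owners_of["Course: Statistics"]))
```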
In relational databases, the relationship between data files is relational, not hierarchical. Hierarchical and network databases require the user to pass down through a hierarchy in order to access needed data. Relational databases connect data in different files by using common data elements or a key field. Data in relational databases is stored in different tables, each having a key field that uniquely identifies each row. Relational databases are more flexible than either the hierarchical or network database structures. In relational databases, tables or files filled with data are called relations, a tuple designates a row or record, and columns are referred to as attributes or fields. Relational databases work on the principle that each table has a key field that uniquely identifies each row, and that these key fields can be used to connect one table of data to another. Thus, one table might have a row consisting of a customer account number as the key field along with address and telephone number. The customer account number in this table could be linked to another table of data that also includes customer account number (a key field), but in this case, contains information about product returns, including an item number (another key field). This key field can be linked to another table that contains item numbers and other product information such as production location, color, quality control person, and other data. Therefore, using this database, customer information can be linked to specific product information. The relational database has become quite popular for two major reasons. First, relational databases can be used with little or no training. Second, database entries can be modified without redefining the entire structure. The downside of using a relational database is that searching for data can take more time than if other methods are used.
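The customer, product return, and item tables described above can be linked through their key fields as in the following sketch, which uses SQLite from Python. The table names, column names, and sample values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (account_no INTEGER PRIMARY KEY, address TEXT, phone TEXT);
    CREATE TABLE product_returns (return_id INTEGER PRIMARY KEY, account_no INTEGER, item_no INTEGER);
    CREATE TABLE items (item_no INTEGER PRIMARY KEY, color TEXT, plant TEXT);

    INSERT INTO customers VALUES (501, '12 Oak St', '555-0100');
    INSERT INTO product_returns VALUES (1, 501, 77);
    INSERT INTO items VALUES (77, 'blue', 'Plant A');
""")

# Follow the key fields from customer to product return to item.
linked = conn.execute("""
    SELECT c.account_no, c.address, i.item_no, i.color, i.plant
    FROM customers AS c
    JOIN product_returns AS r ON r.account_no = c.account_no
    JOIN items AS i ON i.item_no = r.item_no
""").fetchall()
print(linked)
conn.close()
```

No table stores the customer's address next to the product's color, yet the two are brought together at query time by matching account numbers and item numbers.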
oriented databases are compelling. The ability to mix and match reusable objects provides incredible multimedia capability. Healthcare organizations, for example, can store, track, and recall CAT scans, X-rays, electrocardiograms and many other forms of crucial data.
Define the term query language. Describe an example of a query language. Discuss the functions and capabilities of a query language.
Query Language
A query language allows the user to interact directly with the database software in order to perform information-processing tasks using data in a database. It is usually an easy-to-use computer language that relies on basic words such as SELECT, DELETE, or MODIFY. Using a query language and a computer keyboard, the user enters commands that instruct the DBMS to retrieve data from a database or update data in a database. Structured Query Language (SQL) is one type of query language that is widely used to perform operations using relational databases. Remember that relational databases are composed of tables with rows and columns. SQL can be used to retrieve information from related tables in a database or to select and retrieve information from specific rows and columns in one or more tables. One of the keys to understanding how SQL works in a relational database is to realize that each table and column has a specific name associated with it. In order to query a table, the user specifies the name of the table (indicating the rows to be displayed) and the names of the columns to be displayed. A typical SQL query contains three key elements:
SELECT (the column names to be displayed)
FROM (the table name from which the columns will be drawn)
WHERE (the condition for the query)
An Example of SQL
To illustrate the application of this type of query, let's assume a particular user wishes to query a relational database containing information about donors to a charitable organization. If the user wants to know the name and address of all individuals donating $100 or more, the following query might be used:
SELECT Name, Address
FROM DonorList
WHERE DonationAmt >= 100
Once this command has been executed, the computer will display a list of donors that meets the predefined criteria. In this case, all of the data are extracted from a single table. Similar queries can be made to extract data from multiple tables. Such a strategy might be used to analyze customer information involving billing data and order data, using two separate tables. In this case, the FROM clause would list the names of the two tables involved.
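The donor query can be run against a small test table using SQLite from Python. In this sketch the table is named DonorList (written without a space so that it is a valid table name), the comparison is >= so that a donation of exactly $100 is included, and the three donors are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DonorList (Name TEXT, Address TEXT, DonationAmt REAL)")
conn.executemany(
    "INSERT INTO DonorList VALUES (?, ?, ?)",
    [
        ("John Smith", "12 Oak St", 250.0),
        ("Ann Lee", "9 Elm Ave", 100.0),
        ("Wei Chen", "3 Pine Rd", 40.0),
    ],
)

# ">=" so that a donation of exactly $100 is included.
donors = conn.execute(
    "SELECT Name, Address FROM DonorList WHERE DonationAmt >= 100 ORDER BY Name"
).fetchall()
print(donors)
conn.close()
```

The ORDER BY clause is an addition for the example; it only makes the order of the displayed rows predictable.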
Justify the importance of data security. Describe some of the approaches used to provide database security. Describe the importance of data recovery. Identify some of the methods used to recover lost data.
Database Security
It is usually the responsibility of a database administrator to determine the different access privileges for different users of the system. Most users will be allowed to view and retrieve some types of data and not others. Some users are only allowed to view data in a database, while others who are qualified will be allowed to view and make changes to data in a database. The purpose of determining who has access, as well as the degree of access, is to protect the data from unauthorized use and sabotage. Databases must also be protected physically from harm or accident. Some organizations have opted to store database files in a vault and limit employee access to the actual computer system using security devices that verify personal identity.
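The idea of different access privileges for different users can be sketched as a simple lookup. The roles, table names, and the is_allowed function are hypothetical; a real DBMS provides its own mechanism for granting and checking privileges.

```python
# Hypothetical privilege table maintained by a database administrator.
PRIVILEGES = {
    "clerk": {"customers": {"view"}},
    "manager": {"customers": {"view", "change"}, "salaries": {"view"}},
}


def is_allowed(role, table, action):
    """Return True only if the role was explicitly granted the action."""
    return action in PRIVILEGES.get(role, {}).get(table, set())


print(is_allowed("clerk", "customers", "view"))
print(is_allowed("clerk", "customers", "change"))
print(is_allowed("manager", "salaries", "view"))
```

Anything not explicitly granted is refused, so an unknown user or an unlisted table results in no access.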
Data Recovery
As with almost all complex forms of computer hardware and software, there is always the possibility of failure. Therefore, it becomes crucial for data administrators to have system recovery features in place to be able to recover database contents that are damaged or lost when problems occur. Performing an actual recovery can be a difficult task. In all likelihood, it may not be possible to completely and seamlessly restore data once an interruption has taken place. The volatility of computer memory and timing and complexity of computer processing inhibit the ability of data administrators to re-create data accurately.
Mirroring involves making frequent simultaneous copies of a database to ensure that two or more copies are maintained in different locations at all times. This approach is expensive, but is most suitable when rapid recovery is needed. Travel agencies and airline reservation system databases often rely on this method of recovery.
Reprocessing involves going back to a known point of database activity before the problem occurred and reprocessing work from that point forward. However, periodic database saves must be performed so that the data administrator can go back and begin again at a clean starting point. In addition, records must be kept of all transactions made since the last save. These records are used to reenter and reprocess transactions that were lost as a result of the failure. This strategy is more time-consuming than mirroring, and all other transactions are delayed until the recovery process is complete.
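Reprocessing can be sketched as a saved copy plus a list of the transactions made since that save. The account names, balances, and the apply function below are hypothetical, and the failure is only simulated.

```python
import copy

# Hypothetical account balances.
database = {"A": 100, "B": 50}

# Periodic save: a clean starting point for recovery.
checkpoint = copy.deepcopy(database)

# Record of every transaction made since the last save.
transaction_log = []


def apply(db, txn):
    """Apply one (account, amount) transaction to the database."""
    account, amount = txn
    db[account] = db.get(account, 0) + amount


for txn in [("A", -30), ("B", 20), ("A", 5)]:
    apply(database, txn)
    transaction_log.append(txn)

expected = dict(database)  # state just before the failure

# Simulated failure: the live copy is lost.
database = None

# Recovery: restore the save, then reprocess the logged transactions in order.
database = copy.deepcopy(checkpoint)
for txn in transaction_log:
    apply(database, txn)

print(database)
```

The recovered database matches the state before the failure only because every transaction since the save was recorded; a transaction missing from the log would be lost for good.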
Solutions Scenario
At all times, there are threats to data security, especially when a database fails. These threats include accidental losses attributable to human error, software failure, hardware failure, theft and fraud, improper data access, loss of privacy (personal data), loss of confidentiality (corporate data), loss of data integrity, loss of customers, loss of corporate integrity, loss of availability (through sabotage, for example), exposure through communication links, aborted transactions, incorrect data, system failure (database intact), loss of transfers and backups, loss of money, loss of time, and database destruction. To prevent some of these issues, most corporations and companies using databases have backup and recovery systems. They include, but are not limited to, backup facilities, journalizing facilities, transaction logs (time, records, and input values), database change logs (before & after images), checkpoint facilities, recovery managers, and a restart point after a failure. I guess it is safe to say that, even in this age of technology, it does not hurt to back up data in paper format as well!
Compare data mining, data warehousing, and data marts. Describe the purpose and value of data mining. Describe the purpose and value of data warehousing. Describe the purpose and value of data marts.
By recording the activity of shoppers in an online store, such as Amazon.com, over time, retailers can use knowledge of these patterns to improve the placement of items in the layout of a mail-order catalog page or Web page. Telephone companies mine customer billing data to identify customers who spend considerably more than average on their monthly phone bill. The company can then target these customers to sell additional services. Marketers can effectively target the wants and needs of specific consumer groups by analyzing data about customer preferences and buying patterns. Hospitals use data mining to identify groups of people whose healthcare costs are likely to increase in the near future so that preventative steps can be taken.
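The telephone company example can be sketched in a few lines. The bills are invented, and the rule that "considerably more than average" means more than 1.5 times the average is an assumption made for the illustration.

```python
from statistics import mean

# Hypothetical monthly phone bills in dollars.
bills = {"cust_1": 40.0, "cust_2": 55.0, "cust_3": 180.0, "cust_4": 45.0}

average = mean(bills.values())

# Assumed rule for the example: "considerably more" means over 1.5x the average.
THRESHOLD_FACTOR = 1.5
targets = sorted(
    customer for customer, amount in bills.items()
    if amount > THRESHOLD_FACTOR * average
)

print(average, targets)
```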
Many organizations now use data warehouses to bring multiple databases together and make them available for data mining and other forms of analysis. A data warehouse is a collection of data, usually current and historical, from multiple databases that the organization can use for analysis and decision making. The purpose, of course, is to bring key sets of data about or used by the organization into one place. Bringing together so much data into a data warehouse makes analysis very difficult. To address this problem, organizations use what are called data marts. Data marts are related sets of data that are grouped together and separated out from the main body of data in the data warehouse. Data marts are designed to be made available to specific sets of users. For example, data about manufacturing can be put into a data mart and be made available to the production department. Human resource data can be put into another data mart and be provided to the human resources employees. This approach makes it easier for each group or constituency in the organization to access the data they need.
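Separating a data mart out of a data warehouse can be sketched as a simple filter. The warehouse rows, the subject labels, and the build_data_mart function are hypothetical.

```python
# Hypothetical warehouse rows combining data from several source databases.
warehouse = [
    {"subject": "manufacturing", "year": 2001, "metric": "units_built", "value": 1200},
    {"subject": "manufacturing", "year": 2002, "metric": "units_built", "value": 1350},
    {"subject": "human_resources", "year": 2002, "metric": "headcount", "value": 85},
    {"subject": "sales", "year": 2002, "metric": "orders", "value": 430},
]


def build_data_mart(rows, subject):
    """Separate out the rows that one group of users needs."""
    return [row for row in rows if row["subject"] == subject]


production_mart = build_data_mart(warehouse, "manufacturing")
hr_mart = build_data_mart(warehouse, "human_resources")

print(len(production_mart), len(hr_mart))
```

Each department then works with its own smaller set of rows instead of searching the whole warehouse.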
*Data marts are related sets of data that are grouped together and separated out from the main body of data.
Break down the database development process into its key steps. Describe the tasks associated with each of the key steps.
resource management, public relations, research and development, and so on. Taken together, these process maps represent an enterprise-wide model of the organization and its core processes.
*The first step in the basic process used to develop a database is to identify or define your business processes.