DATABASES AND DATABASE USERS

DATABASES AND DATABASE USERS

#1.9 – What is the difference between controlled and uncontrolled redundancy?
#1.10 – Specify all the relationships among the records of the database shown in Figure 1.2.
#1.12 – Cite some examples of integrity constraints that you think can apply to the database shown in Figure 1.2.
#1.13 – Give examples of systems in which it may make sense to use traditional file processing instead of a database approach.
#1.14 – Consider Figure 1.2.
a. If the name of the ‘CS’ (Computer Science) Department changes to ‘CSSE’ (Computer Science and Software Engineering) Department and the corresponding prefix for the course number also changes, identify the columns in the database that would need to be updated.
b. Can you restructure the columns in COURSE, SECTION, and PREREQUISITE tables so that only one column will need to be updated?
CH 2: DATABASE SYSTEM CONCEPTS AND ARCHITECTURE
#2.14 – if you were designing a Web-based system to make airline reservations and to sell airline tickets, which DBMS Architecture would you choose from Section 2.5? Why? Why would the other architectures not be a good choice?
#2.15 – Consider Figure 2.1. In addition to constraints relating the values of columns in one table to columns in another table, there are also constraints that impose restrictions on values in a column or a combination of columns within a table. One such constraint forces that a column or a group of columns must be unique across all rows in the table. For example, in the STUDENT table, the StudentNumber column must be unique (to prevent two different students from having the same StudentNumber). Identify the column or the group of columns in the other tables that must be unique across all rows in the table?
CH 3: THE RELATIONAL DATA MODEL AND RELATIONAL DATABASE CONSTRAINTS
#3.13 – Consider the relation CLASS(Course#, Univ_Section#, InstructorName, Semester, BuildingCode, Room#, TimePeriod, Weekdays, CreditHours). This represents classes taught in a university with unique Univ_Section#. Give what you think should be various candidate keys and
1. {Semester, BuildingCode, Room#, TimePeriod, Weekdays} if the same room cannot be used at the same time by more than one course during a particular semester.
2. {Univ_Section#} if it is unique across all semesters.
3. {InstructorName, Semester} if an instructor can teach at most one course during each semester.
4. If Univ_Section# is not unique, which is the case in many universities, we have to examine the rules that the university uses for section numbering. For example, if the sections of a particular course during a particular semester are numbered 1, 2, 3, …, then a candidate key would be {Course#, Univ_Section#, Semester}. If, on the other hand, all sections (of any course) have unique numbers during a particular semester only, then the candidate key would be {Univ_Section#, Semester}.
#3.16 – Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
STUDENT (SSN, Name, Major, Bdate)
COURSE (Course#, Quarter, Grade)
ENROLL (SSN, Course#, Quarter, Grade)
BOOK_ADOPTION (Course#, Quarter, Book_ISBN)
TEXT (Book_ISBN, Book_Title, Publisher, Author)
Specify the foreign keys for this schema, stating any assumptions you make.
The schema of this question has the following four foreign keys:
1. The attribute SSN of relation ENROLL that references relation STUDENT,
2. The attribute Course# in relation ENROLL that references relation COURSE,
3. The attribute Course# in relation BOOK_ADOPTION that references relation COURSE, and
4. The attribute Book_ISBN of relation BOOK_ADOPTION that references relation TEXT.
#3.19 – Consider a STUDENT relation in a UNIVERSITY database with the following attributes (Name, SSN, Local_phone, Address, Cell_phone, Age, GPA). Note that the cell phone may be from a different city and state (or province) from the local phone. A possible tuple of the relation is shown below:
Name SSN LocalPhone Address CellPhone Age GPA
George Shaw William Edwards 123-45-6789 555-1234 123 Main St., Anytown, CA 94539 555-4321 19 3.75

a. Identify the critical missing information from the LocalPhone and CellPhone attributes as shown in the example above. (Hint: How do call someone who lives in a different state or province?)
b. Would you store this additional information in the LocalPhone and CellPhone attributes or add new attributes to the schema for STUDENT?
c. Consider the Name attribute. What are the advantages and disadvantages of splitting this field from one attribute into three attributes (first name, middle name, and last name)?
d. What general guideline would you recommend for deciding when to store information in a single attribute and when to split the information.
a. A combination of first name, last name, and home phone may address the issue assuming that there are no two students with identical names sharing a home phone line. It also assumes that every student has a home phone number. Another solution may be to use first name, last name, and home zip code. This again has a potential for duplicates, which would be very rare within one university. An extreme solution is to use a combination of characters from last name, major, house number etc.

b. If we use name in a primary key and the name changes then the primary key changes. Changing the primary key is acceptable but can be inefficient as any references to this key in the database need to be appropriately updated, and that can take a long time in a large database. Also, the new primary key must remain unique. [Footnote: Name change is an example of where our database must be able to model the natural world. In this case, we recognize that the name change can occur regardless of whether it is due to marriage, or a consequence of a religious and/or spiritual conversion, or for any other reason.]

c. The challenge of choosing an invariant primary key from the natural data items leads to the concept of generated keys, also known as surrogate keys. Specifically, we can use surrogate keys instead of keys that occur naturally in the database. Some database professionals believe that it is best to use keys that are uniquely generated by the database, for example each row may have a primary key that is generated in the sequence of creation of rows (tuples). There are many advantages and disadvantages that are often been argued in design sessions. The main advantage is that it gives us an invariant key without any worries about choosing a unique primary key. The main disadvantages of surrogate keys are that they do not have a business meaning (making some aspects of database management challenging) and that they are slightly less efficient (because they require another pass when inserting a row because the key often needs to be returned to the application after a row is inserted).
#3.20 – Recent changes in privacy laws have disallowed organizations from using SSN to identify individuals unless certain restrictions are satisfied. As a result, most US universities cannot use SSNs as primary keys (except for financial data). In practice, StudentID, a unique ID, a unique identifier, assigned to every student, is likely to be used as the primary key rather than SSN since StudentID is usable across all aspects of the system.

a. By keeping the name attributes separated, we allow the possibility of looking these pieces of their name. In a practical use, it is not likely that the user will know the correct primary key for a given student and so we must consider how a user will locate the correct row without this information. If we were to collapse the name into a single attribute, then we have complicated any sort of “lookup by name” query; such a query would then require partial string matching and any results might not disambiguate between FirstName and LastName. Therefore, a practical system should allow name searches by FirstName and LastName; we must leave MiddleInitial separated still to avoid ambiguities from combining these pieces together.

b. A single attribute Phone# would no longer suffice if a student were able to have multiple phone numbers. We could possibly have multiple rows for a single student to allow this to happen, but then we have violated key principles of database design (e.g. having redundant data). A better solution would be to include the additional attributes HomePhone, CellPhone, and OfficePhone and allow the possibility of these attributes to have no value. Again, this is not most desirable because most students will not have all three of these attributes, and we will have many valueless key/attribute pairs. An excellent solution would be add an additional relation Phone# (SSN, Type, Number) while removing PhoneNumber from the Student relationship. This new relationship would allow the one-to-many relationship from students to phone numbers without creating redundant data or wasting space on sparse, valueless attributes.
CH 4: Basic SQL
#4.12 – Specify the following queries in SQL on the database schema of Figure 1.2.
a) Retrieve the names of all senior students majoring in ‘COSC’ (computer science).
b) Retrieve the names of all courses taught by professor King in 85 and 86.
c) For each section taught by professor King, retrieve the course number, semester, year, and number of students who took the section.
d) Retrieve the name and transcript of each senior student (Class=5) majoring in COSC. Transcript includes course name, course number, credit hours, semester, year, and grade for each course completed by the student.
e) Retrieve the names and major departments of all straight A students (students who have a grade of A in all their courses).
f) Retrieve the names and major departments of all students who do not have any grade of A in any of their courses.

(a) SELECT Name
FROM STUDENT
WHERE Major=’COSC’

(b) SELECT CourseName
FROM COURSE, SECTION
WHERE COURSE.CourseNumber=SECTION.CourseNumber AND Instructor=’King’
AND (Year=’85’ OR Year=’86’)
Another possible SQL query uses nesting as follows:
SELECT CourseName
FROM COURSE
WHERE CourseNumber IN ( SELECT CourseNumber
FROM SECTION
WHERE Instructor=’King’ AND (Year=’85’ OR Year=’86’) )

(c) SELECT CourseNumber, Semester, Year, COUNT(*)
FROM SECTION, GRADE_REPORT
WHERE Instructor=’King’ AND SECTION.SectionIdentifier=GRADE_REPORT.SectionIdentifier
GROUP BY CourseNumber, Semester, Year

(d) SELECT Name, CourseName, C.CourseNumber, CreditHours, Semester, Year, Grade
FROM STUDENT ST, COURSE C, SECTION S, GRADE_REPORT G
WHERE AND Major=’COSC’ AND ST.StudentNumber=G.StudentNumber AND
G.SectionIdentifier=S.SectionIdentifier AND S.CourseNumber=C.CourseNumber

(e) SELECT Name, Major
FROM STUDENT
WHERE NOT EXISTS ( SELECT *
FROM GRADE_REPORT
WHERE StudentNumber= STUDENT.StudentNumber AND NOT(Grade=’A’))

(f) SELECT Name, Major
FROM STUDENT
WHERE NOT EXISTS ( SELECT *
FROM GRADE_REPORT
WHERE StudentNumber= STUDENT.StudentNumber AND Grade=’A’ )
#4.15 – Consider the EMPLOYEE table’s constraint EMPSUPERFK as specified in Figure 4.2 is changed to read as follows:
CONSTRAINT EMPSUPERFK
FOREIGN KEY (SUPERSSN) REFERNCES EMPLOYEE(SSN)
ON DELETE CASCADE ON UPDATE CASCADE,
Answer the following questions:
a. What happens when the following command is run on the database state shown in Figure 5.6?
DELETE EMPLOYEE WHERE LNAME = ‘Borg’
b. Is it better to CASCADE or SET NULL in case of EMPSUPERFK constraint ON DELETE?

a) The James E. Borg entry is deleted from the table and each employee with him as a supervisor is also (and their supervisees, and so on). In total, 8 rows are deleted and the table is empty.
b) It is better to SET NULL, since an employee is not fired (DELETED) when their supervisor is deleted. Instead, their SUPERSSN should be SET NULL so that they can later get a new supervisor.
CH 8: THE ENHANCED ENTITY-RELATIONSHIP (EER) MODEL
#8.17 – Consider the BANK ER schema of Figure 7.21, and suppose that it is necessary to keep track of different types of ACCOUNTS (SAVINGS_ACCTS, CHECKING_ACCTS, …) and LOANS (CAR_LOANS, HOME_LOANS, …). Suppose that it is also desirable to keep track of each account’s TRANSACTIONs (deposits, withdrawals, checks, …) and each loan’s PAYMENTs; both of these include the amount, date, time, … Modify the BANK schema, using ER and EER concepts of specialization and generalization. State any assumptions you make about the additional requirements. (You may use Visio)

 

#8.26 – Which of the following EER diagram(s) is/are incorrect and why? State clearly any assumptions you make. (refer to the diagrams in your text – they are not too clear here).

Only (c) is incorrect.
CH 9: RELATIONAL DATABASE DESIGN BY ER- AND EER-TO-RELATIONAL MAPPING
#9.3 – Try to map the relational schema of Figure 6.14 into an ER schema. This is part of a process known as reverse engineering, where a conceptual schema is created for an existing implemented database. State any assumptions you make.

 

#9.4 – Figure 9.8 shows an ER schema for a database that may be used to keep track of transport ships and their locations for maritime authorities. Map this schema into a relational schema, and specify all primary keys and foreign keys.
Answer:
SHIP
SNAME OWNER TYPE PNAME
SHIP_TYPE
TYPE TONNAGE HULL
STATE_COUNTRY
NAME CONTINENT
SEAOCEANLAKE
NAME
SHIP_MOVEMENT
SSNAME DATE TIME LONGITUDE LATITUTE
PORT
S_C_NAME PNAME S_O_L_NAME
VISIT
VSNAME VPNAME STARTDATE ENDDATE

CH 10: PRACTICAL DATABASE DESIGN METHODOLOGY AND USE OF UML DIAGRAMS
#10.22 – What are the current relational DBMSs that dominate the market? Pick one that you are familiar with and show how it measures up based on the criteria laid out in Section 10.2.3?
Current popular relational DBMSs (RDBMSs) include DB2 and Informix Dynamic Server (from IBM), Oracle and Rdb (from Oracle), and SQL Server and Access (from Microsoft). Some other commercial relational DBMSs used are Sybase (also known as Sybase SQL Server), Paradox, and Foxbase. Recently, some open source relational DBMSs, such as MySQL, PostGreSQL, and FireBird, have also grown in popularity. At the time this solution was written, the market share varied depending on platform; with DB2, Oracle, and Microsoft SQL Server dominating the market overall. Specific criteria evaluation would depend on the database chosen.

10.23 – A possible DDL corresponding to Figure 3.1 is shown below:
CREATE TABLE STUDENT (
NAME VARCHAR(30) NOT NULL,
SSN CHAR(9) PRIMARY KEY,
HOMEPHONE VARCHAR(14),
ADDRESS VARCHAR(40),
OFFICEPHONE VARCHAR(14),
AGE INT,
GPA DECIMAL(4,3)
);

Discuss the following detailed design decisions:
a. The choice of requiring NAME to be NON NULL.
b. Selection of SSN as the PRIMARY KEY.
c. Choice of field sizes and precision.
d. Any modification of the fields defined in this database.
e. Any constraints on individual fields.

Answer:
a) This means that no Students can exist without a Name.

b) This has a possible problem because there are some restrictions on using SSN as an identifier for people, hence why most schools assign a student ID. It also does not allow (easily) for a change to a student’s SSN if this key is used elsewhere in the DB.

c) Giving Name a size of 30 does not allow for long names (think foreign students), especially since it contains a student’s entire name (first, middle, last). Homephone and Officephone, with size 14, might not allow for overseas or foreign formatted phone numbers. Age is expected to be (roughly) between 1 and 100, so using an INT is much more space than needed. A TinyInt would be better, since it allows 0 to 255. The GPA field does not allow for GPA systems that go above 9 (such as the 10 point scale and the 12 point scale), though these are rare.

d) I would suggest breaking the Name field into three fields: FirstName, MiddleName, and LastName. It is more complicated and less specific to search for a student by name if the name data is in one field. See question 3.19.
The same reasoning can be applied to the Homephone and Officephone fields, breaking them into multiple fields such as AreaCode, PhoneNumber, Extension, etc.
Another possible modification is to increase the size of the Homephone, Officephone, and SSN fields so that formatting characters can be entered, such as SSN becoming a VARCHAR(11) to allow for the two dashes in XXX-YY-ZZZZ.
The table design allows for two phone numbers, Homephone and Officephone, either of which may be NULL. This means that some rows will have empty fields for those attributes. A solution is then to add a Phone# (SSN, Type, Number) table and remove the two phone number fields from the Students table. This allows for any student to have zero or more phone numbers of various types, which we can specify (Home, Office, Cell, etc.).
Consider not storing the AGE data and instead storing DateOfBirth. Typically only the year component of Age is stored which is less accurate than if DateOfBirth were stored and the age computed from it. That is, we can generate Age from DateOfBirth, but we cannot generate DateOfBirth from Age.

e) A suggested constraint is on the GPA field, to restrict its values from 0.000 to 4.000. This would, however, prevent the (rarely seen) 10 or 12 point grade scale. Another possible constraint is to limit Age to be from 0 to 100 (or even a lower maximum value).

#10.24 – What naming conventions can you develop to help identify foreign keys more efficiently?
The naming convention can be informally specified as FK followed by the referenced relationship or relation name, and optionally followed by the primary key of the referenced table. This foreign key naming convention is frequently used in large databases. Sometimes this produces very long column names but it pays off by facilitating readability for database professionals by using more descriptive names.

CH 17: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING
#17.28 – A file has r=20,000 STUDENT records of fixed-length. Each record has the following fields: NAME (30 bytes), SSN (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), MAJORDEPTCODE (4 bytes), MINORDEPTCODE (4 bytes), CLASSCODE (4 bytes, integer), and DEGREEPROGRAM (3 bytes). An additional byte is used as a deletion marker. The file is stored on the disk whose parameters are given in Exercise 17.27.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b assuming an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously and double buffering is used, and (ii) the file blocks are not stored contiguously.
d. Assume the file is ordered by SSN; calculate the time it takes to search for a record given its SSN value by doing a binary search.

#17.31 – A PARTS file with Part# as hash key includes records with the following Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. The file uses 8 buckets, numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in the given order using the hash function h(K)=K mod 8. Calculate the average number of block accesses for a random retrieval on Part#.

#17.42 – Suppose that a file initially contains r=120,000 records of R=200 bytes each in an unsorted (heap) file. The block size B=2400 bytes, the average seek time s=16 ms, the average rotational latency rd=8.3 ms and the block transfer time btt=0.8 ms. Assume that 1 record is deleted for every 2 records added until the total number of active records is 240,000.
a. How many block transfers are needed to reorganize the file?
b. How long does it take to find a record right before reorganization?
c. How long does it take to find a record right after reorganization?

CH 18: INDEXING STRUCTURES FOR FILES
#18.19 – A PARTS file with Part# as key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose the search field values are inserted in the given order in a B + -tree of order p=4 and p leaf =3; show how the tree will expand and what the final tree looks like.
CH 19: ALGORITHMS FOR QUERY PROCESSING AND OPTIMIZATION
#19.15 – Develop cost functions for the PROJECT, UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT algorithms discussed in section 19.4.
#19.21 – Extend the sort-merge join algorithm to implement the LEFT OUTER JOIN operation.
PLACE THIS ORDER OR A SIMILAR ORDER WITH US TODAY AND GET AN AMAZING DISCOUNT 🙂