Logo Bases de données
Français     English

Applications of Databases to Humanities and Social Sciences

Databases design : stage 2

Structuring of data

The information contained in the data dictionary must be structured in homogeneous "sets" (entities, tables) in which the repetition (redundancy) of information must be minimal.
The attributes which are calculated (which have a calculation rule) should not be taken into account when structuring data. They are, indeed, a combination of other attributes, and including them in the phase of structuring would introduce a form of redundancy and thus possibilities of inconsistency. In our example, the attribute "Moyenne" belongs to this category.

We will say that a set of attributes is homogeneous if all the attributes which composes it have a "direct" link.
The redundancy inside a set must be minimal because redundancy implies inconsistency. Indeed, if the same information is repeated several times, that means :

  • That it will have to be entered several times, which multiplies the risks of typing errors.
  • That when it must be updated, the modifications will have to intervene in several places that implies risks of partial update.

To gather the attributes of the data dictionary we use the functional dependency ( FD) as structuring element .

For a more formal justification of the structuring process presented hereafter, refer, in these pages, to Normalization.

Concepts of functional dependency ( FD)

We say that there is a functional dependency between an attribute A1 and an attribute A2, we note A1 - > A2, if knowing a value of A1 we can associate only one value of A2. We say also that A1 determines A2. A1 is the source of the functional dependency and A2 the goal.

Example of FD

In our database example, knowing the student's number, we know, in a single way, the name of this student and, among others, his date of birth. We thus have :

Numetu - > Nometu, Dtnaiss.

On the other hand, the expression : Numetu - > Note is false.

Knowing a student number we can know the marks he obtained in each subject. The knowledge of student's number thus does not make it possible to know of a particular mark and there is thus no FD.

Simple FD:
It will be said that a FD is simple if its source is made up of only one attribute.
A simple FD characterizes an entity (table) whose source is the key and whose properties are the goal of the FD.

Compound FD:

It will be said that a FD is compound if its source is composed by the collection of several attributes.
A compound FD characterizes an association between entities (tables) whose source is the key and whose properties are the goal of the FD. There must be no extraneous attributes in the source of a compound FD. For example if we have the two compound FDs:

1- A1, A2, A3 -> A4
2- A1, A2 -> A4

The FD number 1 is incorrect since A3 is superfluous. The "correct" compound FD is : A1, A2 -> A4

To represent the entities and associations we will use the following notation:

NAME_OF_ENTITY (Source, Goal). The key is underlined.

For example:

A1 - > A2 , A3 , A4
ENTITY1 (A1 , A2 , A3, A4)
A3 - > A4
ENTITY2 (A3 , A4)
A1 , A5 - > A6
ASSOC1 (A1 , A5 , A6)




Deletion of redundancy

The use of FDs allows to structure the information contained in the data dictionary but does not eliminate, directly, the possible redundancy of this information.
For example, in the table above, the attributes A3 and A4 are redundant since they appear in two entities : ENTITY1 and ENTITY2. It is thus necessary to eliminate this redundancy without however losing information : i.e removing entities.

In this example, we cannot remove A3 of entity2 because that would imply the deletion of the entity and thus a loss of information. A4 cannot be neither removed from entity2 since it is the only property of this entity. On the other hand A4 can be removed from entity1, without loss of information, since it can be obtained (deduced) by means of A3, using the transitivity of the FD.

The "Optimal" structure (maximum consistency and minimal redundancy) of the previous FD schema is thus :

A1 - > A2 , A3
A3 - > A4
A1 , A5 - > A6

So we have the rule :

To remove the redundancies on a set of FDs, it is necessary to eliminate the attributes which can be obtained using FDs transitivity.

Method for data structuring

The data structuring is realized in 5 stages:

  1. Determination of the list of the simple FDs
  2. Taken into account the attributes not classified in the stage 1 and determination of the compound FDs
  3. Elimination of the transitivities of the FDs schema
  4. Construction, starting from the simple FDs, of the database entities
  5. Construction, starting from the compound FDs, of the database associations

If we apply this method to our example we have the following results:

1-Simple FDs of the STUDENTS database

Numetu - > Nometu, Dtnaiss, Cdsexe, Lbsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens, Nomens, Grade, Ancien

2-Attributs not classified and compound FDs of the STUDENTS database

The attribute "notes" was not classified in the preceding stage, it thus intervenes in a coumpound FD. This DF is the following one:

Numetu, Numat - > Note

3-Elimination of transitivities on the FDs schema

Numetu - > Nometu, Dtnaiss, Cdsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens

4-Construction of entities

ETUDIANT (Numetu, Nometu, Dtnaiss, Cdsexe)
SEXE (Cdsexe, Lbsexe)
ENSEIGNANT (Numens, Nomens, Grade, Ancien)
MATIERE (Numat, Nomat, Coeff, Numens)

5-Construction of associations

NOTE (Numetu, Numat, Note)

Previous stage | Next Stage | DB Design

© Marc Grange, February 2001 | Last update: February 5, 2013