|
Databases design : stage 2
Structuring of data
The information contained in the data
dictionary must be structured in homogeneous "sets" (entities, tables)
in which the repetition (redundancy) of information must be minimal.
The attributes which are calculated (which have a calculation rule)
should not be taken into account when structuring data. They are,
indeed, a combination of other attributes, and including them in the
phase of structuring would introduce a form of redundancy and thus
possibilities of inconsistency. In our example, the attribute "Moyenne"
belongs to this category.
We will say that a set of attributes is
homogeneous if all the attributes which composes it have a "direct"
link.
The redundancy inside a set must be minimal because redundancy implies
inconsistency. Indeed, if the same information is repeated several
times, that means :
-
That it will have to be entered
several times, which multiplies the risks of typing errors.
-
That when it must be updated, the
modifications will have to intervene in several places that implie
risks of partial update.
To gather the attributes of the data
dictionary we use the functional dependency ( FD) as structuring
element .
------------------------------------
For a more formal justification of the structuring process presented
hereafter, refer, in these pages, to Normalization.
------------------------------------
Concepts
of functional dependency ( FD)
We say that there is a functional
dependency between an attribute A1 and an attribute A2, we note A1 -
> A2, if knowing a value of A1 we can associate only one value of
A2. We say also that A1 determines A2. A1 is the source of the
functional dependency and A2 the goal.
Example of FD
In our database example, knowing the
student's number, we know, in a single way, the name of this student
and, among others, his date of birth. We thus have :
Numetu - > Nometu, Dtnaiss.
On the other hand, the expression : Numetu
- > Note is false.
Knowing a student number we can know the marks
he obtained in each subject. The knowledge of student's number thus
does not make it possible to know of a particular mark and there is
thus no FD.
Simple FD:
It will be said that a FD is simple if its source is made up of only
one attribute.
A simple FD characterizes an entity (table) whose source is the
key and whose properties are the goal of the FD.
Compound FD:
It will be said that a FD is compound if its source is composed by the
collection of several attributes.
A compound FD characterizes an association between entities (tables)
whose source is the key and whose properties are the goal of the FD.
There must be no extraneous attributes in the source of a compound FD.
For example if we have the two compound FDs:
1- A1, A2, A3 -> A4
2- A1, A2 -> A4
The FD number 1 is incorrect since A3 is superfluous. The "correct"
compound FD is : A1, A2 -> A4
To represent the entities and associations
we will use the following notation:
NAME_OF_ENTITY (Source,
Goal). The key is underlined.
For example:
|
FD
|
Entities
|
|
A1 - > A2 , A3 , A4
|
ENTITY1 (A1 , A2 , A3, A4)
|
|
A3 - > A4
|
ENTITY2 (A3 , A4)
|
|
A1 , A5 - > A6
|
ASSOC1 (A1 , A5 , A6)
|
Deletion
of redundancy
The use of FDs allows to structure the
information contained in the data dictionary but does not eliminate,
directly, the possible redundancy of this information.
For example, in the table above, the attributes A3 and A4 are redundant
since they appear in two entities : ENTITY1 and ENTITY2. It is thus
necessary to eliminate this redundancy without however losing
information : i.e removing entities.
In this example, we cannot remove A3 of
entity2 because that would imply the deletion of the entity and thus a
loss of information. A4 cannot be neither removed from entity2 since it
is the only property of this entity. On the other hand A4 can be
removed from entity1, without loss of information, since it can be
obtained (deduced) by means of A3, using the transitivity of the FD.
The "Optimal" structure (maximum
consistency and minimal redundancy) of the previous FD schema is thus :
A1 - > A2 , A3
A3 - > A4
A1 , A5 - > A6
So we have the rule :
| To remove the redundancies on a
set of FDs, it is necessary to eliminate the attributes which can be
obtained using FDs transitivity. |
Method
for data structuring
The data structuring is realized in 5
stages:
- Determination of the list of the simple FDs
- Taken into account the attributes not classified in
the stage 1 and determination of the compound FDs
- Elimination of the transitivities of the FDs schema
- Construction, starting from the simple FDs, of the
database entities
- Construction, starting from the compound FDs, of the
database associations
If we apply this method to our example we have the
following results:
1-Simple FDs of the STUDENTS
database
Numetu - > Nometu, Dtnaiss, Cdsexe,
Lbsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens, Nomens, Grade, Ancien
2-Attributs not
classified and compound FDs of the STUDENTS database
The attribute "notes" was not classified
in the preceding stage, it thus intervenes in a coumpound FD. This DF
is the following one:
Numetu, Numat - > Note
3-Elimination of
transitivities on the FDs schema
Numetu - > Nometu, Dtnaiss, Cdsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens
4-Construction
of entities
ETUDIANT (Numetu, Nometu,
Dtnaiss, Cdsexe)
SEXE (Cdsexe, Lbsexe)
ENSEIGNANT (Numens, Nomens, Grade, Ancien)
MATIERE (Numat, Nomat, Coeff, Numens)
5-Construction
of associations
NOTE (Numetu, Numat, Note)
Previous
stage | Next Stage | DB Design
|