|
Databases design : stage 2
Structuring of data
The information contained in the data dictionary
must be structured in homogeneous "sets" (entities, tables) in which
the repetition (redundancy) of information must be minimal.
The attributes which are calculated (which have a calculation rule)
should not be taken into account when structuring data. They are,
indeed, a combination of other attributes, and including them in
the phase of structuring would introduce a form of redundancy and
thus possibilities of inconsistency. In our example, the attribute
"Moyenne" belongs to this category.
We will say that a set of attributes is homogeneous
if all the attributes which composes it have a "direct"
link.
The redundancy inside a set must be minimal because redundancy implies
inconsistency. Indeed, if the same information is repeated several
times, that means :
-
That it will have to be keybord several times,
which multiplies the risks of typing errors.
-
That when it must be updated, the modifications
will have to intervene in several places that implie risks of
partial update.
To gather the attributes of the data dictionary we
use the functional dependency ( FD) as structuring element .
------------------------------------
For a more formal justification of the structuring process presented
hereafter, refer, in these pages, to Normalization.
------------------------------------
Concepts of functional
dependency ( FD)
We say that there is a functional dependency between
an attribute A1 and an attribute A2, we note A1 - > A2, if knowing
a value of A1 we can associate only one value of A2. We say also
that A1 determines A2. A1 is the source of the functional dependency
and A2 the goal.
Example of FD
In our database example, knowing the student's number,
one knows, in a single way, the name of this student and, among
others, his date of birth. We thus have :
Numetu - > Nometu, Dtnaiss.
On the other hand, the expression : Numetu -
> Note is false.
Knowing a student number one can knows the marks
he obtained in each subject. The knowledge of student's number thus
does not make it possible to know of a particular mark and there
is thus no FD.
Simple FD:
It will be said that a FD is simple if its source is made up of
only one attribute.
A simple FD characterizes an entity (table) whose source
is the key and whose properties are the goal of the FD.
Compound FD:
It will be said that a FD is compound if its source is composed
by the collection of several attributes.
A compound FD characterizes an association between entities (tables)
whose source is the key and whose properties are the goal of the
FD. There must be no extraneous attributes in the source of a compound
FD. For example if we have the 2 compound FDs:
1- A1, A2, A3 -> A4
2- A1, A2 -> A4
The FD number 1 is incorrect since A3 is superfluous. The "correct"
compound FD is : A1, A2 -> A4
To represent the entities and associations we will
use the following notation :
NAME_OF_ENTITY (Source, Goal). The
key is underlined.
For example:
|
FD
|
Entities
|
|
A1 - > A2 , A3 , A4
|
ENTITY1 (A1 , A2 , A3, A4)
|
|
A3 - > A4
|
ENTITY2 (A3 , A4)
|
|
A1 , A5 - > A6
|
ASSOC1 (A1 , A5 , A6)
|
Deletion
of the redundancy
The use of FDs allows to structure the information
contained in the data dictionary but does not eliminate, directly,
the possible redundancy of this information.
For example, in the table above, the attributes A3 and A4 are redundant
since they appear in two entities : ENTITY1 and ENTITY2. It is thus
necessary to eliminate this redundancy without however losing
information : i.e removing entities.
In this example, one cannot remove A3 of entity2
because that would imply the deletion of the entity and thus a loss
of information. A4 cannot be neither removed from entity2 since
it is the only property of this entity. On the other hand A4 can
be removed from entity1, without loss of information, since it can
be obtained (deduced) by means of A3, using the transitivity of
the FD.
The "Optimal" structure (maximum consistency and
minimal redundancy) of the previous FD schema is thus :
A1 - > A2 , A3
A3 - > A4
A1 , A5 - > A6
So we have the rule :
| To remove the redundancies on a set of FDs,
it is necessary to eliminate the attributes which can be obtained
using FDs transitivity. |
Method for
structuring data
The structuring of data is realized in 5 stages:
- Determination of the list of the simple FDs
- Taken into account the attributes not classified in the stage
1 and determination of the compound FDs
- Elimination of the transitivities of the FDs schema
- Construction, starting from the simple FDs, of the database
entities
- Construction, starting from the compound FDs, of the database
associations
If we apply this method to our example we have the following
results:
1-Simple FDs of the STUDENTS database
Numetu - > Nometu, Dtnaiss, Cdsexe, Lbsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff,
Numens, Nomens, Grade, Ancien
2-Attributs not classified
and compound FDs of the STUDENTS database
The attribute "notes" was not classified in the preceding
stage, it thus intervenes in a coumpound FD. This DF is the following
one:
Numetu, Numat
- > Note
3-Elimination of transitivities
on the FDs schema
Numetu - > Nometu, Dtnaiss, Cdsexe
Cdsexe - > Lbsexe
Numens - > Nomens,
Grade, Ancien
Numat - > Nomat, Coeff, Numens
4-Construction of entities
ETUDIANT (Numetu, Nometu, Dtnaiss,
Cdsexe)
SEXE (Cdsexe, Lbsexe)
ENSEIGNANT (Numens, Nomens, Grade, Ancien)
MATIERE (Numat, Nomat, Coeff, Numens)
5-Construction of associations
NOTE (Numetu, Numat, Note)
Previous stage
| Next Stage | DB
Design
|