Logo Bases de données
Français     English

Applications of Databases to Humanities and Social Sciences

Introduction


SQL: Groups of rows


GROUP BY clause

This clause allows to create subsets (groups of rows) for which the value of one (or several) column is identical. This clause is related to the use of statistical functions which will be applied to the subsets defined by the GROUP BY clause.

-Group by rank of the teachers

SELECT Grade
FROM ENSEIGNANT
GROUP BY Grade;

This query will create as many subsets as possible values of the Grade attribute. Here 3 subsets : ASS, MCF, PR.

HAVING clause

This clause, contrary to the clause WHERE which specifies the conditions to apply to the rows of a table, allows to specify conditions at the subsets level created by GROUP BY.

-Subset by rank of the teachers of type 'ASS'

SELECT Grade
FROM ENSEIGNANT
GROUP BY Grade
HAVING Grade='ASS';

Statistical functions

Functions
Meaning
COUNT ([DISTINCT] expr [alias]
Return the number of values in the column. If DISTINCT is specified, redundant values are deleted.
COUNT (*)
Count all the rows of the table.
AVG (expr [alias])
Return the average value of the specified column or expression.
MIN (expr [alias])
Return the smallest value of the specified column.
MAX (expr [alias])
Return the highest value of the specified column.
SUM (expr [alias])
Return the sum of the values of the specified column.

-Display the date of birth of the youngest student and the oldest student.

SELECT MIN (Dtnaiss) AS "Younger", MAX (Dtnaiss) AS "Older"
FROM ETUDIANT;

-Display the total number of students

SELECT COUNT(*)
FROM ETUDIANT;

Test queries examples

SQL summary

Statistical functions and GROUP BY clause

The statistical functions operate on a data set and return a single value. Generally, when an attribute is selected in a SELECT clause, the result can contain from zero to n values what can cause conflicts when one uses aggregate functions which return only a single value.

The following SELECT order, supposed to calculate the average mark by student, is thus incorrect and will generate the characteristic error of execution : "Invalid column reference"

SELECT Numetu, AVG(note)
FROM NOTES;

The correct formulation is :

SELECT Numetu, AVG(note)
FROM NOTES
GROUP BY Numetu;

In this case, a subset will be created for each student number and the average will be calculated at the subset level.

Test queries examples

SQL summary

Q07: Display for each student, identified by his name, the total number of marks which it obtained.

SQL summary

Q08: Display the name of the students who have less than 4 marks.

SQL summary

Q09: Display the name of the first and the last student in alphabetical order.

SQL summary

Q10: Display for each subject number the number of students who have a mark.

 

© Marc Grange, February 2001 | Last update: January 29, 2009