Coefficient of Variation Function in PostgreSQL
By Susam Pal on 13 May 2010
Today I learnt how to create an aggregate function in PostgreSQL
function that wraps the functionality provided by other aggregate
functions in PosgreSQL. In this experiment, I created
a cv()
function that calculates the coefficient of
variation. The function cv(x)
is equivalent
to stddev(x)
/ avg(x)
where x
represents the list of data points.
Example Data Table
$ cat perf.sql CREATE TABLE performance ( name VARCHAR, duration DOUBLE PRECISION ); INSERT INTO performance VALUES ('RAND', 101.0); INSERT INTO performance VALUES ('ZERO', 157.0); INSERT INTO performance VALUES ('NONE', 209.0); INSERT INTO performance VALUES ('TEST', 176.0); INSERT INTO performance VALUES ('UNIT', 197.0); INSERT INTO performance VALUES ('LOAD', 193.0); INSERT INTO performance VALUES ('FREE', 198.0); $ psql statistics psql (8.4.3) Type "help" for help. statistics=# \i perf.sql DROP TABLE CREATE TABLE INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 statistics=# select * from performance; name | duration ------+---------- RAND | 101 ZERO | 157 NONE | 209 TEST | 176 UNIT | 197 LOAD | 193 FREE | 198 (7 rows) statistics=#
Useful Details to Create Our Function
statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval statistics-# FROM pg_aggregate statistics-# WHERE aggfnoid='stddev(double precision)'::regprocedure; aggtransfn | aggfinalfn | aggtranstype | agginitval --------------+--------------------+--------------------+------------ float8_accum | float8_stddev_samp | double precision[] | {0,0,0} (1 row) statistics=# SELECT aggtransfn, aggfinalfn, aggtranstype::regtype, agginitval statistics-# FROM pg_aggregate statistics-# WHERE aggfnoid='avg(double precision)'::regprocedure; aggtransfn | aggfinalfn | aggtranstype | agginitval --------------+------------+--------------------+------------ float8_accum | float8_avg | double precision[] | {0,0,0} (1 row) statistics=#
Function Definition
$ cat cv.sql CREATE OR REPLACE FUNCTION finalcv(double precision[]) RETURNS double precision AS $$ SELECT float8_stddev_samp($1) / float8_avg($1); $$ LANGUAGE SQL; CREATE AGGREGATE cv(double precision) ( sfunc = float8_accum, stype = double precision[], finalfunc = finalcv, initcond = '{0, 0, 0}' );
Usage
$ psql statistics psql (8.4.3) Type "help" for help. statistics=# select stddev(duration), avg(duration) from performance; stddev | avg ------------------+------------------ 37.1682147873178 | 175.857142857143 (1 row) statistics=# select stddev(duration) / avg(duration) as cv from performance; cv ------------------- 0.211354592616754 (1 row) statistics=# \i cv.sql CREATE FUNCTION CREATE AGGREGATE statistics=# select cv(duration) from performance; cv ------------------- 0.211354592616754 (1 row) statistics=#
Bessel's Correction
Checked whether
Bessel's
correction was used in the stddev()
function of
PostgreSQL. Yes, it was used.
$ octave -q octave:1> std([101, 157, 209, 176, 197, 193, 198], 0) ans = 37.168 octave:2> std([101, 157, 209, 176, 197, 193, 198], 1) ans = 34.411 octave:3>
The std()
function in MATLAB and GNU Octave applies
Bessel's correction when invoked with the second argument
as 0
.