본문 바로가기
postgresql

postgresql에서 percentile_cont(), percentile_disc()

by 루에 2025. 1. 19.
반응형

PostgreSQL에서는 percentile_cont()와 percentile_disc() 함수를 모두 사용할 수 있습니다. 이 두 함수는 PostgreSQL 9.4 버전부터 도입되었으며, 현재까지 지원되고 있습니다.

## percentile_cont()

percentile_cont()는 연속 분포를 가정하고 백분위수를 계산하는 함수입니다.

- 특징: 
  - 지정된 백분위수에 해당하는 값을 보간하여 반환합니다.
  - 결과값이 입력 데이터셋에 실제로 존재하지 않을 수 있습니다.

- 사용 예:
```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY column_name)
FROM table_name;
```

## percentile_disc()

percentile_disc()는 이산 분포를 가정하고 백분위수를 계산하는 함수입니다.

- 특징:
  - 지정된 백분위수보다 크거나 같은 첫 번째 실제 값을 반환합니다.
  - 결과값은 항상 입력 데이터셋에 존재하는 값입니다.

- 사용 예:
```sql
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY column_name)
FROM table_name;
```

## 다중 백분위수 계산

두 함수 모두 배열을 사용하여 여러 백분위수를 한 번에 계산할 수 있습니다:

```sql
SELECT 
  percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY column_name) AS cont_percentiles,
  percentile_disc(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY column_name) AS disc_percentiles
FROM table_name;
```

## 대체 방법

만약 어떤 이유로 이 함수들을 사용할 수 없는 경우, 다음과 같은 대체 방법을 고려할 수 있습니다:

1. 정렬 후 ROW_NUMBER() 사용:
```sql
WITH ranked AS (
  SELECT column_name, ROW_NUMBER() OVER (ORDER BY column_name) AS rn,
         COUNT(*) OVER () AS total_count
  FROM table_name
)
SELECT AVG(column_name) AS median
FROM ranked
WHERE rn IN (total_count/2, (total_count+1)/2);
```

2. ntile() 함수 사용:
```sql
SELECT MAX(column_name) AS percentile_75
FROM (
  SELECT column_name, NTILE(4) OVER (ORDER BY column_name) AS quartile
  FROM table_name
) subquery
WHERE quartile = 3;
```

3. 근사값 계산을 위한 히스토그램 사용:
PostgreSQL의 width_bucket() 함수를 사용하여 히스토그램을 만들고 근사 백분위수를 계산할 수 있습니다.

이러한 대체 방법들은 정확한 백분위수 값을 제공하지 않을 수 있으며, 대규모 데이터셋에서는 성능 문제가 발생할 수 있습니다. 따라서 가능하다면 percentile_cont()와 percentile_disc() 함수를 사용하는 것이 좋습니다[1][2][3].

출처
[1] How to Calculate Percentiles in PostgreSQL - CommandPrompt Inc. https://www.commandprompt.com/education/how-to-calculate-percentiles-in-postgresql/
[2] PERCENTILE_DISC() in PostgreSQL as a window function https://stackoverflow.com/questions/39595038/percentile-disc-in-postgresql-as-a-window-function
[3] Calculating Percentile (and Median) in PostgreSQL - leafo.net https://leafo.net/guides/postgresql-calculating-percentile.html
[4] Understanding percentile_cont() and percentile_disc() in PostgreSQL https://www.timescale.com/learn/understanding-percentile_cont-and-percentile_disc
[5] Within-group ordered-set aggregate functions - Yugabyte Docs https://docs.yugabyte.com/preview/api/ysql/exprs/aggregate_functions/function-syntax-semantics/mode-percentile-disc-percentile-cont/
[6] PERCENTILE_CONT 창 함수 - Amazon Redshift https://docs.aws.amazon.com/ko_kr/redshift/latest/dg/r_WF_PERCENTILE_CONT.html
[7] How to calculate percentiles with SQL PERCENTILE_CONT https://vladmihalcea.com/sql-percentile-cont/
[8] PERCENTILE_DISC 창 함수 - Amazon Redshift https://docs.aws.amazon.com/ko_kr/redshift/latest/dg/r_WF_PERCENTILE_DISC.html
[9] Calculating medians in PostgreSQL with percentile_cont - SkillsLogic https://www.skillslogic.com/blog/dashboards-data-warehousing/calculating-medians-in-postgresql-with-percentile_cont
[10] Percentile_disc() function does work with QRDS https://discuss.redash.io/t/percentile-disc-function-does-work-with-qrds/6342
[11] Documentation: 9.4: Aggregate Functions - PostgreSQL https://www.postgresql.org/docs/9.4/functions-aggregate.html
[12] Issue with using percentile_cont function in Postgresql https://stackoverflow.com/questions/42106362/issue-with-using-percentile-cont-function-in-postgresql
[13] PERCENTILE_CONT 함수 - Amazon Redshift https://docs.aws.amazon.com/ko_kr/redshift/latest/dg/r_PERCENTILE_CONT.html
[14] How to workaround unsupported percentile_cont in Postgres/Citus? https://stackoverflow.com/questions/54963993/how-to-workaround-unsupported-percentile-cont-in-postgres-citus/55119435
[15] Support for postgres percentile_disc() and percentile_cont ... - GitHub https://github.com/questdb/questdb/issues/1371

반응형

댓글