Categories
Database Development

Postgres query performance – hash aggregate costly operation

I’m running Postgres 9.6.6 and I have a relatively simple query, however, I’m finding that it is slow with the hash aggregation function which is the most costly operation:

    select
      program_schedule_source_count.full_name,
      program_schedule_source_count.country,
      sum(program_schedule_source_count.displays) as displays,
      program_schedule_source_count.source_id,
      program_schedule_source_count.source_region
    from program_schedule_source_count
    where program_schedule_source_count.original_title = 'How I Met Your Mother'
    and program_schedule_source_count.show_type in ('SM', 'SE')
    and program_schedule_source_count.start_date between '20200101' and '20200217'
    group by
      program_schedule_source_count.source_id,
      program_schedule_source_count.full_name,
      program_schedule_source_count.country,
      program_schedule_source_count.source_region;

Below is the query plan:

HashAggregate  (cost=1139676.26..1139725.25 rows=3919 width=46) (actual time=18769.670..18770.066 rows=736 loops=1)
  Group Key: source_id, full_name, country, source_region
  ->  Index Scan using title_date_show_type_country on program_schedule_source_count  (cost=0.70..1139186.45 rows=39185 width=46) (actual time=0.098..18733.005 rows=42654 loops=1)
        Index Cond: ((start_date >= '20200101'::bpchar) AND (start_date <= '20200217'::bpchar) AND ((original_title)::text = 'How I Met Your Mother'::text))
        Filter: (show_type = ANY ('{SM,SE}'::bpchar[]))
Planning time: 0.223 ms
Execution time: 18770.252 ms

The table has indexes on all of the fields in the where clause:

CREATE UNIQUE INDEX program_schedule_pkey ON public.program_schedule_source_count USING btree (source_id, start_date, program_id);
CREATE INDEX title_date_show_type_country ON public.program_schedule_source_count USING btree (start_date, original_title, release_year, show_type, country);

I have tried changing the order of the group by function but this did nothing to alter the performance.
I tried to disable hash aggregation to see if this would speed up the query but it still runs in roughly the same amount of time.
I assume adding an index to the group by fields would not have any benefit because I am not searching these fields.

I’ve seen that clustering could benefit but in the documentation it says you would cluster by an index does this mean I would have to create another index of the group by fields and then cluster the data by this?

Is there an alternative way I can write the query which could make it faster?

Thanks all for your help

Leave an answer

Your email address will not be published. Required fields are marked *