Categories
Database Development

How to prevent seek pagination from doing sequential scan over entire table

  • I have 3 tables and a materialized view

resource_categories contains all category names and metadata

create table if not exists resource_categories (
    category_id INT,
    title VARCHAR(255),
    content TEXT,
    icon VARCHAR(50)
);

resources contains all resources under each category

create table if not exists resources (
    resource_id INT,
    title VARCHAR(255),
    content TEXT,
    link VARCHAR(1000),
    category_id INT REFERENCES resource_categories(resource_id),
    icon VARCHAR(255),
    created_at DATE,
    updated_at DATE
);

resource_votes contains user liking a resource or disliking it

create table if not exists resource_votes (
    resource_id INT REFERENCES resources(resource_id),
    user_id INT,
    vote BOOLEAN
);

resource_votes_aggregate a materialized view with likes per resource_id

  • I want to execute the following queries
  • Find all resources sorted in descending order of likes (find most liked resources)
  • Find all resources sorted in descending alphabetical order of their titles
  • I want to use SEEK / KEYSET pagination for efficiency

Query to find resources in descending order of their likes page 1

SELECT
   r.resource_id,
   title,
   COALESCE(likes, 0) AS likes 
FROM
   resources r 
   LEFT JOIN
      resource_votes_aggregate a 
      ON r.resource_id = a.resource_id 
ORDER BY
   likes DESC,
   resource_id DESC LIMIT 5;

Execution Plan

QUERY PLAN
Limit (cost=74.50..74.52 rows=5 width=157) (actual time=0.058..0.060 rows=5 loops=1)
-> Sort (cost=74.50..76.80 rows=918 width=157) (actual time=0.058..0.058 rows=5 loops=1)
Sort Key: (COALESCE(a.likes, '0'::bigint)) DESC, r.resource_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Hash Right Join (cost=12.03..59.25 rows=918 width=157) (actual time=0.032..0.046 rows=50 loops=1)
Hash Cond: (a.resource_id = r.resource_id)
-> Seq Scan on resource_votes_aggregate a (cost=0.00..30.40 rows=2040 width=12) (actual time=0.001..0.004 rows=38 loops=1)
-> Hash (cost=10.90..10.90 rows=90 width=149) (actual time=0.021..0.021 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on resources r (cost=0.00..10.90 rows=90 width=149) (actual time=0.003..0.010 rows=50 loops=1)
Planning Time: 0.050 ms
Execution Time: 0.075 ms

This and all the other queries generate a sequential scan which beats the entire purpose of seek pagination

  • How do I fix this. Help is super appreciated

LINK TO DB FIDDLE

Leave a Reply

Your email address will not be published. Required fields are marked *