How should i properly index the mysql column when dealing with sort?

Issue

I have a log table, but I find it become very slow when I sort it.

Here’s my database table structure in short.

CREATE TABLE `webhook_logs` (
  `ID` bigint(20) UNSIGNED NOT NULL,
  `event_id` bigint(20) UNSIGNED DEFAULT NULL,
  `object_id` bigint(20) UNSIGNED DEFAULT NULL,
  `occurred_at` bigint(20) UNSIGNED DEFAULT NULL,
  `payload` text COLLATE utf8mb4_unicode_520_ci,
  `priority` bigint(1) UNSIGNED DEFAULT NULL,
  `status` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

ALTER TABLE `webhook_logs`
  ADD PRIMARY KEY (`ID`),
  ADD KEY `event_id` (`event_id`),
  ADD KEY `object_id` (`object_id`),
  ADD KEY `occurred_at` (`occurred_at`),
  ADD KEY `priority` (`priority`),
  ADD KEY `status` (`status`);

There are 5M + records.

When I do

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 ORDER BY priority ASC LIMIT 100

, it took about 5 seconds to get the records.


However, when i remove the sorting, and just do

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 LIMIT 100

, it took only 0.0022 seconds.


I’ve been playing around with the index and see if the time improved, but with no luck. I wonder if I’m doing something wrong here.

I tried creating combo index with "occurred_at" and "priority", or combo index with all "occurred_at", "priority" and "status". None of them improved the speed, still take around 5 seconds. If any help, there server is running MYSQL 5.7.12.

Any help will be appropriated. Thanks.

Solution

You don’t need BIGINT for most of those columns. That datatype takes 8 bytes. There are much smaller datatypes. priority could be TINYINT UNSIGNED (1 byte, range of 0..255). status could be changed to a 1-byte ENUM. Such changes will shrink the data and index sizes, hence speed up most operations somewhat.

Replace INDEX(status) with

INDEX(status, occurred_at, priority, id)  -- in this order

Then your query will run somewhat faster, depending on the distribution of the data.

This might run even faster:

SELECT  w.*
    FROM  (
        SELECT id
            FROM `webhook_logs`
            WHERE  status = 'pending'
              AND  occurred_at < 1652838913000
            ORDER BY  priority ASC
            LIMIT  100
          ) AS t
    JOIN webhook_logs  USING(id)
    ORDER BY priority ASC    -- yes, this is repeated
    ;

That is because it can pick the 100 ids from the my index much faster since it is "covering", then do 100 lookups to get "*".

Answered By – Rick James

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published