WHERE before GROUP BY and HAVING after GROUP BY in MYSQL

Issue

Which out of WHERE before GROUP BY and HAVING after GROUP BY is a better choice according to time complexity?

Solution

First, lets clarify what each does.

WHERE – this is what you are filtering data based on. The columns here are a bit part of your query performance time. If you are looking for a specific type of thing, or date ranges, a customer or whatever. You would want an index on the primary table you are querying against to help match these columns. That will be where your performance is going to get the most help.

GROUP BY – this is sometimes used in conjunction with a where. Such as querying all customers who have place orders and grouping based on a city, state, region, time period. So the WHERE clause would be the primary for getting the data, but if your index ALSO has columns associated with the group by, that TOO can improve performance.

Now, the primary purpose of the GROUP BY is to perform some sort of aggregation of data based on a common thing, such as previously stated. You may want a query of all transactions purchased by customers within a given time period. So, you may have a WHERE clause that indicates the from/to date period. But, now the grouping, you want per customer, so you may want a

   CustomerID,
   SUM(CustomerPurchaseAmount) totalPurchases, 
   count(*) as TimesOrdered

so the group by CustomerID would do the aggregates per the sample above and have a result of one record per customer in this example.

The HAVING clause is AFTER the where of what records to query, the GROUP BY which performs the aggregations and can act on some aggregate. Such as I only want customers purchasing from X to Y dates and their total purchases is greater than some amount. So the having clause might be something like

HAVING SUM(CustomerPurchaseAmount) > 500

So, from the example, date applied via where, the group gets the totals per specific customer, and now the HAVING is applied to see what customers had ex: more than 500 in purchases.

Performance is significantly handled by having good indexes to match your more common queries on the WHERE clause, but can also be optimized to ALSO support needs of GROUP BY as a secondary consideration. Having just comes along after all else is done.

Answered By – DRapp

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published