Improve MySQL performance with indices and Explain

Akademily
Aug 10, 2020

Profiling is an inexpensive way to measure how long a query actually took to execute. First enable profiling, then call show profiles to see the execution time of each statement.

QUERY PROFILING

For example, take the following statement that inserts data. Suppose that User1 and Gallery1 already exist:

INSERT INTO `homestead`.`images` (`id`, `gallery_id`, `original_filename`, `filename`, `description`) VALUES
(1, 1, 'me.jpg', 'me.jpg', 'A photo of me walking down the street'),
(2, 1, 'dog.jpg', 'dog.jpg', 'A photo of my dog on the street'),
(3, 1, 'cat.jpg', 'cat.jpg', 'A photo of my cat walking down the street'),
(4, 1, 'purr.jpg', 'purr.jpg', 'A photo of my cat purring');

This statement runs without any problems. But consider the following query:

SELECT * FROM `homestead`.`images` AS i
WHERE i.description LIKE '%street%';

This query is a good example of what might cause problems once the table holds a large number of images. A LIKE pattern with a leading wildcard cannot use an ordinary index, so MySQL has to examine every row.

To measure the execution time of this query, you can use the following SQL code:

set profiling = 1;
SELECT * FROM `homestead`.`images` AS i
WHERE i.description LIKE '%street%';
show profiles;

Result:

Query_ID | Duration   | Query
---------+------------+--------------------------------------------------------------------------------------
1        | 0.00016950 | SHOW WARNINGS
2        | 0.00039200 | SELECT * FROM homestead.images AS i WHERE i.description LIKE '%street%' LIMIT 0, 1000
3        | 0.00037600 | SHOW KEYS FROM homestead.images
4        | 0.00034625 | SHOW DATABASES LIKE 'homestead'
5        | 0.00027600 | SHOW TABLES FROM homestead LIKE 'images'
6        | 0.00024950 | SELECT * FROM homestead.images WHERE 0=1
7        | 0.00104300 | SHOW FULL COLUMNS FROM homestead.images LIKE 'id'

The show profiles command displays the execution time not only of the original query but also of every other statement executed in the session. In this way, you can profile your queries accurately.
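
To see where the time of a single statement went, its Query_ID can be passed to show profile. The following is a minimal sketch that assumes the SELECT is query 2, as in the result above; the number will differ on your system. Note that recent MySQL versions mark show profile and show profiles as deprecated in favor of the Performance Schema.

```sql
-- Break the duration of one profiled statement down by execution stage.
-- 2 is the Query_ID reported by SHOW PROFILES (assumed here).
SHOW PROFILE FOR QUERY 2;

-- Optionally include CPU usage for each stage.
SHOW PROFILE CPU FOR QUERY 2;

-- Turn profiling off again when finished.
SET profiling = 0;
```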

OPTIMIZATION

But how do you optimize queries? MySQL's explain command lets you improve query performance based on actual information rather than guesswork.

Explain returns the query execution plan, that is, the way MySQL will execute a query. It works with SELECT, DELETE, INSERT, REPLACE and UPDATE statements. The official documentation describes the explain command as follows:

With EXPLAIN you can see where to add indexes to tables so that the statement executes faster. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order.

As an example, we will look at the query that UserManager.php performs to find a user by email address:

SELECT * FROM `homestead`.`users` WHERE email = 'claudio.ribeiro@examplemail.com';

To use the explain command, put it in front of the SELECT query:

EXPLAIN SELECT * FROM `homestead`.`users` WHERE email = 'claudio.ribeiro@examplemail.com';

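The result:

id | select_type | table | partitions | type  | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra
---+-------------+-------+------------+-------+-----------------------+-----------------------+---------+-------+------+----------+------
1  | SIMPLE      | users | NULL       | const | UNIQ_1483A5E9E7927C74 | UNIQ_1483A5E9E7927C74 | 182     | const |      | 100.00   | NULL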

  • id: a sequential identifier for each SELECT within the query.
  • select_type: the type of SELECT. This field may take different values:
  1. SIMPLE: a simple query without subqueries or unions;
  2. PRIMARY: the outermost SELECT;
  3. DERIVED: the SELECT is part of a subquery in the FROM clause;
  4. SUBQUERY: the first SELECT in a subquery;
  5. UNION: the second or a later SELECT in a UNION.
  • table: the name of the table the row refers to.
  • type: specifies how MySQL joins the tables used. The value may point to missing indexes and to how the query should be rewritten. Possible values for this field:
  1. system: the table has only one row.
  2. const: the table has at most one matching row, which is found through an index. This is the fastest join type.
  3. eq_ref: all parts of an index are used by the join, and the index is a PRIMARY KEY or UNIQUE NOT NULL index.
  4. ref: all rows with matching index values are read from this table for each combination of rows from the previous tables. This join type appears for indexed columns compared using the = or <=> operators.
  5. fulltext: the join uses a FULLTEXT index of the table.
  6. ref_or_null: the same as ref, but MySQL also searches for rows that contain NULL.
  7. index_merge: the join uses a list of indexes to produce the result set. The key column will contain the indexes used.
  8. unique_subquery: an IN subquery returns only one result from the table and uses the primary key.
  9. range: an index is used to find matching rows within a given range.
  10. index: the entire index tree is scanned to find matching rows.
  11. ALL: the entire table is scanned to find matching rows for the join. This is the worst join type. It often indicates that the table lacks suitable indexes.
  • possible_keys: shows the keys that MySQL could use to find rows in the table.
  • key: the index MySQL actually uses. The DBMS always looks for the optimal key for a query. When joining many tables, it may pick keys that are not listed in possible_keys but are more efficient.
  • key_len: the length of the key that the query optimizer chose to use.
  • ref: shows the columns or constants that are compared to the index named in the key column.
  • rows: the estimated number of rows MySQL has to examine to produce the output. This is an important indicator; the fewer rows examined, the better.
  • Extra: contains additional information. Values such as Using filesort or Using temporary in this column may indicate a problematic query.

Full documentation on the output format of explain can be found on the official MySQL page.
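
Explain can also report the plan in other forms. The sketch below reuses the users query from above. EXPLAIN ANALYZE assumes a server running MySQL 8.0.18 or later; older versions reject it.

```sql
-- The same plan as a JSON document, which includes cost estimates.
EXPLAIN FORMAT=JSON
SELECT * FROM `homestead`.`users` WHERE email = 'claudio.ribeiro@examplemail.com';

-- Actually runs the query and reports measured timings for each plan step.
-- Assumption: MySQL 8.0.18 or later.
EXPLAIN ANALYZE
SELECT * FROM `homestead`.`users` WHERE email = 'claudio.ribeiro@examplemail.com';
```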

Let us return to our query. It has select type SIMPLE and join type const, which is the best possible combination. But what happens when more complex queries are executed?

For example, suppose you want to get all images in a gallery, or only the images whose description contains the word “dog”. Consider the following query:

SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`users` AS users
LEFT JOIN `homestead`.`galleries` AS gal ON users.id = gal.user_id
LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id
WHERE img.description LIKE '%dog%';

In this case, we will have more information for analysis:

EXPLAIN SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`users` AS users
LEFT JOIN `homestead`.`galleries` AS gal ON users.id = gal.user_id
LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id
WHERE img.description LIKE '%dog%';

The result of the query:

id | select_type | table | partitions | type  | possible_keys                                      | key                   | key_len | ref                | rows | filtered | Extra
---+-------------+-------+------------+-------+----------------------------------------------------+-----------------------+---------+--------------------+------+----------+------------
1  | SIMPLE      | users | NULL       | index | PRIMARY,UNIQ_1483A5E9BF396750                      | UNIQ_1483A5E9BF396750 | 108     | NULL               |      | 100.00   | Using index
1  | SIMPLE      | gal   | NULL       | ref   | PRIMARY,UNIQ_F70E6EB7BF396750,IDX_F70E6EB7A76ED395 | UNIQ_1483A5E9BF396750 | 108     | homestead.users.id |      | 100.00   | NULL
1  | SIMPLE      | img   | NULL       | ref   | IDX_E01FBE6A4E7AF8F                                | IDX_E01FBE6A4E7AF8F   | 109     | homestead.gal.id   |      | 25.00    | Using where

The main columns to pay attention to are type and rows. The goal is to get the best possible value in the type column and the smallest possible number in the rows column.

The first row of the result has type index, which is a poor value because MySQL scans the whole index. This means that we can optimize the query.

The query does not actually use the users table. We can either extend the query so that it really filters by user, or remove the users part altogether, since it only adds complexity and execution time. Removing it gives:

SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`galleries` AS gal
LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id
WHERE img.description LIKE '%dog%';

Let’s look at the output:

id | select_type | table | partitions | type | possible_keys                 | key                 | key_len | ref              | rows | filtered | Extra
---+-------------+-------+------------+------+-------------------------------+---------------------+---------+------------------+------+----------+------------
1  | SIMPLE      | gal   | NULL       | ALL  | PRIMARY,UNIQ_1483A5E9BF396750 | NULL                | NULL    | NULL             |      | 100.00   | NULL
1  | SIMPLE      | img   | NULL       | ref  | IDX_E01FBE6A4E7AF8F           | IDX_E01FBE6A4E7AF8F | 109     | homestead.gal.id |      | 25.00    | Using where

We are left with type ALL. This is one of the worst join types, but sometimes it is the only option.

We need the images of every gallery, so MySQL has to read the entire galleries table. Indexes are good for finding specific rows in a table, but not for retrieving all of its rows.

The last thing we can do is add a FULLTEXT index to the description column. That lets us replace LIKE with match() ... against() and improve performance. More details about full-text indexes can be found in the MySQL documentation.
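
The FULLTEXT change might look like the following sketch. It assumes the images table uses InnoDB (MySQL 5.6 or later), and the index name ft_images_description is made up for illustration.

```sql
-- Add a full-text index on the description column.
-- ft_images_description is a hypothetical index name.
ALTER TABLE `homestead`.`images`
  ADD FULLTEXT INDEX ft_images_description (description);

-- Search with MATCH ... AGAINST instead of LIKE '%dog%'.
SELECT gal.name, gal.description, img.filename, img.description
FROM `homestead`.`galleries` AS gal
LEFT JOIN `homestead`.`images` AS img ON img.gallery_id = gal.id
WHERE MATCH(img.description) AGAINST('dog' IN NATURAL LANGUAGE MODE);
```

Keep in mind that full-text search matches whole words, so this is not an exact replacement for LIKE '%dog%', which also matches substrings such as “dogs”.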

Let us return to two features of the application we are developing: newest and related. Both work on galleries and use the following queries:

EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal
LEFT JOIN `homestead`.`users` AS u ON u.id = gal.user_id
WHERE u.id = 1
ORDER BY gal.created_at DESC
LIMIT 5;

The above query is for related.

EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal
ORDER BY gal.created_at DESC
LIMIT 5;

The above query is for newest.

At first glance, these queries are fast because they use LIMIT. Unfortunately, in our application they also use ORDER BY, so we lose the benefits of LIMIT.

Combining ORDER BY with LIMIT can degrade performance. To verify this, let us run the explain command.

id | select_type | table | partitions | type   | possible_keys                 | key     | key_len | ref              | rows | filtered | Extra
---+-------------+-------+------------+--------+-------------------------------+---------+---------+------------------+------+----------+-----------------------------
1  | SIMPLE      | gal   | NULL       | ALL    | IDX_F70E6EB7A76ED395          | NULL    | NULL    | NULL             |      | 100.00   | Using where; Using filesort
1  | SIMPLE      | u     | NULL       | eq_ref | PRIMARY,UNIQ_1483A5E9BF396750 | PRIMARY | 108     | homestead.gal.id |      | 100.00   | NULL

and

id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra
---+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------
1  | SIMPLE      | gal   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |      | 100.00   | Using filesort

As we can see, we have the worst join type, ALL, in both queries.

The combination of ORDER BY and LIMIT has often caused performance problems in MySQL, yet it is used in most interactive applications with large data sets.

RECOMMENDATIONS FOR SOLVING THIS PROBLEM

Use indexes. In our case created_at is an excellent candidate. With an index on it, both LIMIT queries can run without scanning and sorting the full result set.
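
As a rough sketch of this recommendation, the index could be created as follows. Both index names are hypothetical, and the composite index for the related query is an assumption that should be checked with explain on real data.

```sql
-- Index the sort column so MySQL can read rows already in order
-- instead of sorting the whole table (hypothetical index name).
CREATE INDEX idx_galleries_created_at
  ON `homestead`.`galleries` (created_at);

-- Assumption: the related query filters by user as well, so a composite
-- index on (user_id, created_at) may serve both the filter and the sort.
CREATE INDEX idx_galleries_user_created
  ON `homestead`.`galleries` (user_id, created_at);

-- Re-check the plan for the newest query.
EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal
ORDER BY gal.created_at DESC
LIMIT 5;
```

If the optimizer picks up the new index, Using filesort should no longer appear in the Extra column.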

Sort by a column in the leading table. If ORDER BY refers to a column from a table that is not the first in the join order, the index cannot be used.

Do not sort by expressions. Expressions and functions prevent ORDER BY from using an index.

Beware of large LIMIT values. A large LIMIT forces ORDER BY to sort more rows, which hurts performance.
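
To illustrate, here is a small sketch of the same newest query written both ways. It assumes an index on created_at exists; DATE() is used only as an example of a function in ORDER BY.

```sql
-- Sorting by an expression: the function call hides created_at from the
-- optimizer, so an index on created_at cannot supply the order.
SELECT * FROM `homestead`.`galleries` AS gal
ORDER BY DATE(gal.created_at) DESC
LIMIT 5;

-- Sorting by the bare column: an index on created_at can be used.
SELECT * FROM `homestead`.`galleries` AS gal
ORDER BY gal.created_at DESC
LIMIT 5;
```

The two queries are not strictly equivalent, since DATE() discards the time of day, but they show the difference in how the sort key is written.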

CONCLUSION

The explain command allows you to identify problems in queries at an early stage of application development and keep the application fast.
