MSSQL/MySQL: Understanding how to Index your database
Published on Oct 24, 2011 by Jamie MunroThis is a must read for beginning developers in understanding how to index your database tables and more importantly, why you must index your database.
Before we dive into how and where you must index your database tables, it's important to understand how database engines like MySQL or MSSQL use indexes to improve the performance of your SQL queries.
When a column is indexed, the engine knows exactly where to find the row with the column index. When no index is defined, the engine must start at the very first row and read through the entire table to find any rows that match. Typically database engines can do this extremely fast, but as your database tables grow, it must read through more and more records. On top of that, these long queries will begin to compound as more and more users visit your website.
The types of indexes that will be the focus of this article are: primary, unique, and index. The primary index is typically a unique identifier of the table that is usually auto incremented when a new record is inserted. The unique index is probably less used because there are not as many columns in tables that should be unique. A good example of where to use a unique index would be on a table that stores user information. Most often a username or email address column will be stored in a user table that must be unique because it is used when a user logs into your website. Finally, there are standard indexes. These should be used on columns that you perform regular queries against – both in the where clause or the order by statement. When used in the where clause, the database engine will be able to locate the rows much quicker. When it's used in the order by statement, it will allow the engine to not perform a table sort on the results and return them in the order of the index.
Now, before you go and start indexing every column – don't, indexes must be used wisely and carefully. Each index will increase the amount of data stored about your database which means if you have too many indexes, your database size might be larger than the memory allocated on your database server preventing from quick access because the database is not fully stored in memory.
Secondly, only one index can be used at a time. What this means is, if you have a where clause that must match against multiple columns and each column is indexed separately; the engine must select one index to use.
Thirdly, each time a new record is inserted, updated, or deleted from the table, the database engine must rebuild the indexes affected. This will greatly slow down all of these actions when there are a lot of records and a lot of indexes.
Let's now explore why each table must be indexed properly. If I run the following basic query on a users table to find a user with the email address of firstname.lastname@example.org where the email column is not indexed and there are 10,000 rows.
SELECT * FROM users WHERE email = 'email@example.com'
This query found one result in approximately 0.01 seconds. Now you might be saying – that's pretty fast – and in reality it really is. However, as your table grows more and more in size, this query will begin taking more time.
If I was to make a simple change to this table and add a unique index on the email field and run the query again to compare the differences.
// MySQL ALTER TABLE `users` ADD UNIQUE ( `email` ) // MSSQL CREATE UNIQUE INDEX AK_user_email ON users(email)
Once the index is created, run the select query again:
SELECT * FROM users WHERE email = 'firstname.lastname@example.org'
This query found one result in approximately 0.0005 seconds. Now that is a truly fast query. As your table grows, this execution time should always remain relatively consistent compared to a table with no index.
Finally, how do you go about indexing your database tables? There are two approaches to take, one the simple and quick way and two, the longer and more accurate way.
The first approach you can simply "guess" what columns should be indexed on each table. This typically involves going through each table and adding an index to columns that are foreign keys to the other tables or a column that uniquely identifies a row and add an index.
This approach works relatively well and as you become more experienced you get better and better at it. However, where it lacks in performance is you might not group indexes together. For example, back to the email example above, if you were trying to log a user in you might perform a query on both the email address and password. In this scenario you would want to index the columns together as one index.
The second approach is a bit more time consuming. It requires you to retrieve every database query that is performed in your application. Once you have every query, I would begin by organizing them by table so you can see all of the queries performed on one table together.
Next you need to analyze what columns are used both in the where clause and the order by statement. When your queries contain more than one column in the where clause or order by statement, create one index including all columns.
If you have several queries that query one or a group of columns, you will want to stop and think about how you index these. Depending on the size of your database, the allotted memory on your server, you can simply add multiple indexes to each. However, if your table is really large and you don't want to bloat it further with a lot of indexes, chose the best one and only index it. My personal preference would be to select the one used most often or in a high traffic area. The other queries will still leverage this index; it simply won't be as effective as an exact matching index.
No matter how big or small your database tables are, it's extremely important to analyze the columns and add indexes on columns that are frequently used in SQL queries. One index can improve a query by immensely.