At one site, I managed to verify that one of these two queries ran 10 to 100 times faster than the other:
select * from customers where country = 'US' and state = 'TX' and City = 'Dallas'
select * from customers where state = 'TX' and City = 'Dallas'
I expected the first one to run much faster and it did. Why? Because there was an index with a compound index key, namely country, state, and city. In the second case the poor optimizer was faced with walking the index instead of a quick lookup, as it could do in the first case.
But the clever programmers had used the second form, because they knew that all the customers were in US, and they wanted to keep things simple for the DBMS.
Not only were they running way too slow, but also, they had written code that could break when the user community decided to extend the customer base to other countries.
This is just one example out of hundreds of possible examples, where the programmers code strangely in order to speed things up, and instead they slow things down.
>> Because >>there was an index with a compound index key, namely country, state, and >>city. > Why would you do that instead of indexing the fields separately?
Most DBMS engines will use only one index for a particular restriction step, so in general you will want to define compound keys where they make sense.
Depending on the implementation of the index it might be possible to exploit right-incomplete key values, so you would try to define your key with the most frequently known parts of the key are on the left. Then although the key may include more columns than you can supply a value for, you might still be able to exploit it if you supply the parts you do know.
>>> Because >>>there was an index with a compound index key, namely country, state, and >>>city. >> Why would you do that instead of indexing the fields separately?
> Most DBMS engines will use only one index for a particular restriction > step, so in general you will want to define compound keys where they make > sense.
> Depending on the implementation of the index it might be possible to > exploit right-incomplete key values, so you would try to define your key > with the most frequently known parts of the key are on the left. Then > although the key may include more columns than you can supply a value for, > you might still be able to exploit it if you supply the parts you do know.
> Roy
How common is the latter implementation? MySQL evidently does this, so that if you had an index on (col1,col2,col3), queries on (col1), (col1,col2) and (col1,col2,col3) all use the index, but a query on (col2,col3), for example, does not. What is this incomplete use of multicolumn indexes called, exactly? I ran across articles that referred to this as partial indexing, but my understanding of a partial (filtered) index is that the latter indexes on a subset of rows, which is something different. I'm guessing that for any given DBMS you'd just have to look up the docs on multicolumn indexing and see what they say about using leading columns.
As I understand it, it wouldn't be totally useless to have only the separate indexes for the three columns, namely (col1), (col2), and (col3), as the DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps even better, have the multicolumn index and one or more single-column indexes, depending on what you anticipate the most common queries to be.
> How common is the latter implementation? MySQL evidently does this, so that > if you had an index on (col1,col2,col3), queries on (col1), (col1,col2) and > (col1,col2,col3) all use the index, but a query on (col2,col3), for example, > does not. What is this incomplete use of multicolumn indexes called, > exactly?
I think that this is related to the fact that most of the time the index is implemented as a B-Tree index. And technically it is not possible to access the "second level" directly without finding the "first level".
> As I understand it, it wouldn't be totally useless to have only the separate > indexes for the three columns, namely (col1), (col2), and (col3), as the > DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps > even better, have the multicolumn index and one or more single-column > indexes, depending on what you anticipate the most common queries to be.
Yes and no. Keep in mind that each index will add an overhead when updating, deleting or inserting rows. So if you have a lot of indexes, your update performance may degrade. The "art" is finding the proper balance between the two requirements (fast reads and fast writes)
Postgres does a indeed good job in "re-using" indexes. For e.g. Oracle something similar could be achieved using bitmap indexes (but they have higher update costs and are not always feasible)
> >>> Because > >>>there was an index with a compound index key, namely country, state, and > >>>city. > >> Why would you do that instead of indexing the fields separately?
> > Most DBMS engines will use only one index for a particular restriction > > step, so in general you will want to define compound keys where they make > > sense.
> > Depending on the implementation of the index it might be possible to > > exploit right-incomplete key values, so you would try to define your key > > with the most frequently known parts of the key are on the left. Then > > although the key may include more columns than you can supply a value for, > > you might still be able to exploit it if you supply the parts you do know.
> > Roy
> How common is the latter implementation? MySQL evidently does this, so that > if you had an index on (col1,col2,col3), queries on (col1), (col1,col2) and > (col1,col2,col3) all use the index, but a query on (col2,col3), for example, > does not. What is this incomplete use of multicolumn indexes called, > exactly?
Oracle calls it a range scan in the EXPLAIN PLAN report. And that makes sense if you think about it. (at least it does to me.)
> ... I ran across articles that referred to this as partial indexing, > but my understanding of a partial (filtered) index is that the latter > indexes on a subset of rows, which is something different. I'm guessing that > for any given DBMS you'd just have to look up the docs on multicolumn > indexing and see what they say about using leading columns.
> As I understand it, it wouldn't be totally useless to have only the separate > indexes for the three columns, namely (col1), (col2), and (col3), as the > DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps > even better, have the multicolumn index and one or more single-column > indexes, depending on what you anticipate the most common queries to be.
> AHS
If you must have one index, put the lower unit element first. So on a phone number table you might put Number, Exchange, Area code (e.g. 800-123-4567 indexed as 4567,123,800). then partial searches on Exchange and number can use the index.
> >> Because > >>there was an index with a compound index key, namely country, state, and > >>city. > > Why would you do that instead of indexing the fields separately?
> Most DBMS engines will use only one index for a particular restriction step, > so in general you will want to define compound keys where they make sense.
In general, true. However, the optimizer in the DBMS in this case would have been capable of exploiting separate simple indexes. The question remians, why would the index designer have chosen a single index with a compound key over multiple single indexes.
Several possible reasons:
Query speed. For the query that includes country, state, and city, the single index can locate the right rows with fewer logical disk reads, and therefore fewer physical disk reads.
Update speed. It takes fewer disk writes to commit a new row with one compound index than with multiple simple indexes.
Psychological factors. Just as designers tend to think of a schema with fewer tables as "simpler" than than one with more tables, so likewise they might think of a table with fewer indexes as "simpler".
Why didn't I change the index design? I was a visiting instructor for one week. I didn't know what the other consequences of changing the index design would be. The original designer may have had considerations that eluded me. Index design involves anticipating traffic. There's always a little guesswork in it.
Besides, I was teaching programmers tuning, and this gave me an excellent opportunity to teach a key point: that a query that provides more selection criteria can run a lot faster than one with fewer criteria. This is counter intuitive to programmers. And a real live case from their production database was worth more than a score of prepared cases from my course materials.
> Depending on the implementation of the index it might be possible to exploit > right-incomplete key values, so you would try to define your key with the > most frequently known parts of the key are on the left. Then although the > key may include more columns than you can supply a value for, you might > still be able to exploit it if you supply the parts you do know.
This is absolutely correct. But, for reasons that I can't really fathom, people tend to design compound indexes in top down order: country, state, city. I imagine that city, state, country would almost always be more useful. But both programmers and designers have been trained not to ever think bottom up, so they reject a bottom up index without careful analysis. This is just my take on it.
However, the top down index might be more useful in a large reporting query, where the report is being grouped by country, state, city. There are a lot of things to consider when doing index design.
> >>> Because > >>>there was an index with a compound index key, namely country, state, and > >>>city. > >> Why would you do that instead of indexing the fields separately?
> > Most DBMS engines will use only one index for a particular restriction > > step, so in general you will want to define compound keys where they make > > sense.
> > Depending on the implementation of the index it might be possible to > > exploit right-incomplete key values, so you would try to define your key > > with the most frequently known parts of the key are on the left. Then > > although the key may include more columns than you can supply a value for, > > you might still be able to exploit it if you supply the parts you do know.
> > Roy
> How common is the latter implementation? MySQL evidently does this, so that > if you had an index on (col1,col2,col3), queries on (col1), (col1,col2) and > (col1,col2,col3) all use the index, but a query on (col2,col3), for example, > does not. What is this incomplete use of multicolumn indexes called, > exactly? I ran across articles that referred to this as partial indexing, > but my understanding of a partial (filtered) index is that the latter > indexes on a subset of rows, which is something different. I'm guessing that > for any given DBMS you'd just have to look up the docs on multicolumn > indexing and see what they say about using leading columns.
The term I've seen in the literature is "range retrieval". Range retrieval also describes criteria like "where salary between 50000 and 100000".
> As I understand it, it wouldn't be totally useless to have only the separate > indexes for the three columns, namely (col1), (col2), and (col3), as the > DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps > even better, have the multicolumn index and one or more single-column > indexes, depending on what you anticipate the most common queries to be.
Yes but... the more indexes the slower inserts will run. There are multiple tradeoffs here.