Google Groups Home
Help | Sign in
One example of a slow query.
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  9 messages - Collapse all
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
David Cressey  
View profile
 More options Jun 24, 2:18 pm
Newsgroups: comp.databases
From: "David Cressey" <cresse...@verizon.net>
Date: Tue, 24 Jun 2008 18:18:08 GMT
Local: Tues, Jun 24 2008 2:18 pm
Subject: One example of a slow query.
At one site,  I managed to verify that one of these two queries ran 10 to
100 times faster than the other:

select * from customers
   where country = 'US' and
              state = 'TX' and
              City = 'Dallas'

select * from customers
   where  state = 'TX' and
              City = 'Dallas'

I expected the first one to run much faster  and it did.  Why?  Because
there was an index with a compound index key,  namely country, state, and
city.  In the second case the poor optimizer was faced with walking the
index instead of a quick lookup,  as it could do in the first case.

But the clever programmers had used the second form,  because they knew that
all the customers were in US,  and they wanted to keep things simple for the
DBMS.

Not only were they running way too slow,  but also,  they had written code
that could break when the user community decided to extend the customer base
to other countries.

This is just one example out of hundreds of possible examples,  where the
programmers code strangely in order to speed things up,  and instead they
slow things down.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David Segall  
View profile
 More options Jun 25, 5:13 am
Newsgroups: comp.databases
From: David Segall <da...@address.invalid>
Date: Wed, 25 Jun 2008 09:13:38 GMT
Local: Wed, Jun 25 2008 5:13 am
Subject: Re: One example of a slow query.

"David Cressey" <cresse...@verizon.net> wrote:
> Because
>there was an index with a compound index key,  namely country, state, and
>city.

Why would you do that instead of indexing the fields separately?

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Roy Hann  
View profile
 More options Jun 25, 6:12 am
Newsgroups: comp.databases
From: "Roy Hann" <specia...@processed.almost.meat>
Date: Wed, 25 Jun 2008 11:12:55 +0100
Local: Wed, Jun 25 2008 6:12 am
Subject: Re: One example of a slow query.
"David Segall" <da...@address.invalid> wrote in message

news:nu2464djgj6vku4tci9if3ics221qidcp5@4ax.com...

> "David Cressey" <cresse...@verizon.net> wrote:

>> Because
>>there was an index with a compound index key,  namely country, state, and
>>city.
> Why would you do that instead of indexing the fields separately?

Most DBMS engines will use only one index for a particular restriction step,
so in general you will want to define compound keys where they make sense.

Depending on the implementation of the index it might be possible to exploit
right-incomplete key values, so you would try to define your key with the
most frequently known parts of the key are on the left.  Then although the
key may include more columns than you can supply a value for, you might
still be able to exploit it if you supply the parts you do know.

Roy


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Arved Sandstrom  
View profile
 More options Jun 25, 7:55 am
Newsgroups: comp.databases
From: "Arved Sandstrom" <asandst...@accesswave.ca>
Date: Wed, 25 Jun 2008 11:55:34 GMT
Local: Wed, Jun 25 2008 7:55 am
Subject: Re: One example of a slow query.
"Roy Hann" <specia...@processed.almost.meat> wrote in message

news:O5-dnXaHnYG1h__VnZ2dnUVZ8hednZ2d@pipex.net...

How common is the latter implementation? MySQL evidently does this, so that
if you had an index on (col1,col2,col3), queries on (col1), (col1,col2) and
(col1,col2,col3) all use the index, but a query on (col2,col3), for example,
does not. What is this incomplete use of multicolumn indexes called,
exactly? I ran across articles that referred to this as partial indexing,
but my understanding of a partial (filtered) index is that the latter
indexes on a subset of rows, which is something different. I'm guessing that
for any given DBMS you'd just have to look up the docs on multicolumn
indexing and see what they say about using leading columns.

As I understand it, it wouldn't be totally useless to have only the separate
indexes for the three columns, namely (col1), (col2), and (col3), as the
DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps
even better, have the multicolumn index and one or more single-column
indexes, depending on what you anticipate the most common queries to be.

AHS


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ed Prochak  
View profile
 More options Jun 25, 8:09 am
Newsgroups: comp.databases
From: Ed Prochak <edproc...@gmail.com>
Date: Wed, 25 Jun 2008 05:09:24 -0700 (PDT)
Local: Wed, Jun 25 2008 8:09 am
Subject: Re: One example of a slow query.
On Jun 25, 4:13 am, David Segall <da...@address.invalid> wrote:

> "David Cressey" <cresse...@verizon.net> wrote:
> > Because
> >there was an index with a compound index key,  namely country, state, and
> >city.

> Why would you do that instead of indexing the fields separately?

Better is one index defines as city, state, country.
Put the most discriminating element first. then partial searches on
that index still work.
  Ed

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Thomas Kellerer  
View profile
 More options Jun 25, 8:16 am
Newsgroups: comp.databases
From: Thomas Kellerer <YQDHXVLMU...@spammotel.com>
Date: Wed, 25 Jun 2008 14:16:23 +0200
Local: Wed, Jun 25 2008 8:16 am
Subject: Re: One example of a slow query.
Arved Sandstrom, 25.06.2008 13:55:

> How common is the latter implementation? MySQL evidently does this, so that
> if you had an index on (col1,col2,col3), queries on (col1), (col1,col2) and
> (col1,col2,col3) all use the index, but a query on (col2,col3), for example,
> does not. What is this incomplete use of multicolumn indexes called,
> exactly?

I think that this is related to the fact that most of the time the index is implemented as a B-Tree index. And technically it is not possible to access the "second level" directly without finding the "first level".

> As I understand it, it wouldn't be totally useless to have only the separate
> indexes for the three columns, namely (col1), (col2), and (col3), as the
> DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps
> even better, have the multicolumn index and one or more single-column
> indexes, depending on what you anticipate the most common queries to be.

Yes and no. Keep in mind that each index will add an overhead when updating, deleting or inserting rows. So if you have a lot of indexes, your update performance may degrade. The "art" is finding the proper balance between the two requirements (fast reads and fast writes)

Postgres does a indeed good job in "re-using" indexes. For e.g. Oracle something similar could be achieved using bitmap indexes (but they have higher update costs and are not always feasible)

Thomas


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ed Prochak  
View profile
 More options Jun 25, 8:16 am
Newsgroups: comp.databases
From: Ed Prochak <edproc...@gmail.com>
Date: Wed, 25 Jun 2008 05:16:06 -0700 (PDT)
Local: Wed, Jun 25 2008 8:16 am
Subject: Re: One example of a slow query.
On Jun 25, 6:55 am, "Arved Sandstrom" <asandst...@accesswave.ca>
wrote:

Oracle calls it a range scan in the EXPLAIN PLAN report. And that
makes sense if you think about it. (at least it does to me.)

>  ... I ran across articles that referred to this as partial indexing,
> but my understanding of a partial (filtered) index is that the latter
> indexes on a subset of rows, which is something different. I'm guessing that
> for any given DBMS you'd just have to look up the docs on multicolumn
> indexing and see what they say about using leading columns.

> As I understand it, it wouldn't be totally useless to have only the separate
> indexes for the three columns, namely (col1), (col2), and (col3), as the
> DBMS might use index combination (as does PostgreSQL 8.1 and up). Or perhaps
> even better, have the multicolumn index and one or more single-column
> indexes, depending on what you anticipate the most common queries to be.

> AHS

If you must have one index, put the lower unit element first. So on a
phone number table
you might put Number, Exchange, Area code (e.g. 800-123-4567 indexed
as 4567,123,800). then partial searches on Exchange and number can use
the index.

  Ed


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David Cressey  
View profile
 More options Jun 25, 10:10 am
Newsgroups: comp.databases
From: "David Cressey" <cresse...@verizon.net>
Date: Wed, 25 Jun 2008 14:10:37 GMT
Local: Wed, Jun 25 2008 10:10 am
Subject: Re: One example of a slow query.

"Roy Hann" <specia...@processed.almost.meat> wrote in message

news:O5-dnXaHnYG1h__VnZ2dnUVZ8hednZ2d@pipex.net...

> "David Segall" <da...@address.invalid> wrote in message
> news:nu2464djgj6vku4tci9if3ics221qidcp5@4ax.com...
> > "David Cressey" <cresse...@verizon.net> wrote:

> >> Because
> >>there was an index with a compound index key,  namely country, state,
and
> >>city.
> > Why would you do that instead of indexing the fields separately?

> Most DBMS engines will use only one index for a particular restriction
step,
> so in general you will want to define compound keys where they make sense.

In general, true.  However, the optimizer in the DBMS in this case would
have been capable of exploiting separate simple indexes.  The question
remians,  why would the index designer have chosen a single index with a
compound key over multiple single indexes.

Several possible reasons:

Query speed.  For the query that includes country, state,  and city,  the
single index can locate the right rows with fewer logical disk reads,  and
therefore fewer physical disk reads.

Update speed.  It takes fewer disk writes to commit a new row with one
compound index than with multiple simple indexes.

Psychological factors.  Just as designers tend to think of a schema with
fewer tables as "simpler" than than one with more tables,  so likewise they
might think of a table with fewer indexes as "simpler".

Why didn't I change the index design?  I was a visiting instructor for one
week.  I didn't know what the other consequences of changing the index
design would be.  The original designer may have had considerations that
eluded me.  Index design involves anticipating traffic.  There's always a
little guesswork in it.

Besides,  I was teaching programmers tuning,  and this gave me an excellent
opportunity to teach a key point:  that a query that provides more selection
criteria can run a lot faster than one with fewer criteria.  This is counter
intuitive to programmers.  And a real live case from their production
database was worth more than a score of prepared cases from my course
materials.

> Depending on the implementation of the index it might be possible to
exploit
> right-incomplete key values, so you would try to define your key with the
> most frequently known parts of the key are on the left.  Then although the
> key may include more columns than you can supply a value for, you might
> still be able to exploit it if you supply the parts you do know.

This is absolutely correct.  But, for reasons that I can't really fathom,
people tend to design
compound indexes in top down order:  country, state, city.  I imagine that
city, state, country would almost always be more useful.  But both
programmers and designers have been trained not to ever think bottom up,  so
they reject a bottom up index without careful analysis.  This is just my
take on it.

However, the top down index might be more useful in a large reporting query,
where the report is being grouped by country, state, city.  There are a lot
of things to consider when doing index design.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David Cressey  
View profile
 More options Jun 25, 10:13 am
Newsgroups: comp.databases
From: "David Cressey" <cresse...@verizon.net>
Date: Wed, 25 Jun 2008 14:13:42 GMT
Local: Wed, Jun 25 2008 10:13 am
Subject: Re: One example of a slow query.

"Arved Sandstrom" <asandst...@accesswave.ca> wrote in message

news:Wmq8k.167$1o6.154@edtnps83...

The term I've seen in the literature is "range retrieval".  Range retrieval
also describes
criteria like "where salary between 50000 and 100000".

> As I understand it, it wouldn't be totally useless to have only the
separate
> indexes for the three columns, namely (col1), (col2), and (col3), as the
> DBMS might use index combination (as does PostgreSQL 8.1 and up). Or
perhaps
> even better, have the multicolumn index and one or more single-column
> indexes, depending on what you anticipate the most common queries to be.

Yes but... the more indexes the slower inserts will run.  There are multiple
tradeoffs here.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2008 Google