Comments | halffast: SQL / CS question

halffast

SQL / CS question

Aug 08, 2008 14:36

GEEK QUESTION ( Read more... )

Comments 19

jpmodisette August 8 2008, 22:42:43 UTC

I'm not entirely sure I understand your problem. Let me paraphrase and tell me if I am right: you have taken all the entries in the searchable fields and stuck them in one bigger table with a single field, then you index and search that, so that if the search is "first name = Dana OR last name = Watanabe" it turns into a search for "value = Dana" and a search for "value = Watanabe", which is fast, but then there is no way to search for "first name = Dana AND last name = Watanabe"?

halffast August 9 2008, 00:59:51 UTC

there is one search field and lots of columns to search (name, dept, email, etc), but simplified to two columns

someone searches "dana watanabe"

and i want the results of "first name = Dana AND last name = Watanabe" OR "first name = Watanabe AND last name = Dana"

and i want this in SQL not some homebrewed database like i know you would do. i've already suggested that. :)

jpmodisette August 9 2008, 01:31:12 UTC

I think something like this will do it (note I am not a SQL guy, but I can use google):

SELECT entry_id
FROM SecondTable
WHERE search_term IN ( 'Dana', 'Watanabe' )
GROUP BY entry_id
HAVING COUNT(*) >= 2

That says 1. make a list of all records where search_term is either Dana or Watanabe, 2. sort those records by entry_id, and 3. only list the records where the number of records with than entry ID is >= 2 (I said ">=" instead of "=" so we wouldn't throw out your entry if you also lived on Dana Street.

jpmodisette August 9 2008, 01:31:34 UTC

)

Thread 13

flaterik August 9 2008, 00:08:59 UTC

I now share an office with our chief data architect, and am sitting in a room full of amazing SQL people.

I would ask them, but they seem busy.

peterfuhry August 9 2008, 16:32:20 UTC

hmmm.... how about this:

select *
from directory_table
where
firstname||lastname||phone||address like '%Dana%'
and firstname||lastname||phone||address like '%Watanabe%'

in other words, rather than search each field individually for the search term, you just search a big concatenated string for the search term once. I think that would perform better. You could even create an index on the concatenated fields together, I think.

jpmodisette August 9 2008, 18:59:29 UTC

That seems a lot easier than my solution.

I could imagine that searches for '%Dana%' might perform considerably worse than searches for 'Dana%' - do you know if that's the case? If I were writing the database implementation it would probably be that way, since you can immediately discard most strings as not being 'Dana%' if they don't have a leading D, but to discard it as not being '%Dana%' I have to search all but the last 4 characters of the string. With an index it might be even worse - although it depends how the index was made. You could make an index that was designed to support searches for '%Something%' by including all cylic permutations of the string.

halffast August 10 2008, 04:33:52 UTC

as far as i can tell the things that kill the use of indexes are:
starting with a wildcard
using more than one indexes with ORs (ANDing indexes is fine, but the databases we're using seems unhappy with the OR)

jpmodisette August 10 2008, 15:43:28 UTC

I would expect starting with a wildcard to be slow for the reasons I stated above. Really the database should have an option for indices to include cyclic permutations so you can then do searches on it with leading wildcards.

I'm not sure why ORing indices would be slow - it seems like that could be done quickly without having to do anything special to the index. Well, databases are mysterious.

What system are you using that doesn't support nested queries?

Thread 5