I see that I have two choices.
1. Do everything on the Database server.
Get all results that match all submitted keywords and then count how many times each keyword is found for each record. Total each score for each record.
2. Do all of this nasty processor intenstive work on the web server
Get all results that match all submitted keywords from the database and place in a DataTable. For each record in the DataTable perform some C# based match and count loop. Place result of each loop in extra 'Result' column of DataTable and the sort DataTable according to score.
I can confidently code up option 2 however option 1 seems like a real headache in terms of the necessary SQL (my SQL is not that good). Also I think with option two I can use Caching to save the reordered datatable primary key with cache parameters based on the submitted keywords so the search should eventually get quite fast.
Does this all sound like nonsense ?
Has anyone tried to provide a Ranked multiple keyword search without using a Full Text Indexing ?
Thanks.Full Text Indexing would obviously be the right tool for this job.
What comes to mind for the next best solution would be to write a UDF which will take a keyword and a string and return the number of occurrences of the keyword in the string. Then you could sort by the number of occurrences.
The resulting query would look like this:
SELECT
Description,
OccurrenceCount
FROM
(SELECT Description, dbo.fnOccurrenceCount(@.SearchFor, Description) AS OccurrenceCount FROM myTable ) A
WHERE
OccurrenceCount > 0
ORDER BY
OccurrenceCount DESC,
Description
And the UDF would look like this:
CREATE FUNCTION dbo.fnOccurrenceCount (@.SearchNeedle varchar(8000), @.SearchHaystack varchar(8000))
RETURNS integer AS
BEGINDECLARE @.OccurrenceCount int
DECLARE @.FoundPosition int
DECLARE @.SearchHaystackPart varchar(1000)
DECLARE @.SearchNeedleLength intSET @.OccurrenceCount = 0
SET @.FoundPosition = 0
SET @.SearchHaystackPart = @.SearchHaystack
SET @.SearchNeedleLength = DATALENGTH(@.SearchNeedle)SET @.FoundPosition = CHARINDEX(@.SearchNeedle, @.SearchHayStackPart)
WHILE @.FoundPosition > 0
BEGIN
SET @.OccurrenceCount = @.OccurrenceCount + 1
SET @.SearchHaystackPart = SUBSTRING(@.SearchHaystackPart,@.FoundPosition+@.SearchNeedleLength,8000)
SET @.FoundPosition = CHARINDEX(@.SearchNeedle, @.SearchHayStackPart)
ENDRETURN @.OccurrenceCount
END
Terri|||[sorry if this appears as a double post my last reply must have timed out]
Thanks Terri - that works really well ! - without trying to sound too dramtic - your're a life saver !
As I said in my first post my C# is OK but my T-SQL is not great. Can you recommend any books or article links I can read which could help me to work out solutions like this for myself ?
Thanks again,
TheDr|||Cool, I'm glad you were able to make use of that method!
Personally I have found that participating in forums such as this and those on aspadvice.com to be the best way to increase my knowledge. Reading other people's problems and then trying out different ways to solve them has helped me immensely -- especially when others recommend methods I hadn't even thought of. Following links that the experts post on the forums also helps.
For reading material, I would have to recommend Ken Henderson's The Guru's Guide to Transact-SQL as a must-read.
Terri|||The only question I have is from the following line of your Proc what does the 'A' represent ? If I try any run the Proc without it I'm given an error. Is it an alias for the parenthesised result ?
<snip>
(SELECT Description, dbo.fnOccurrenceCount(@.SearchFor, Description) AS OccurrenceCount FROM myTable ) A
</snip
Thanks for your book suggestion. I'll look out for that one.
Cheers.|||Yes, that's exactly what it is; an alias for the derived table. A lazy alias at that. When using derived tables a "correlation name alias" (as they call it) is required.
Terri
No comments:
Post a Comment