SQL text search – 3x faster CONTAINS vs LIKE.

My friend asked, does SQL Server have a data type or type of index it can use to search for text in a field?

Yes, use varchar. If possible, create a non-clustered index on the column or use a full-text search catalog index. You can use the Like operator or the Contains operator if you have a Full-Text Search catalog and index built. Ref: Contains.

The business value:

How can I exclude from over 3 thousand documents which ones contain certain words?

Well, SQL Server Full-Text Search performed the test below 3x faster. Using LIKE took 99ms and using CONTAINS took 31ms. The table requires a unique index (a primary key clustered index qualifies).

Store the file paths and names on separate columns in the table you plan to import the text into, to later list the ones you need to exclude from the 3 thousand. The remaining docs would be the ones you’ll want to keep.

See full-text search script below on how to create the catalog and index. However, if the sample text or number of rows is very small then using LIKE will be very performant as well.

Hope it helps,

Hiram

use [DBAWork]
go
-- errors due to nvarchar(max), so use a fulltext index instead.
create index idx_Docs_ContentText on Docs(ContentText)
go
/*
Msg 1919, Level 16, State 1, Line 26
Column 'ContentText' in table 'Docs' is of a type that is invalid for use as a key column in an index.
*/

-- check if installed.
select fulltextserviceproperty('IsFullTextInstalled')
go
create fulltext catalog [ft] with accent_sensitivity = on
as default
go

-- this is sample table, has 4.5k rows and ContentText max length is 180.
create fulltext index on [Docs] key index [PK_Docs] on ([ft]) with (change_tracking = auto)
go
alter fulltext index on [Docs] add ([ContentText])
go
alter fulltext index on [dbo].[Docs] enable
go

-- this is what the population schedule does.
alter fulltext catalog [ft] reorganize --rebuild

-- test
use dbawork
go
set statistics time on;
select * from Docs where ContentText like '%stat%'
go
-- note: 'stat' returns 0, be aware it doesn't work exactly like LIKE.
select * from Docs where contains(ContentText, 'stat')

/*
(4516 rows affected)
 SQL Server Execution Times:
   CPU time = 32 ms,  elapsed time = 99 ms.

(4516 rows affected)
 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 31 ms.
*/
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s