My friend asked, does SQL Server have a data type or type of index it can use to search for text in a field?
Yes, use varchar. If possible, create a non-clustered index on the column or use a full-text search catalog index. You can use the Like operator or the Contains operator if you have a Full-Text Search catalog and index built. Ref: Contains.
The business value:
How can I exclude from over 3 thousand documents which ones contain certain words?
Well, SQL Server Full-Text Search performed the test below 3x faster. Using LIKE took 99ms and using CONTAINS took 31ms. The table requires a unique index (a primary key clustered index qualifies).
Store the file paths and names on separate columns in the table you plan to import the text into, to later list the ones you need to exclude from the 3 thousand. The remaining docs would be the ones you’ll want to keep.
See full-text search script below on how to create the catalog and index. However, if the sample text or number of rows is very small then using LIKE will be very performant as well.
Hope it helps,
use [DBAWork] go -- errors due to nvarchar(max), so use a fulltext index instead. create index idx_Docs_ContentText on Docs(ContentText) go /* Msg 1919, Level 16, State 1, Line 26 Column 'ContentText' in table 'Docs' is of a type that is invalid for use as a key column in an index. */ -- check if installed. select fulltextserviceproperty('IsFullTextInstalled') go create fulltext catalog [ft] with accent_sensitivity = on as default go -- this is sample table, has 4.5k rows and ContentText max length is 180. create fulltext index on [Docs] key index [PK_Docs] on ([ft]) with (change_tracking = auto) go alter fulltext index on [Docs] add ([ContentText]) go alter fulltext index on [dbo].[Docs] enable go -- this is what the population schedule does. alter fulltext catalog [ft] reorganize --rebuild -- test use dbawork go set statistics time on; select * from Docs where ContentText like '%stat%' go -- note: 'stat' returns 0, be aware it doesn't work exactly like LIKE. select * from Docs where contains(ContentText, 'stat') /* (4516 rows affected) SQL Server Execution Times: CPU time = 32 ms, elapsed time = 99 ms. (4516 rows affected) SQL Server Execution Times: CPU time = 15 ms, elapsed time = 31 ms. */