Monday, March 19, 2012

An issue with Full Text Search against PDF blobs...

My client is trying to use SQL Server Full Text Search to search PDFs stored in a varbinary(max) column. Full-text search works fine for Microsoft Office documents stored in the same table, but it does NOT return any results from the PDF documents.
Following is a high-level view of what the client did:
1. Installed Adobe PDF IFilter 6.0
2. Ran the stored procedure sp_fulltext_service (as documented)
3. Restarted the server
4. Verified that the filter was properly installed by querying the system view sys.fulltext_document_types
5. Created a full-text index on the table with the documents
6. Started a full population of the index
7. Ran a sample query with a string he knows is in the PDF file, like the following, and got no results back:
select * from documents where freetext(document, 'Review')
8. Ran the same kind of query with a string he knows is in some Word files, like the following, and the query returned several rows as expected:
select * from documents where freetext(document, 'SQL')
Does anybody know what might be happening here?
Thank you!
Camilo Leon
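For anyone reproducing the setup, the steps above map roughly onto the T-SQL below. This is a sketch for SQL Server 2005 under assumptions the thread does not state: the table is taken to be dbo.documents with an int primary key named PK_documents, plus a hypothetical doc_type column holding each file's extension (such as '.pdf' or '.doc'), which SQL Server needs as the TYPE COLUMN when indexing varbinary(max) data. The catalog name DocumentsCatalog is also hypothetical, and the sp_fulltext_service options shown are the ones commonly suggested for the Adobe IFilter, not necessarily the ones the client ran.

```sql
-- Step 2 (assumed options): let the full-text engine load filters registered
-- with the operating system, and allow filter binaries that are not signed.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;
GO

-- Step 3: restart the SQL Server and full-text services (done outside T-SQL).

-- Step 4: confirm that SQL Server sees a filter for the .pdf extension.
SELECT document_type, path, manufacturer, version
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
GO

-- Step 5: hypothetical catalog and index; doc_type tells the indexer which
-- IFilter to use for each row's varbinary(max) content.
CREATE FULLTEXT CATALOG DocumentsCatalog;
GO
CREATE FULLTEXT INDEX ON dbo.documents (document TYPE COLUMN doc_type)
    KEY INDEX PK_documents
    ON DocumentsCatalog
    WITH CHANGE_TRACKING OFF, NO POPULATION;
GO

-- Step 6: start a full population.
ALTER FULLTEXT INDEX ON dbo.documents START FULL POPULATION;
GO

-- Step 7: after the population finishes, search for a term known to be in a PDF.
SELECT * FROM dbo.documents WHERE FREETEXT(document, 'Review');
```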
|||Camilo,
Are you using 64-bit Windows and SQL Server? If so, the last time I looked, the Adobe PDF IFilter was only available as 32-bit.
RLF
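One way to answer the 64-bit question from the server itself is to check the edition and version strings, which report the platform SQL Server was built for. A minimal sketch:

```sql
-- On 64-bit installations the Edition string ends in "(64-bit)".
SELECT SERVERPROPERTY('Edition') AS edition;

-- @@VERSION also names the platform, for example "(Intel X86)" or "(X64)".
SELECT @@VERSION AS version_string;
```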
|||Russell,
No, we are using a 32-bit machine in this case.
I got FTS working with PDF files by creating a new table, importing the data into it, creating a new catalog, and populating it again.
Thanks!
Camilo
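Camilo's workaround (new table, reimported data, new catalog, fresh population) might look like the sketch below. The names dbo.documents_new, PK_documents_new, doc_type, and DocumentsCatalog2 are hypothetical, the column types are guesses, and the thread does not say which of these steps actually made the difference. Rebuilding the existing catalog in place is shown, commented out, only as an untested alternative.

```sql
-- Hypothetical copy of the original table; adjust column types to match.
CREATE TABLE dbo.documents_new (
    id       int            NOT NULL CONSTRAINT PK_documents_new PRIMARY KEY,
    doc_type varchar(8)     NOT NULL,  -- file extension, e.g. '.pdf'
    document varbinary(max) NULL
);
GO

-- Re-import the existing rows.
INSERT INTO dbo.documents_new (id, doc_type, document)
SELECT id, doc_type, document
FROM dbo.documents;
GO

-- New catalog and full-text index, then a full population.
CREATE FULLTEXT CATALOG DocumentsCatalog2;
GO
CREATE FULLTEXT INDEX ON dbo.documents_new (document TYPE COLUMN doc_type)
    KEY INDEX PK_documents_new
    ON DocumentsCatalog2
    WITH CHANGE_TRACKING OFF, NO POPULATION;
GO
ALTER FULLTEXT INDEX ON dbo.documents_new START FULL POPULATION;
GO

-- Untested alternative: rebuild the original catalog in place, then check
-- whether a full population is still needed.
-- ALTER FULLTEXT CATALOG DocumentsCatalog REBUILD;
```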
|||Check the gatherer logs to see what the status of your population was.
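Besides the gatherer (crawl) log files, which on SQL Server 2005 are typically named SQLFT*.LOG and kept in the instance's LOG directory, the population state can be checked from T-SQL. A sketch, assuming the hypothetical catalog name DocumentsCatalog and the table dbo.documents:

```sql
-- PopulateStatus: 0 = idle, 1 = full population in progress (other values are documented).
-- ItemCount: number of items currently in the full-text catalog.
SELECT FULLTEXTCATALOGPROPERTY('DocumentsCatalog', 'PopulateStatus') AS populate_status,
       FULLTEXTCATALOGPROPERTY('DocumentsCatalog', 'ItemCount')      AS item_count;

-- Rows in the base table, as a rough comparison with the indexed item count.
SELECT COUNT(*) AS row_count FROM dbo.documents;

-- Full-text populations that are currently running.
SELECT * FROM sys.dm_fts_index_population;
```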
