Examine: Indexing and searching with multipart properties

One of the features I am building into Gravyframe is news, and news items can be assigned to more than one category. The approach I have taken is to have an Examine (Lucene) index that knows about the news items and the categories they belong to, so that the list of items for a category can be retrieved quickly.

I have been running into a few issues with Examine, but I am determined to find workarounds that are not complete hacks and don’t make the code smell. One of these issues was adding more than one field with the same name to a Lucene document. When I have worked with Lucene in the past this was very straightforward, but Examine seems to throw a spanner in the works for more complex scenarios. I could be completely off the mark here and may simply have misread the documentation.

The default field indexing doesn’t cover a node property that is actually a multipart property, an XPath CheckBoxList for example. To overcome this, I first removed the categories property from the “IndexUserFields” section in “ExamineIndex.config” and tried to add the field in an “OnGatheringNodeData” override. This was unsuccessful: I could not add more than one category, because Examine uses a dictionary of strings to add fields to the index.
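To see why the dictionary gets in the way, here is a minimal sketch in plain Python rather than the Examine API. It assumes the field collection behaves like a map from a field name to a single string; the field name `categories` and the Ids are made up for illustration.

```python
# Hypothetical model of a field collection keyed by field name.
# This is not Examine code; it only mimics "one name, one string value".
fields = {}

category_ids = ["1051", "1052"]  # made-up category node Ids

for category_id in category_ids:
    # Each assignment reuses the same key, so the previous value is replaced.
    fields["categories"] = category_id

# Only the last category is left; the first one would never reach the index.
print(fields)
```

A .NET dictionary throws on a duplicate key if you call Add instead of overwriting, but either way only one value per field name can get through.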

In my past experience with Lucene, to add more than one category you just add two fields with the same name to the Lucene document. After reading through the Examine source for a while I was close to throwing away Examine in favour of vanilla Lucene, when I found I had access to the Lucene document via the “OnDocumentWriting” override. I added the two category Ids, everything built, and I was able to index the site successfully.
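For contrast, a document that accepts repeated field names can be modelled as a list of fields instead of a dictionary. This is only a rough stand-in in plain Python, not the Lucene API; `add_field`, `get_values`, and the Ids are hypothetical names chosen for the sketch.

```python
# Hypothetical stand-in for a Lucene document: an ordered list of
# (name, value) fields, so the same name may appear more than once.
document = []

def add_field(doc, name, value):
    """Append a field without checking whether the name is already used."""
    doc.append((name, value))

def get_values(doc, name):
    """Return every value stored under the given field name."""
    return [value for field_name, value in doc if field_name == name]

for category_id in ["1051", "1052"]:  # made-up category node Ids
    add_field(document, "categories", category_id)

# Both categories are kept because nothing is keyed by field name.
print(get_values(document, "categories"))
```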

The issue now was searching the index. Once again Examine uses a dictionary of strings, so I could not add more than one category. At this point it looked like I needed to alter the search in some way as well, which I did not wish to do at the time. As a last attempt I tried concatenating the category Ids into a single field value, separated by a pipe (|). The site was indexed successfully and search worked correctly.
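The pipe trick can be sketched as follows, again in plain Python rather than Examine. The sketch assumes that the analyzer used at index time breaks the stored string at the pipe, so that each Id ends up as its own searchable term. The post does not confirm that detail, and the helper names and Ids are hypothetical.

```python
def build_category_field(category_ids):
    """Collapse several category Ids into the single string a dictionary can hold."""
    return "|".join(str(category_id) for category_id in category_ids)

def tokenize(field_value):
    """Assumed analyzer behaviour: split the value into terms at each pipe."""
    return [term for term in field_value.split("|") if term]

def matches_category(field_value, category_id):
    """True when the wanted Id is one of the terms produced from the field."""
    return str(category_id) in tokenize(field_value)

field_value = build_category_field([1051, 1052])  # made-up Ids
print(field_value)
print(matches_category(field_value, 1052))
print(matches_category(field_value, 105))
```

Matching whole terms rather than substrings matters here: a search for 105 should not pick up an item that belongs to category 1051.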

My own code for this lives in a custom indexer that extends “BaseUmbracoIndexer”.

This solution is not 100% ideal but works nicely, although I think I will ultimately need to use vanilla Lucene.

EDIT

I have had a question regarding the example code in this post, so I wanted to clarify a few things.

The nodeFactoryFacade is not a default Umbraco type; it is a type I created to help me mock out Umbraco and unit test my code. All it does is call "new Node(nodeId);" under the hood. I have blogged about using Façades to help with unit testing, if you’re interested.

I don't include the multipart field in my Examine configuration as this code adds it directly to the index.