In my last post, I wondered about the lack of meaningful standards for evaluating enterprise search implementations.
I did get some excellent comments on the post and also some very useful commentary from a LinkedIn discussion about this topic – I would recommend you read through that discussion. Udo Kruschwitz and Charlie Hull both provided links to some very good resources.
In this post, I thought I would describe what I think to be some important attributes of any standard measures that could be adopted. Here I will be addressing the specific actions to measure – in a subsequent post I will write about how these can be used to actually evaluate a solution.
Measurable
To state the obvious, we need metrics that are measurable and objective – ideally, metrics that directly reflect user interaction with the search solution.
Measures that depend on subjective evaluation, or that gather feedback from users through means other than their direct use of the tool, can be very useful but introduce problems of interpretation differences and sustainability.
For example, a feedback function built into the interface (“Are these results useful?” or even a more specific “Is this specific result useful for you here?”) can provide excellent insight, but such functions are used so rarely that the data is not useful overall.
Surveys of users inevitably run into the problem of faulty or biased memory. In my experience, users have such a negative perception of enterprise search that individual negative experiences will overwhelm positive ones when you ask them to recall and assess their experience a day or a week after their usage.
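To make “directly measurable” concrete, here is a minimal sketch in Python of computing two interaction-based measures from a search log. The log format and field names (query, results_shown, clicked) are hypothetical assumptions for illustration; a real implementation would adapt this to whatever the search platform actually records.

```python
def interaction_metrics(log_events):
    """Click-through and abandonment rates from raw search events."""
    # Only count searches that actually showed results to the user.
    searches = [e for e in log_events if e["results_shown"] > 0]
    if not searches:
        return {"click_through_rate": 0.0, "abandonment_rate": 0.0}
    clicked = sum(1 for e in searches if e["clicked"])
    return {
        "click_through_rate": clicked / len(searches),
        "abandonment_rate": 1 - clicked / len(searches),
    }

# Hypothetical log entries; a real search platform's log will differ.
events = [
    {"query": "travel policy", "results_shown": 10, "clicked": True},
    {"query": "expense codes", "results_shown": 8, "clicked": False},
]
print(interaction_metrics(events))  # {'click_through_rate': 0.5, 'abandonment_rate': 0.5}
```

Click-through and abandonment are just stand-ins here; the point is that both come straight from observed user behavior, with no survey or feedback widget involved.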
Common / Useful to compare implementations
Another important consideration is that a standard for evaluating enterprise search should include aspects of search that are common across the broad variety of solutions you might see.
In addition, they should lend themselves to comparing different solutions in a useful way.
Some implementations might be web-based (in my experience, this is by far the most common way to make enterprise search available); some might be based on a desktop application or a mobile app. Some might depend only on users entering search terms to start a search session; some might support searching on search terms alone, with no filtering or refining at all. Some might provide “search as you type”, showing results immediately based on part of what the user has entered. There are many variations to consider here.
I would want to have measures that allow me to compare one solution to another – “Is this one better than that one?” “Are there specific user needs where this solution is better than that one?”
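As a hypothetical sketch of that kind of comparison, assuming two solutions report the same set of measures (the metric names and values below are invented purely for illustration):

```python
def compare(metrics_a, metrics_b):
    """Metric-by-metric deltas between two solutions (b minus a)."""
    # Only metrics both solutions report can be compared directly.
    shared = metrics_a.keys() & metrics_b.keys()
    return {metric: metrics_b[metric] - metrics_a[metric] for metric in sorted(shared)}

solution_a = {"click_through_rate": 0.62, "abandonment_rate": 0.38}
solution_b = {"click_through_rate": 0.71, "abandonment_rate": 0.29}
print(compare(solution_a, solution_b))
# abandonment down ~0.09, click-through up ~0.09
```

The code is trivial by design: once the measures are common, “is this one better than that one?” becomes a straightforward comparison rather than a judgment call.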
Likely to be insightful
Another obvious aspect is that we want to include measures that are likely to be useful.
Useful in what way, though?
My first thought is that it must measure whether the solution is useful for the users – does it meet their needs? (With search, I would simplify this to “does it provide the information the user needs efficiently?”, but there are likely many other ways to define “useful” even within a search experience.)
Operationalizable
I would want all measures I use to be consistently available, with no need for someone to actively “take a measurement” at a given time.
As mentioned above, measures that directly reflect what happens in the user experience are what I would be looking for. I would add that they should be captured directly from that experience: data written to a search log file or gathered via some other automated means.
This provides a data set that can be reviewed and used at essentially any time and which (other than maintaining the system capturing the measurements) doesn’t require any effort to capture and maintain: users use the search solution, and their activities are recorded.
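As a sketch of what that passive capture could look like, here is a small Python example that appends each search interaction to a log file as it happens. The event fields and file name are assumptions for illustration, not a description of any particular product’s logging.

```python
import json
import time

def log_search_event(logfile, user_id, query, num_results, clicked_rank=None):
    """Append one search interaction as a JSON line; analysis can run any time."""
    event = {
        "timestamp": time.time(),
        "user_id": user_id,
        "query": query,
        "num_results": num_results,
        "clicked_rank": clicked_rank,  # None when the user clicked nothing
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

# Called from the search front end whenever a user searches or clicks:
log_search_event("search_events.jsonl", "u123", "travel policy", 10, clicked_rank=2)
```

Because every interaction lands in the log automatically, “taking a measurement” reduces to reading a file that is always up to date.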
Usable overall and when broken down by dimensions
Finally, I would expect measures to support analysis at broad scales and also the ability to drill into details using the same measures.
Examples of “broad scale” applicability: How good is this search solution overall? How good is my search solution in comparison to the overall industry average? How good are search solutions supporting the needs of users in the XYZ industry? How good are search solutions at supporting “known item” searching in comparison with “exploratory searching”?
Examples of drilling in: Within my user base, how successful are my users by department? How useful is the search solution in different topic areas of content? How good are results for individual, specific search criteria?
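Here is a minimal sketch of that idea: the same click-through measure computed overall and then broken down by a single dimension (department in this example; topic area or individual query would work the same way). The event fields are hypothetical.

```python
from collections import defaultdict

def click_through_by(events, dimension):
    """Click-through rate overall and per value of the given dimension."""
    groups = defaultdict(list)
    for event in events:
        groups[event[dimension]].append(event)
    # True counts as 1 when summed, so this is the fraction of clicked searches.
    per_group = {
        value: sum(e["clicked"] for e in group) / len(group)
        for value, group in groups.items()
    }
    overall = sum(e["clicked"] for e in events) / len(events)
    return overall, per_group

# Hypothetical events; a real log would carry many more fields.
events = [
    {"department": "HR", "clicked": True},
    {"department": "HR", "clicked": False},
    {"department": "Tax", "clicked": True},
]
overall, by_department = click_through_by(events, "department")
print(overall, by_department)  # 0.666..., {'HR': 0.5, 'Tax': 1.0}
```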
Others?
I’m sure I am missing a lot of potential criteria here – What would you add? Remove? Edit?
Over the past several years of working very closely with the enterprise search solution at Deloitte, I have tried to look “outside” as best I can to understand what others in the industry are doing to evaluate their solutions and where ours ‘fits’.
I’ve attended a number of conferences and webcasts and read papers (many of which, I’ll admit, were highlighted by Martin White on Twitter; I can’t recommend a follow of Martin enough!).
One thing I have never found is any common way to evaluate or talk about enterprise search solutions. I have seen several people (including Martin) comment on the relatively little research on enterprise search (as opposed to internet search, which has a lot of research behind it), and I am sure a significant reason for that is that there is no common way to evaluate the solutions.
If we could compare solutions in a systematic way, we could start to answer questions like the broad-scale examples I listed above.
Why do we not have a common set of definitions?
One possibility is certainly that I have still not read up enough on the topic – perhaps there is a common set of definitions – if so, feel free to share.
Another possibility is that this is a result of dependence on the metrics built into the search products enterprises are using. I have found these useful, but they don’t come with much detail or clarity of definition and, more to the point, they don’t seem common across products. That said, I have relatively limited exposure to multiple search solutions, so again I would be interested in insights from those who have broader exposure (perhaps any consultants working in this space?).
And one more possible driver behind the lack of commonality is the proprietary nature of most implementations. I try to speak externally as frequently as I can, but I am always hesitant (and have been coached) not to be too detailed about the implementation.
I do plan to put up a small series here, though, with some of the more elemental components of our metrics implementation for comparison with anyone who cares to share.
More soon!