In my last post, I provided a description of some basic metrics you might want to look into using for your search solution (assuming you’re not already). In this post, I’ll describe a few more metrics that may take a bit more effort to pull together (depending on your search engine).
Combining Search Analytics and Web Analytics
First up – there is quite a lot of insight to be gained from combining your search analytics data with your web analytics data. It is even possible to capture almost all of your search analytics in your web analytics solution which makes this combination easier, though that can take work. For your external site, it’s also very likely that your web analytics solution will provide insight on the searches that lead people to your site.
A first useful piece of analysis you can perform is to review your top N searches, perform the same searches yourself and review the resulting top target’s usage as reported in your web analytics tool.
- Are the top targets the most used content for that topic?
- Assuming you can manipulate relevancy at an individual target level, you might bump up the relevancy for items that are commonly used but which show below other items in the search results (or you might at least review the titles and tags for the more-commonly-used items and see if they can be improved).
- Are there targets you would expect to see for those top searches that your web analytics tool reports as highly utilized but which don’t even show in the search results for the searches? Perhaps you have a coverage issue and those targets are not even being indexed.
- It might be possible to integrate data from your web analytics solution reflecting usage directly into your search to provide a boost in relevance for items in search that reflects usage.
- [Update 26 Jan 2009] One item I forgot to include here originally is to use your web analytics tool to track the page someone is on when they perform a search (assuming you provide persistently available access to your search tool – say in a search box that appears on every page of your site). Knowing this can help tune your navigational experience. Pages that commonly lead users to search are likely pages that do not provide good access to the information users expect, so they fall back to using search. (Of course, it might be that leading the user to search is part of the point of the page so keep that in mind.)
- [Update 26 Jan 2009] Another metric to monitor – measure the ratio of searches performed each reporting period (week) to the number of visits for that same time period. This will give you a sense of how much the search is used (in relation to navigation). I find that the absolute number is not as useful as tracking this over time and that monitoring changes in this value can give you indicators of general issues with navigation (if the ratio goes up) or search (if the ratio goes down). Does anyone know of any benchmarks in this area? I do not, but I am interested in understanding whether there’s a generally accepted range for this that is judged “acceptable”. In the case of our solution, when I first started tracking this, it was just under 0.2 and has seen a pretty steady increase over the years to a fairly stable value of about 0.33 now.
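As a rough illustration of the searches-to-visits ratio, here is a minimal Python sketch. It assumes you can export one date per search event and one date per visit from your tools; the function names and the sample data are made up for the example and do not come from any particular analytics product.

```python
from collections import Counter
from datetime import date


def week_key(day: date) -> str:
    """Label a date with its ISO year and week number, e.g. '2009-W04'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def searches_per_visit(search_dates, visit_dates):
    """Return {week: searches / visits} for every week that has visits."""
    searches = Counter(week_key(d) for d in search_dates)
    visits = Counter(week_key(d) for d in visit_dates)
    return {week: searches[week] / visits[week] for week in sorted(visits)}


# Made-up sample data: one date per search or visit event.
search_dates = [date(2009, 1, 19), date(2009, 1, 20), date(2009, 1, 27)]
visit_dates = [date(2009, 1, 19)] * 6 + [date(2009, 1, 26)] * 4

ratios = searches_per_visit(search_dates, visit_dates)
for week, ratio in ratios.items():
    print(week, round(ratio, 2))
```

Plotting the resulting values week over week is what makes the trend visible; a single week's number says little on its own.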
A second step would be to review your web analytics report for the most highly used content on your site. For the most highly utilized targets, determine what are the obvious searches that should expose those targets and then try those searches out and see where the highly used targets fall in the results.
- Do they show as good results? If not, ensure that the targets are actually included in your search and review the content, titles and tags. You might need to also tweak synonyms to ensure good coverage.
- You should also review the most highly used content as reported by your web analytics tool against your “best bets” (if you use them). Does the most popularly accessed content show up in best bets?
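The check described in this second step can be partly scripted once the trial searches have been run. The sketch below is only one possible shape for it: it assumes you have saved each trial search's results as an ordered list of URLs and that a person has chosen the "obvious" searches for each highly used page. All names, URLs and the `good_rank` threshold are hypothetical.

```python
def find_rank(results, target):
    """Return the 1-based position of target in results, or None if absent."""
    for position, url in enumerate(results, start=1):
        if url == target:
            return position
    return None


def audit_top_content(expected_queries, search_results, good_rank=3):
    """Report where each highly used page ranks for its obvious searches.

    expected_queries: {page_url: [query, ...]} chosen by a human reviewer.
    search_results:   {query: [url, ...]} in the order the engine returned them.
    """
    report = []
    for page, queries in expected_queries.items():
        for query in queries:
            rank = find_rank(search_results.get(query, []), page)
            if rank is None:
                status = "missing"  # possible coverage or indexing issue
            elif rank > good_rank:
                status = "buried"   # candidate for title, tag or synonym work
            else:
                status = "ok"
            report.append((page, query, rank, status))
    return report


# Made-up sample data.
expected_queries = {
    "/hr/vacation": ["vacation policy"],
    "/it/vpn": ["vpn setup"],
    "/finance/expenses": ["expense report"],
}
search_results = {
    "vacation policy": ["/hr/vacation", "/hr/holidays"],
    "vpn setup": ["/it/a", "/it/b", "/it/c", "/it/vpn"],
    "expense report": ["/finance/budget"],
}

report = audit_top_content(expected_queries, search_results)
for row in report:
    print(row)
```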
Another fruitful area to explore is to consider what people actually use from search results after they’ve done a search (do they click on the first item, second? what is the most common target for a given keyword? Etc.). I’ll post about this separately.
I’m sure there are other areas that could be explored here – please share if you have some ideas.
Categorizing your searches
When I first got involved in supporting a search solution, I spent some time understanding the reports I got from my search engine. We had our engine configured to provide reports on a weekly basis and the reports provided the top 100 searches for the week. All very interesting and as we started out, we tried to understand (given limited time to invest) how best to use the insight from just these 100 searches each week.
- Should we review the results from each of those 100 searches and try to make sure they looked good? That seemed like a very time intensive process.
- Should we define a cut off (say the top 20)? Should we define a cutoff in terms of usage (any search that was performed more than N times)?
- What if one of these top searches was repeated? How often should we re-review those?
- How do we recognize when a new search has appeared that’s worth paying attention to?
We quickly realized that there was no really good, sustainable answer and this was compounded by the fact that the engine reported two searches as different searches if there was *any* difference between two searches (even something as simple as case difference, even though the engine itself does not consider case when doing a search – go figure).
In order to see the forest for the trees, we decided what would be desirable is to categorize the searches – associate individual searches with a larger grouping that allows us to focus at a higher level. The question was how best to do this?
Soon after trying to work out how to do this, I attended Enterprise Search Summit West 2007 and sat in on a session titled “Taxonomize Your Search Logs” by Marilyn Chartrand from Kaiser Permanente. She spoke about exactly this topic, and, more specifically, the value of doing this as a way to understand search behavior better, to be able to talk to stakeholders in ways that make more sense to them, and more.
Marilyn’s approach was to have a database (she showed it to me and I think it was actually in a taxonomy tool but I don’t recall the details – sorry!) where she maintained a mapping from individual search terms to the taxonomy values.
Since then, I’ve started working on the same type of structure and have made good headway. Further, I’ve also found a way to capture every single search (not just the top N) into a SQL database so that it’s possible to view the “long tail” and categorize that as well. I still don’t have a good automated solution for auto-categorizing the terms, but the level of re-use from one reporting period to the next is high enough that loading a new period’s data requires categorization of only part of the new data. [Updated 26 Jan 2009 to add the following] Part of the challenge is that you will likely want to apply many of the same textual conversions to your database of captured searches that are applied by your search engine – synonyms, stemming, lemmatization, etc. These conversions can help simplify the categorization of the captured searches.
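To make the mapping idea concrete, here is a small Python sketch. It is not my actual database or Marilyn’s tool; it assumes a hand-maintained dictionary from normalized search to category plus a simple keyword rule, and the only normalization it applies is lower-casing and whitespace cleanup (synonyms and stemming are left out). The dictionary contents and function names are invented for illustration.

```python
import re
from collections import Counter


def normalize(search: str) -> str:
    """Lower-case and collapse whitespace so trivially different searches match."""
    return re.sub(r"\s+", " ", search.strip().lower())


# Hypothetical hand-maintained mapping from normalized search to category.
TERM_TO_CATEGORY = {"vacation policy": "HR"}

# Hypothetical keyword rules: any search containing the word gets the category.
KEYWORD_TO_CATEGORY = {"groupwise": "GroupWise"}


def categorize(search: str):
    """Return the category for a search, or None if it still needs review."""
    term = normalize(search)
    if term in TERM_TO_CATEGORY:
        return TERM_TO_CATEGORY[term]
    for word in term.split():
        if word in KEYWORD_TO_CATEGORY:
            return KEYWORD_TO_CATEGORY[word]
    return None


def summarize(searches):
    """Count searches per category and collect the terms left uncategorized."""
    by_category = Counter()
    uncategorized = Counter()
    for search in searches:
        category = categorize(search)
        if category is None:
            uncategorized[normalize(search)] += 1
        else:
            by_category[category] += 1
    return by_category, uncategorized


# Made-up sample data for one reporting period.
searches = [
    "GroupWise download",
    "groupwise  Download",
    "vacation policy",
    "Vacation Policy ",
    "expense report",
]
by_category, uncategorized = summarize(searches)
print(by_category)
print(uncategorized)
```

The uncategorized counter is the part that needs human attention each period; everything already in the mapping is re-used automatically.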
Anyway – the types of questions this enables you to answer and why it can be useful include:
- What are the most-used categories of content for your search users?
- How does this correlate with usage (as reported in your web analytics solution) for that same category?
- If they don’t correlate well, you may have a navigational issue to address (perhaps raising the prominence of a category that’s under-exposed in navigation, or lowering the prominence of one that’s overly visible).
- Review the freshness of content in those categories and work with content owners to ensure that content is kept up to date. I’ve found it very useful to be able to talk with content owners in terms like “Did you know that searches for your content constitute 20% of all searches?” If nothing else, it helps them understand the value of their content and why they should care about how well it shows up in search results! Motivate them to keep it up to date!
- Assuming you categorize your searches based on your taxonomy, this can also feed back into your taxonomy management process as well! Perhaps you can identify taxonomic terms that should be retired or collapsed or split using insights from predominance of use in search.
- Within the categorization of search terms, can you correlate the words used to identify the most common “secondary” words in the searches? An example – GroupWise is a product made and sold by my employer. It is also a common search target. So a lot of searches will include the word groupwise in them (I use that as a way to pseudo-automatically categorize searches by the presence of a single keyword). Most of those searches, though, include other words. What are the most common words (other than groupwise) among searches that are assigned to the GroupWise category?
- This insight can help you tune your navigation – common secondary words represent content that a user should have access to when they are looking at a main page (assuming one exists) for that particular category. If the most common secondary word for GroupWise were documentation, say, providing direct access to product documentation would be appropriate.
- You can also use that insight to feed back into your taxonomy (specifically, you might be able to find ways to identify new sub-terms in your taxonomy).
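Counting “secondary” words for a keyword-based category is straightforward to sketch. The Python below assumes searches are plain strings and that membership in the category is decided by the presence of a single keyword, as in the GroupWise example; the function name and sample searches are made up.

```python
from collections import Counter


def secondary_words(searches, keyword, top_n=10):
    """Count the other words in searches that contain the given keyword."""
    keyword = keyword.lower()
    counts = Counter()
    for search in searches:
        words = search.lower().split()
        if keyword in words:
            counts.update(word for word in words if word != keyword)
    return counts.most_common(top_n)


# Made-up sample searches.
searches = [
    "groupwise documentation",
    "GroupWise client download",
    "groupwise documentation pdf",
    "printer setup",
]

top_words = secondary_words(searches, "GroupWise")
for word, count in top_words:
    print(word, count)
```

In practice you would probably also drop very common words such as "the" or "for" before counting.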
Analytics on the search terms / words
Another useful type of analysis you can perform on search data is to look at simple metrics of the searches. Louis Rosenfeld identified several of these – I’m including those here and a few additional thoughts.
- How many words, on average, are in a search? What is the standard deviation? This insight can help you understand how complex your users’ searches are. I don’t know what a benchmark is, but I find in our search solution, it averages just over 2 words / search. This indicates to me that the average search is very simple, so expectations are high on the search engine’s ability to take those 2 words and provide a good result.
- You can also monitor this over time and try to understand if it changes much and, if so, analyze what has changed.
- While not directly actionable, another good view of this data is to build a chart of the # of searches performed for each count of words. The chart below shows this for a long period of use on our engine. You can see that searches with more than 10 words are vanishingly small. After the jump from 1 word to 2 words, it’s almost a steady decline, though there are some anomalies in the data where certain longer lengths jump up from the previous count (for example, 25 word searches are more than twice as common as 24 word searches). The absolute numbers of these is very small, though, so I don’t think it indicates much about those particular lengths.
Chart of Searches per Word Count
- You can also look at the absolute length of the search terms (effectively, the number of characters). This is useful to review against your search UI (primarily, the ever-present search box you have on your site, right?). Your search box should be large enough to ensure that a high percentage (90+%) of searches will be visible in the box without scrolling.
- I did this analysis and found that our search UI did exactly that.
- I also generated a chart like the one above where the X axis was the length of the search and found some obvious anomalies in our search – you can see them in the chart below.
- I tried to understand the unexpected spike in searches of length 3 and 4 compared to the more regular curve and found that it was caused by a high level of usage of (corporate-specific) acronyms in our search! This insight led me to realize that we needed to expand our synonyms in search to provide more coverage for those acronyms, which were commonly the acronyms for internal application names.
Chart of Search Length to number of searches
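The word-count and character-length measures above can be computed with the Python standard library. This is a sketch under simple assumptions: searches are plain strings, words are split on whitespace, and the standard deviation is taken over the whole set of searches (population standard deviation). The function names and sample searches are invented.

```python
import math
import statistics
from collections import Counter


def word_count_stats(searches):
    """Return (mean, population std dev, histogram) of words per search."""
    counts = [len(search.split()) for search in searches]
    return statistics.mean(counts), statistics.pstdev(counts), Counter(counts)


def chars_needed(searches, coverage=0.9):
    """Smallest length L such that at least `coverage` of searches have <= L characters."""
    lengths = sorted(len(search) for search in searches)
    index = max(math.ceil(coverage * len(lengths)) - 1, 0)
    return lengths[index]


# Made-up sample searches.
searches = [
    "vpn", "hr", "expense report", "groupwise client download",
    "vacation policy", "sso", "printer setup", "benefits",
    "travel booking tool", "it help desk phone number",
]

mean_words, stdev_words, histogram = word_count_stats(searches)
print("mean words per search:", mean_words)
print("searches per word count:", sorted(histogram.items()))
print("box width for 90% of searches:", chars_needed(searches, 0.9), "characters")
```

The histogram is the data behind a chart of searches per word count, and the same `Counter` approach applied to `len(search)` gives the data for a chart by character length, where a spike at 3 or 4 characters would hint at heavy acronym use.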
Network Analysis of Search Words
Another interesting view of your search data is hinted at by the discussion above of “secondary” search words – words that are used in conjunction with other words. I have not yet managed to complete this view (lack of time and, frankly, the volume of data is a bit daunting with the tools I’ve tried).
The idea is to parse your searches into their constituent words and then build a network between the words, where each word is a node and the links between the words represent the strength of the connection between the words – where “strength” is the number of times those two words appear in the same searches.
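The counting half of that idea is small enough to sketch in Python; the visualization is a separate problem this does not address. The sketch assumes whitespace-separated words, ignores case, and counts a pair at most once per search. The function name and sample searches are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(searches):
    """Count, for each pair of words, how many searches contain both words."""
    edges = Counter()
    for search in searches:
        # Unique words, sorted so each pair always appears in the same order.
        words = sorted(set(search.lower().split()))
        edges.update(combinations(words, 2))
    return edges


# Made-up sample searches.
searches = [
    "groupwise documentation",
    "groupwise client download",
    "GroupWise client",
]

edges = cooccurrence_edges(searches)
for (word_a, word_b), strength in edges.most_common():
    print(word_a, word_b, strength)
```

Each key is an edge between two word nodes and each value is its strength, so the output could be loaded into a graph tool as a weighted edge list. On a full search log the number of pairs grows quickly, so you would likely want to drop edges below some minimum strength.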
Having this available as a visual tool to explore words in search seems like it would be valuable as a way to understand their relationships and could give good insight on the overall information needs of your searchers.
The cost (in my own time if nothing else) of taking the data and manipulating it into a format that could then be exposed in such a tool, however, has been high enough to keep me from doing it without some more concrete ideas for what actionable steps I could take from the insight gained. I’m just not confident that this would expose anything much more than “the most common words used tend to be used together most commonly”.
I’m missing a lot of interesting additional types of analyses above – feel free to share your thoughts and ideas.
In my next post, I’ll explore in some more detail the insights to be gained from analyzing what people are using in search results (not just what people are searching for).