This is a second post in a series I have planned about the language found throughout your search log – all the way into the “long tail” and how it might or might not be feasible to understand it all.
My previous post, “80-20: The lie in your search log?”, highlighted how the slope of the “short head” of your search terms may not be as steep as anecdotes would say. That is, there can be a lot less commonality within a particular time range among even the most common terms in your search log than you might expect.
After writing that post, I began to wonder about the overall re-use of terms over periods of time.
In other words:
Even while commonality of re-using terms within a month is relatively low, how much commonality do we see in our users’ language (i.e., search terms) from month to month?
To answer this, I needed to take the entire set of terms for a month, compare it with the entire set from the next month to determine the overlap, then compare the second month’s set to a third month’s, and so on. Logically this is not a hard problem, but it was quite a challenge in practice due to the volume of data I was manipulating (large only relative to the tools I have to manipulate it).
So I pulled together every single term used over a period of about 18 months and broke them into the set used for each of those months and performed the comparison.
Before getting into the results, here are a few points of context about the search solution I’m writing about:
My expectation was that comparing the entire set of terms from one month to the next would show a relatively high percentage of overlap. What I found was not what I expected.
If you look at the unique terms and their overlap, surprisingly, the average overlap between months was a shockingly low 13.2%. In other words, over 86% of the terms in any given month were not used at all in the
If you look at the total searches performed and the percent of searches performed with terms from the prior month, this goes up to an average of 36.2% – reflecting that the terms that are re-used in a subsequent month are among the most common terms overall.
As you can see, the amount of commonality from month-to-month among the terms used is very low.
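For anyone who wants to run the same comparison on their own log, here is a minimal Python sketch of the two measures discussed above. It rests on some assumptions that are not from my own analysis: each month is a plain list with one entry per search performed, terms are normalized by trimming and lower-casing, and both percentages use the later month as the denominator. The function name `month_overlap` is just illustrative.

```python
from collections import Counter

def month_overlap(prev_terms, curr_terms):
    """Return (distinct_overlap, search_overlap) for two months of searches.

    Each argument is an iterable with one entry per search performed.
    Both ratios are relative to the later month (an assumption).
    """
    # Normalization is an assumption; adjust to match how your log is cleaned.
    prev_counts = Counter(t.strip().lower() for t in prev_terms)
    curr_counts = Counter(t.strip().lower() for t in curr_terms)
    if not curr_counts:
        return 0.0, 0.0
    shared = prev_counts.keys() & curr_counts.keys()
    # Share of this month's distinct terms that also appeared last month.
    distinct_overlap = len(shared) / len(curr_counts)
    # Share of this month's searches that used a term seen last month.
    search_overlap = sum(curr_counts[t] for t in shared) / sum(curr_counts.values())
    return distinct_overlap, search_overlap

january = ["benefits", "benefits", "payroll", "travel policy"]
february = ["benefits", "benefits", "benefits", "expense report", "org chart"]
distinct, searches = month_overlap(january, february)
print(f"{distinct:.1%} of distinct terms, {searches:.1%} of searches")
# prints "33.3% of distinct terms, 60.0% of searches"
```

Running this over each consecutive pair of months and averaging the results gives figures comparable to the two averages quoted above.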
What can you draw from this observation?
In a brief discussion about this with noted search analytics expert Lou Rosenfeld, his reaction was that this represented a significant amount of change in the information needs of the users of the system – significant enough to be surprising.
Another conclusion I draw from this is that it provides another reason why it is very hard to meaningfully improve search across the language of your users. Based on my previous post on the flatness of the curve of term use within a month, we know that we need to look at a pretty significant percentage of distinct terms each month to account for a decent percentage of all searches – 12% of distinct terms to account for only 50% of searches. In our search solution, that 12% doesn’t seem that large until you realize it still represents about 6,000 distinct terms.
Coupling that with the observation from the analysis here means that even if you review those terms for a given month, you will likely need to review a significant percentage of brand new terms the next month, and so on. Not an easy task.
Having established just how challenging this can be, my next few posts will provide some ideas for grappling with the challenges.
In the meantime, if you have any insight on similar statistics from your solution (or statistics about the shape of the search log curve I previously wrote about), please feel free to share here, on the SearchCoP on Yahoo! groups or on the Enterprise Search Engine Professionals group on LinkedIn – I would very much like to compare numbers to see if we can identify meaningful generalizations from different solutions.
It provides a great picture of the overall landscape of the problem (it’s not just search, after all!).
I especially liked slide 4 – a very telling illustration of the challenge we face in intelligently making information available to our users.
Re: Slide 24 – As I’ve written about before, I would say that the 80/20 rule is more than just “not quite accurate”. But that’s mincing words.
Overall, a highly recommended read.
Last week, I moderated a discussion for the weekly KMers.org Twitter chat about “The Importance of Search in your KM Solution”.
My intent was to try to get an understanding of how important search is relative to other components of a KM solution (connecting people, collecting and managing content, etc.).
It was a good discussion with about a dozen or so people taking part (that I could tell).
You can read through the transcript of the session here. Let me know what you think on the topic!
During the discussion, a great question came up about measuring the success of your search solution (thanks to Ed Dale) which I thought deserved its own discussion, so I have submitted a suggestion for a new topic for an upcoming KMers.org chat.
Please visit the suggestion here and vote for it!
Recently, I have been trying to better understand the language in use by our users in the search solution we use, and in order to do that, I have been trying to determine what tools and techniques one might use to do that. This is the first post in a planned series about this effort.
I have many goals in pursuing this. The primary goal has been to be able to identify trends from the whole set of language in use by users (and not just the short head). This goal supports the underlying business desire of identifying content gaps or (more generally) where the variety of content available in certain categories does not match the variety expected by users (i.e., how do we know when we need to target the creation and publication of specific content?).
Many approaches to this do focus on the short head – typically the top N terms, where N might be 50 or 100 or even 500 (some number that’s manageable). I am interested in identifying ways to understand the language through the whole long tail as well.
As I have dug into this, I realized an important aspect of this problem is to understand how much commonality there is to the language in use by users and also how much the language in use by users changes over time – and this question leads directly to the topic at hand here.
There is an anecdote I have heard many times about the short head of your search log: that “80 percent of your searches are accounted for by the top 20% most commonly-used terms”. I now question this and wonder what others have seen.
I have worked closely with several different search solutions in my career and the three I have worked most closely with (and have most detailed insight on) do not come even close to the above assertion. Chart 1 shows the usage curve for one of these. The X axis is the percent of distinct terms (ordered by use) and the Y axis shows the percent of all searches accounted for by all terms up to X.
From this chart, you can see that it takes approximately 55% of distinct terms to account for 80% of all searches – that is a lot of terms!
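To check a log against the “80/20” claim, the reading taken from the chart can also be computed directly. The Python sketch below assumes you already have a count of searches per distinct term for one month; the function name `share_of_terms_needed` and the sample data are made up for illustration.

```python
from collections import Counter

def share_of_terms_needed(term_counts, target_share):
    """Fraction of distinct terms (most used first) needed to account
    for target_share of all searches, e.g. target_share=0.8 for 80%."""
    counts = sorted(term_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    running = 0
    for terms_used, count in enumerate(counts, start=1):
        running += count
        if running >= target_share * total:
            return terms_used / len(counts)
    return 1.0

# Made-up month: 4 distinct terms, 10 searches in total.
log = Counter({"benefits": 5, "payroll": 3, "travel": 1, "org chart": 1})
needed = share_of_terms_needed(log, 0.8)
print(f"{needed:.0%} of distinct terms cover 80% of searches")
```

Calling the function for a range of target shares (10%, 20%, and so on) gives the points of the usage curve, which makes it easy to compare one month, or one search solution, against another.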
This curve shows the usage for one month – I wondered about how similar this would be for other months and found (for this particular search solution) that the curves for every month were basically the exact same!
Wondering if this was an anomaly, I looked at a second search solution I have close access to, to see if it might show signs of the “80/20” rule. Chart 2 adds the curve for this second solution (it’s the blue curve – the higher of the two).
In this case, you will find that the curve is “higher” – it reaches 80% of searches at about 37% of distinct terms. However, it is still pretty far from the “80/20” rule!
After looking at this data in more detail, I have realized why I have always been troubled at the idea of paying close attention to only the so-called “short head” – doing so leaves out an incredible amount of data!
In trying to understand why the usage curves are so different, even though neither is close to adhering to the “80/20” rule, I realize that there are some important distinctions between the two search solutions:
I’m not sure how (or really if) these factor into the shape of these curves.
In understanding this a bit better, I hypothesize two things: 1) the shape of this curve is stable over time for any given search solution, and 2) the shape of this curve tells you something important about how you can manage your search solution. I am planning to dig more to answer hypothesis #1.
Questions for you:
I will be writing more on these search term usage curves in my next post as I dig more into the time-stability of these curves.
My first post back after too-long a period of time off. I wanted to jump back in and share some concrete thoughts on best bet governance.
I’ve previously written about best bets and how I thought, while not perfect, they were an important part of a search solution. In that post, I also described the process we had adopted for managing best bets, which was a relatively indirect means supported by the search engine we used for the search solution.
Since moving employers, I now have responsibility for a local search solution as well as input on an enterprise search solution where neither of the search engines supports a similar model. Instead, both support the (more typical?) model where you identify particular search terms that you feel need to have a best bet and you then need to identify a specific target (perhaps multiple targets) for those search terms.
This model offers some advantages such as specificity in the results and the ability to actively determine what search terms have a best bet that will show.
This model also offers some disadvantages, the primary one (in my mind) being that they must be managed – you must have a means to identify which terms should have best bets and which targets those terms should show as a best bet. This implies some kind of manual management, which, in resource-constrained environments, can be a challenge. As noted in my previous article, others have provided insight about how they have implemented and how they manage best bets.
Now having responsibility for a search solution requiring manual management of best bets, we’ve faced the same questions of governance and management and I thought I would share the governance model we’ve adopted. I did review many of the previous writings on this to help shape these, so thanks to those who have written before on the topic!
Our governance model is largely based on trying to provide a framework for consistency and usability of our best bets. We need some way to ensure we do not spend inordinate time on managing requests while also ensuring that we can identify new, valuable search terms and targets for best bets.
Without further ado, here is an overview of the governance we are using:
The one interesting experience we’ve had so far with this governance model is that we get a lot of push back from site publishers who want to provide a lengthy laundry list of terms for their site, even when 75% of that list is never used (or at least in a twelve month period we’ll sometimes check). They seem convinced that there is value in setting up best bets for terms even when you can show that there is none. We are currently making changes in the way we manage best bets and also in how we can use these desirable terms to enhance the organic results directly. More on that later.
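The check against the search log that I mention above is simple to automate. Here is a hedged Python sketch: it assumes you can export the past twelve months of searches as a mapping of term to number of searches, and the function name, the `min_searches` threshold and the sample terms are all hypothetical.

```python
from collections import Counter

def split_requested_terms(requested_terms, search_log_counts, min_searches=1):
    """Split a publisher's requested terms into (used, unused) lists
    based on how often each term appears in the search log."""
    # Fold case and whitespace so "Payroll " and "payroll" count together.
    normalized_log = Counter()
    for term, count in search_log_counts.items():
        normalized_log[term.strip().lower()] += count
    used, unused = [], []
    for term in requested_terms:
        key = term.strip().lower()
        if normalized_log[key] >= min_searches:
            used.append(term)
        else:
            unused.append(term)
    return used, unused

twelve_month_log = {"payroll": 120, "pay stub": 45, "benefits": 300}
requested = ["Payroll", "pay stub", "remuneration portal", "salary slip"]
used, unused = split_requested_terms(requested, twelve_month_log)
print("worth a best bet:", used)
print("never searched:", unused)
```

Sharing the "never searched" list with the site publisher is one concrete way to have the conversation about which terms are worth setting up.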
There you have our current governance model. Not too fancy or complicated and still not ideal, but it’s working for us and we recognize that it’s a work in progress.
Now that I have the “monkey off my back” in terms of getting a new post published, I plan to re-start regular writing. Check back soon for more on search, content management and taxonomy!
Last summer, I read the article by Kas Thomas from CMS Watch titled “Best Bets – a Worst Practice” with some interest. I found his thesis to be provocative and posted a note to the SearchCoP community asking for others’ insights on the use of Best Bets. I received a number of responses taking some issue with Kas’ concept of what best bets are, and also some responses describing different means to manage best bets (hopefully without requiring the “serious amounts of human intervention” described by Kas).
In this post, I’ll provide a summary of sorts and also describe some of the ways described for managing best bets and also the way we have managed best bets.
Kas’ thesis is that best bets are not a good practice because they are largely a hack layered on top of a search engine and require significant manual intervention. Further, if your search engine isn’t already providing access to appropriate “best bets” for queries, you should get yourself a new search engine.
Some of the most interesting comments from the thread of discussion on the SearchCoP include (I’ll try to provide as cohesive a picture of sentiment as I can but will only provide parts of the discussion – if I have portrayed intent incorrectly, that’s my fault and not the original author’s):
From Tim W:
“Search analytics are not used to determine BB … BB are links commonly used, enterprise resources that the search engine may not always rank highly because for a number of reasons. For example, lack of metadata, lack of links to the resource and content that does not reflect how people might look for the document. Perhaps it is an application and not a document at all.”
From Walter U:
“…manual Best Bets are expensive and error-prone. I consider them a last resort.”
From Jon T:
“Best Bets are not just about pushing certain results to the top. It is also about providing confidence in the results to users.
If you separate out Best Bets from the automatic results, it will show a user that these have been manually singled out as great content – a sign that some quality review has been applied.”
From Avi R:
“Best Bets can be hard to manage, because they require resources.
If no one keeps checking on them, they become stale, full of old content and bad links.
Best Bets are also incredibly useful.
They’re good for linking to content that can’t be indexed, and may even be on another site entirely. They’re good for dealing with … all the sorts of things that are obvious to humans but don’t fit the search paradigm.”
So, lots of differing opinions on best bets and their utility, I guess.
A few more pieces of background for you to consider: Walter U has posted on his blog (Most Casual Observer) a great piece titled “Good to Great Search” that discusses best bets (among other things); and, Dennis Deacon posted an article titled, “Enterprise Search Engine Best Bets – Pros & Cons” (which was also referenced in Kas Thomas’ post). Good reading on both – go take a look at them!
My own opinion – I believe that best bets are an important piece of search and agree with Jon T’s comment above that their presence (and, hopefully, quality!) gives users some confidence that there is some human intelligence going into the presentation of the search results as a whole. I also have to agree with Kas’s argument that search engines should be able to consistently place the “right” item at the top of results, but I do not believe any search engine is really able to do so today – there are still many issues to deal with (see details in my posts on coverage, identity, and relevance for my own insights on some of the major issues).
That being said, I also agree that you need to manage best bets in a way that does not cost your organization more than their value – or to manage them in a way that the value is realized in multiple ways.
Contrary to what Tim W says, and as I have written about in my posts on search analytics (especially in the use of search results usage), I do believe you can use search analytics to inform your best bets but they do not provide a complete solution by any means.
From here on out, I’ll describe some of the ways best bets can be managed – the first few will be summary of what people shared on the SearchCoP community and then I’ll provide some more detail on how we have managed them. The emphasis (bolding) is my own to highlight some of what I think are important points of differentiation.
From Tim W:
“We have a company Intranet index; kind of a phone book for web sites (A B C D…Z). It’s been around for a long time. If you want your web site listed in the company index, it must be registered in our “Content Tracker” application. Basically, the Content Tracker allows content owners to register their web site name, URL, add a description, metadata and an expiration date. This simple database table drives the Intranet index. The content owner must update their record once per year or it expires out of the index.
This database was never intended for Enterprise Search but it has proven to be a great source for Best Bets. We point our ODBC Database Fetch (Autonomy crawler) at the SQL database for the Content Tracker and we got instant, user-driven, high quality Best Bets.
Instead of managing 150+ Best Bets myself, we now have around 800 user-managed Best Bets. They expire out of the search engine if the content owner doesn’t update their record once per year. It has proven very effective for web content. In effect, we’ve turned over management of Best Bets to the collective wisdom of the employees.”
From Jim S:
“We have added an enterprise/business group best bet key word/phrase meta data.
All documents that are best bet are hosted through our WCM and have a keyword meta tag added to indicate they are a best bet. This list is limited and managed through a steering team and search administrator. We primarily only do best bets for popular searches. Employee can suggest a best bet – both the term and the associated link(s). It is collaborative/wiki like but still moderated and in the end approved or rejected by a team. There is probably less than 1 best bet suggestion a month.
If a document is removed or deleted the meta data tag also is removed and the best bet disappears automatically.
Our WCM also has a required review date for all content. The date is adjustable so that content will be deactivated at a specific date if the date is not extended. This is great for posting information that has a short life as well as requiring content owners to interact with the content at least every 30 Months (maximum) to verify that the content is still relevant to the audience. The Content is not removed from the system, rather it’s deactivated (unpublished) so it no longer accessible and the dynamic links and search index automatically remove the invalid references. The content owner can reactivate it by setting the review date into the future.
If an external link (not one in our WCM) is classified as a best bet then a WCM redirect page is created that stores the best bet meta tag. Of course it has a review/expiration so the link doesn’t go on forever and our link testing can flag if the link is no longer responding. If the document is in the DMS it would rarely be deleted. In normal cases it would be archived and a archive note would be placed to indicate the change. Thus no broken links.
Good content engineering on the front end will help automate the maintenance on the back end to keep the quality in search high.“
The first process is external to the content and doesn’t require modifying the content (assuming I’m understanding Tim’s description correctly). There are obvious pros and cons to this approach.
By contrast, the second process embeds the “best bet” attribution in the content (perhaps more accurately in the content management system around the content) and also embeds the content in a larger management process – again, some obvious pros and cons to the approach.
Now for a description of our process. The process and tools in place in our solution are similar to the description provided by Tim W. I spoke about this topic at the Enterprise Search Summit West in November 2007, so you might be able to find the presentation for it there (though I could not just now in a few minutes of searching).
With the search engine we use, the results displayed in best bets are actually just a secondary search performed when a user performs any search – the engine searches the standard corpus (whatever context the user has chosen, which would normally default to “everything”) and separately searches a specific index that includes all content that is a potential best bet.
The top 5 (a number that’s configurable) results that match the user’s search from the best bets index are displayed above the regular results and are designated “best bets”.
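To make the "secondary search" idea concrete, here is a small Python sketch. It is not how our engine is implemented: the scoring is a toy word-overlap count standing in for the engine's real relevance calculation, and the document fields (`title`, `keywords`) and function names are assumptions for illustration only.

```python
def score(query, doc):
    """Toy relevance: number of query words found in the title or keywords."""
    words = set(query.lower().split())
    text = doc["title"] + " " + " ".join(doc.get("keywords", []))
    return len(words & set(text.lower().split()))

def search(index, query, limit=None):
    """Return matching documents from one index, best match first."""
    hits = [d for d in index if score(query, d) > 0]
    hits.sort(key=lambda d: score(query, d), reverse=True)
    return hits if limit is None else hits[:limit]

def search_with_best_bets(query, main_index, best_bets_index, max_best_bets=5):
    """Run the same query against both indexes and keep the results separate."""
    return {
        "best_bets": search(best_bets_index, query, limit=max_best_bets),
        "results": search(main_index, query),
    }

best_bets_index = [{"title": "Benefits Home", "keywords": ["hr", "insurance"]}]
main_index = best_bets_index + [
    {"title": "2007 benefits enrollment memo", "keywords": []},
    {"title": "Cafeteria menu", "keywords": ["lunch"]},
]
page = search_with_best_bets("benefits", main_index, best_bets_index)
print([d["title"] for d in page["best_bets"]])
print([d["title"] for d in page["results"]])
```

The point of the design is that the same query runs twice: once against a small, curated index whose top matches are shown as best bets, and once against the full corpus for the regular results.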
How do items get into the best bets index, then? Similar to what Tim W describes, on our intranet, we have an “A-Z index” – in our case, it’s a web page that provides a list of all of the resources that have been identified as “important” at some point in the past by a user. (The A-Z index does provide category pages that provide subsets of links, but the main A-Z index includes all items so the sub-pages are not really relevant here.)
So the simple answer to, “How do items get into the best bets index?” is, “They are added to the A-Z index!” The longer answer is that users (any user) can request an item be added to the A-Z index and there is then a simple review process to get it into the A-Z index. We have defined some specific criteria for entries added to the A-Z, several of which are related to ensuring quality search results for the new item, so when a request is submitted, it is reviewed against these criteria and only added if it meets all of the criteria. Typically, findability is not something considered by the submitter, so there will be a cycle with the submitter to improve the findability of the item being added (normally, this would include improving the title of the item, adding keywords and a good description).
Once an item is added to the A-Z index, it is a potential best bet. The search engine indexes the items in the A-Z through a web crawler that is configured to start with the A-Z index page and goes just one link away from that (i.e., it only indexes items directly linked to from the A-Z index).
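The "one link away" crawl amounts to collecting every link on the A-Z index page and indexing only those targets. A rough Python sketch of that first step follows. The real work is done by the search engine's own crawler configuration; this just illustrates the idea with the standard library, and the URL and class name are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href> on a single page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep the first occurrence only
                self.links.append(url)

def best_bet_candidates(index_url, index_html):
    """URLs exactly one link away from the A-Z index page."""
    collector = LinkCollector(index_url)
    collector.feed(index_html)
    return collector.links

html = '<ul><li><a href="/hr/benefits">Benefits</a></li><li><a href="https://travel.example.com/">Travel</a></li></ul>'
print(best_bet_candidates("https://intranet.example.com/a-z", html))
```

Because nothing beyond these URLs is followed, the best bets index stays limited to the items that were deliberately added to the A-Z index.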
In this process, there is no way to directly map specific searches (keywords) to specific results showing up in best bets. The best bets will show up in the results for a given search based on normally calculated relevance for the search. However, the best bet population numbers only about 800 items instead of the roughly half million items that might show up in the regular results – as long as the targets in the A-Z index have good titles and are tagged with the proper keywords and description, they will normally show up in best bets results for those words.
Some advantages of this approach:
Some disadvantages of this approach:
Having written about what I consider to be the principles of enterprise search, about people search in the enterprise, about search analytics and several other topics related to search in some detail, I thought I would share some insights on a role I have called search analyst – the person(s) who are responsible for the care and feeding of an enterprise search solution. The purpose of this post is to share some thoughts and experiences and help others who might be facing a problem similar to what my team faced several years back – we had a search solution in place that no one was maintaining and we needed to figure out what to do to improve it.
Regarding the name of the role – when this role first came into being in my company, I did not know what to call the role, exactly, but we started using the term search analyst because it related to the domain (search) and reflected the fact that the role was detailed (analytical) but was not a technical job like a developer. Subsequently, I’ve heard the term used by others so it seems to be fairly common terminology now – it’s possible that by now I’ve muddled the timeline enough in my head that I had heard the term prior to using it but just don’t recall that!
What does a search analyst do for you? The short answer is that a search analyst is the point person for improving the quality of results in your search solution. The longer answer is that a search analyst needs to:
In order to define success for a search analyst, you need to set some specific objectives for the search analyst(s). Ultimately, given the job description, they translate to measuring how the search analyst has been successful in improving search, but here are some specific suggestions about how you might measure that:
Another common question I’ve received is what percentage of time should a search analyst expect to spend on this type of work? Some organizations may have large enough search needs to warrant multiple full-time people on this task but we are not such an organization and I suspect many other organizations will be in the same situation. So you might have someone who splits their time among several roles and this is just one of them.
I don’t have a full answer to the question because, ultimately, it will depend on the value your organization does place on search. My experience has been that in an organization of approximately 5-6,000 users (employees) covering a total corpus of about a million items spread across several dozen sites / applications / repositories, spending about .25 FTE on search analyst tasks seems to provide for steady improvements and progress.
Spending less than that (down to about .1 FTE), I’ve found, results in a “steady state” – no real improvements but at least the solution does not seem to degrade. Obviously, spending more than that could result in better improvements but I find that dependence on others (content owners, application owners, etc.) can be a limiting factor in effectiveness – full organizational support for the efforts of the search analyst (giving the search analyst a voice in prioritization of work) can help alleviate that. (A search analyst with a software development background may find this less of an issue as, depending on your organization, you may find yourself less tied to development resources than you would otherwise be, though this also likely raises your own FTE commitment.)
The above description is worded as if your organization has a single person focused on search analyst responsibilities. It might also be useful to spread the responsibility among multiple people. One reason would be if your enterprise’s search solution is large enough to warrant a team of people instead of a single person. A second would be that it can be useful to have different search analysts focused (perhaps part time still for each of them) on different content areas. In this second situation, you will want to be careful about how “territorial” search analysts are, especially in the face of significant new content sources (you want to ensure that someone takes on whatever responsibility there might be for that content in regards to ensuring good findability).
So far I’ve provided a description of the role of a search analyst, suggestions for objectives you can assign to a search analyst and at least an idea of the time commitment you might expect to have an effective search analyst. But, if you were looking to staff such a position, what kinds of skills should you look for? Here are my thoughts:
If your search needs warrant more than one person focused on improving your enterprise search solution, as much overlap in the above as feasible is good, though you may have team members specializing in some skills while others focus on other areas.
Another important issue to address is where in your overall organization should the search analyst responsibility rest? I don’t have a good answer for this question and am interested in others’ opinions. My own experiences:
Enough about my own insights – What does anyone else have to share about how you perceive this role? Where does it fit in your organization? What are your objectives for this role?
In my previous two posts, I’ve written about some basic search analytics and then some more advanced analysis you can also apply. In this post, I’ll write about the types of analysis you can and should be doing on data captured about the usage of search results from your search solution. This is largely a topic that could be in the “advanced” analytics topic but for our search solution, it is not built into the search solution and has been implemented only in the last year through some custom work, so it feels different enough (to me) and also has enough details within it that I decided to break it out.
When I first started working on our search solution and dug into the reports and data we had available about search behavior, I found we had things like:
and much more. However, I was frustrated by this because it did not give me a very complete picture. We could see the searches people were using – at least the top searches – but we could not get any indication of “success” or what people found useful in search, even. The closest we got from the reports was the last item listed above, which in a typical report might look something like:
Search Results Pages
However, all this really reflects is the percentage of each page number visited by a searcher – so 95% of users never go beyond page 1 and the engine assumes that means they found what they wanted there. That’s a very bad assumption, obviously.
I wanted to be able to understand what people were actually clicking on (if anything) when they performed a search! I ended up solving this with a very simple solution (simple once I thought of it). I believe this emulates what Google (and probably many other search engines) do. I built a simple servlet that takes a number of parameters, including a URL (encoded) and the various pieces of data about a search result target and stores an event in a database from those parameters and then forwards the user to the desired URL. Then the search results page was updated to provide the URL for that servlet in the search results instead of the direct URL to the target. That’s been in place for a while now and the data is extremely useful!
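For readers who want to try the same approach, here is a minimal Python sketch of the two halves: building the tracking link that goes into the results page, and recording the click before forwarding the user. My implementation was a Java servlet. The endpoint path, parameter names (`url`, `q`, `pos`) and table layout below are all hypothetical, and SQLite stands in for whatever database you use.

```python
import sqlite3
from urllib.parse import urlencode, urlparse, parse_qs

TRACKER_PATH = "/search/click"  # hypothetical path of the tracking endpoint

def open_click_store(path=":memory:"):
    """Open the click database and make sure the events table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks ("
        "query TEXT, result_position INTEGER, target_url TEXT, "
        "clicked_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def tracking_url(target_url, query, position):
    """Link placed in the results page instead of the direct target URL."""
    params = {"url": target_url, "q": query, "pos": position}
    return TRACKER_PATH + "?" + urlencode(params)

def record_click(conn, requested_url):
    """Store one click event and return the URL to forward the user to."""
    params = parse_qs(urlparse(requested_url).query)
    target = params["url"][0]
    query = params.get("q", [""])[0]
    position = int(params.get("pos", ["0"])[0])
    conn.execute(
        "INSERT INTO clicks (query, result_position, target_url) VALUES (?, ?, ?)",
        (query, position, target),
    )
    conn.commit()
    return target

conn = open_click_store()
link = tracking_url("https://intranet.example.com/hr/benefits?year=2009", "benefits", 1)
print(link)
print(record_click(conn, link))
```

In a real deployment the handler would respond with an HTTP redirect to the returned URL, and it should only forward to targets that actually came from your own search results so the endpoint cannot be abused as an open redirect.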
By way of explanation, the following are the data elements being captured for each “click” on a search result:
This data provides a lot of insight on behavior. You can guess what someone might be looking for based on understanding the searches they are performing but you can come a lot closer to understanding what they’re really looking for by understanding what they actually accessed. Of course, it’s important to remember that this does not necessarily equate to the user finding what they are looking for, but may only indicate which result looks most attractive to them, so there is still some uncertainty in understanding this.
While I ended up having to do some custom development to achieve this, some search engines will capture this type of data, so you might have access to all of this without any special effort on your part!
Given the type of data described above, here are some of the questions and actions you can take as a search analyst:
You can also combine data from this source with data from your web analytics solution to do some additional analysis. If you capture the search usage data in your web analytics tool (as I mention above should be possible), doing this type of analysis should be much easier, too!
Here’s a wrap (for now) on the types of actionable metrics you might consider for your search program. I’ve covered some basic metrics that just about any search engine should be able to support; then some more complex metrics (requiring combining data from other sources or some kind of processing on the data used for the basic metrics) and in this post, I’ve covered some data and analysis that provides a more comprehensive picture of the overall flow of a user through your search solution.
There are a lot more interesting questions I’ve come up with in the time I’ve had access to the data described above and also with the data that I discussed in my previous two posts, but many of them seem a bit academic and I have not been able to identify possible actions to take based on the insights from them.
Please share your thoughts or, if you would, point me to any other resources you might know of in this area!
In my last post, I provided a description of some basic metrics you might want to look into using for your search solution (assuming you’re not already). In this post, I’ll describe a few more metrics that may take a bit more effort to pull together (depending on your search engine).
First up – there is quite a lot of insight to be gained from combining your search analytics data with your web analytics data. It is even possible to capture almost all of your search analytics in your web analytics solution which makes this combination easier, though that can take work. For your external site, it’s also very likely that your web analytics solution will provide insight on the searches that lead people to your site.
A first useful piece of analysis you can perform is to review your top N searches, perform the same searches yourself and review the resulting top target’s usage as reported in your web analytics tool.
A second step would be to review your web analytics report for the most highly used content on your site. For the most highly utilized targets, determine what are the obvious searches that should expose those targets and then try those searches out and see where the highly used targets fall in the results.
Another fruitful area to explore is to consider what people actually use from search results after they’ve done a search (do they click on the first item, second? what is the most common target for a given keyword? Etc.). I’ll post about this separately.
I’m sure there are other areas that could be explored here – please share if you have some ideas.
When I first got involved in supporting a search solution, I spent some time understanding the reports I got from my search engine. We had our engine configured to provide reports on a weekly basis and the reports provided the top 100 searches for the week. All very interesting and as we started out, we tried to understand (given limited time to invest) how best to use the insight from just these 100 searches each week.
We quickly realized that there was no really good, sustainable answer, and this was compounded by the fact that the engine reported two searches as different if there was *any* difference between them (even something as simple as a difference in case, even though the engine itself does not consider case when doing a search – go figure).
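As a rough illustration of working around the case problem, here is a minimal Python sketch that folds case and extra whitespace before counting searches. The function names and the sample log are made up for this example; this is not what my engine does, just one way the raw searches could be cleaned up before producing a “top N” list.

```python
from collections import Counter

def normalize(query: str) -> str:
    """Fold case and collapse whitespace so trivially different searches match."""
    return " ".join(query.casefold().split())

def top_searches(queries, n=100):
    """Return the n most common searches after normalization."""
    return Counter(normalize(q) for q in queries).most_common(n)

# Hypothetical log entries, for illustration only.
log = [
    "Benefits", "benefits", "  BENEFITS ",
    "expense report", "Expense  Report",
    "holiday schedule",
]

print(top_searches(log, n=2))
# → [('benefits', 3), ('expense report', 2)]
```

Without the normalization step, the three variations of “benefits” would be reported as three separate searches with one use each.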
In order to see the forest for the trees, we decided what would be desirable is to categorize the searches – associate individual searches with a larger grouping that allows us to focus at a higher level. The question was how best to do this?
Soon after trying to work out how to do this, I went to Enterprise Search Summit West 2007 and attended a session titled “Taxonomize Your Search Logs” by Marilyn Chartrand from Kaiser Permanente. She spoke about exactly this topic and, more specifically, about the value of doing this as a way to understand search behavior better, to be able to talk to stakeholders in ways that make more sense to them, and more.
Marilyn’s approach was to have a database (she showed it to me and I think it was actually in a taxonomy tool but I don’t recall the details – sorry!) where she maintained a mapping from individual search terms to the taxonomy values.
Since then, I’ve started working on the same type of structure and have made good headway. Further, I’ve also found a way to capture every single search (not just the top N) into a SQL database so that it’s possible to view the “long tail” and categorize that as well. I still don’t have a good automated solution for auto-categorizing the terms, but the level of re-use from one reporting period to the next is high enough that loading a new period’s data requires categorization of only part of the new data. [Updated 26 Jan 2009 to add the following] Part of the challenge is that you will likely want to apply many of the same textual conversions to your database of captured searches that are applied by your search engine – synonyms, stemming, lemmatization, etc. These conversions can help simplify the categorization of the captured searches.
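To make the mapping idea a bit more concrete, here is a minimal Python sketch under some assumptions of my own: the mapping is a simple in-memory dictionary (in practice it would live in a database or taxonomy tool), the category names and counts are invented, and the terms are assumed to be already normalized. It rolls a period’s searches up by category and sets aside the terms that still need to be categorized by hand.

```python
from collections import Counter

# Hypothetical mapping from (already normalized) search terms to categories.
TERM_TO_CATEGORY = {
    "benefits": "HR",
    "401k": "HR",
    "expense report": "Finance",
}

def categorize(search_counts, mapping):
    """Split a period's searches into per-category totals and uncategorized terms."""
    by_category = Counter()
    uncategorized = {}
    for term, count in search_counts.items():
        category = mapping.get(term)
        if category is None:
            uncategorized[term] = count  # still needs a manual decision
        else:
            by_category[category] += count
    return by_category, uncategorized

# Hypothetical search counts for a new reporting period.
new_period = {"benefits": 40, "expense report": 25, "401k": 10, "parking map": 5}

by_category, uncategorized = categorize(new_period, TERM_TO_CATEGORY)
print(dict(by_category))  # → {'HR': 50, 'Finance': 25}
print(uncategorized)      # → {'parking map': 5}
```

Only the terms that come back as uncategorized need attention for the new period; everything else is picked up by the mapping built in earlier periods.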
Anyway – the types of questions this enables you to answer and why it can be useful include:
Another useful type of analysis you can perform on search data is to look at simple metrics of the searches. Louis Rosenfeld identified several of these – I’m including those here and a few additional thoughts.
Another interesting view of your search data is hinted at by the discussion above of “secondary” search words – words that are used in conjunction with other words. I have not yet managed to complete this view (lack of time and, frankly, the volume of data is a bit daunting with the tools I’ve tried).
The idea is to parse your searches into their constituent words and then build a network between the words, where each word is a node and the links between the words represent the strength of the connection between them – where “strength” is the number of times those two words appear in the same searches.
Having this available as a visual tool to explore words in search seems like it would be valuable as a way to understand their relationships and could give good insight on the overall information needs of your searchers.
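For what it’s worth, the counting part of the network described above can be sketched in a few lines of Python. This is only an illustration with made-up searches and a simple split on whitespace (an assumption of mine; a real log would need the same normalization your engine applies), and it stops short of the visualization, which is where most of the effort would go.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(searches):
    """Count how many searches each pair of words appears in together."""
    edges = Counter()
    for search in searches:
        # Unique words per search, sorted so each pair has one canonical order.
        words = sorted(set(search.lower().split()))
        for pair in combinations(words, 2):
            edges[pair] += 1
    return edges

# Hypothetical searches, for illustration only.
searches = ["expense report", "expense report form", "travel expense form"]

edges = cooccurrence(searches)
for (word_a, word_b), strength in edges.most_common(2):
    print(word_a, word_b, strength)
```

Each key in the result is a pair of words (the link) and each value is the “strength” of that link; here, for example, “expense” and “report” appear together in two searches.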
The cost (in my own time if nothing else) of taking the data and manipulating it into a format that could then be explored in this way, however, has been high enough to keep me from doing it without some more concrete ideas for what actionable steps I could take from the insight gained. I’m just not confident that this would expose anything much more than “the most common words used tend to be used together most commonly”.
I’m missing a lot of interesting additional types of analyses above – feel free to share your thoughts and ideas.
In my next post, I’ll explore in some more detail the insights to be gained from analyzing what people are using in search results (not just what people are searching for).
In my first few posts (about a year ago now), I covered what I call the three principles of enterprise search – coverage, identity, and relevance. I have posted on enterprise search topics a few times in the meantime and wanted to return to the topic with some thoughts to share on search analytics and provide some ideas for actionable metrics related to search.
I’m planning 3 posts in this series – this first one will cover some of what I think of as the “basic” metrics, a second post on some more advanced ideas and a third post focusing more on metrics related to the usage of search results (instead of just the searching behavior itself).
Before getting into the details, I also wanted to say that I’ve found a lot of inspiration in the writing and talks of Louis Rosenfeld and Avi Rappoport, and I strongly recommend you look into their work. A specific webinar to share with you is “Site Search Analytics for a Better User Experience”, which Louis presented in a Search CoP webcast last spring. Good stuff!
Now onto some basic metrics I’ve found useful. Most of these are pretty obvious, but I guess it’s good to start at the start.
That’s all of the topics I have for “basic metrics”. Next up, some ideas (along with actions to take from them) on more complex search metrics. Hopefully, you find my recommendations for specific actions you can take on each metric useful (as they do tend to make the posts longer, I realize!).