Lee Romero

On Content, Collaboration and Findability

What is a Search Analyst?

Tuesday, January 27th, 2009

Having written about what I consider to be the principles of enterprise search, about people search in the enterprise, about search analytics and several other topics related to search in some detail, I thought I would share some insights on a role I have called search analyst – the person(s) who are responsible for the care and feeding of an enterprise search solution. The purpose of this post is to share some thoughts and experiences and help others who might be facing a problem similar to what my team faced several years back – we had a search solution in place that no one was maintaining and we needed to figure out what to do to improve it.

Regarding the name of the role – when this role first came into being in my company, I did not know what to call the role, exactly, but we started using the term search analyst because it related to the domain (search) and reflected the fact that the role was detailed (analytical) but was not a technical job like a developer. Subsequently, I’ve heard the term used by others so it seems to be fairly common terminology now – it’s possible that by now I’ve muddled the timeline enough in my head that I had heard the term prior to using it but just don’t recall that!

What does a Search Analyst do?

What does a search analyst do for you? The short answer is that a search analyst is the point person for improving the quality of results in your search solution. The longer answer is that a search analyst needs to:

  • Review data related to your search solution and understand its implications
  • Formulate new questions to continually improve upon the insights gained from the data
  • Formulate action plans from insights gained from monitoring that data in order to continually improve your search solution – this requires that the search analyst understand your search solution deeply enough to translate analytic insights into specific changes or actions
  • Follow through on those action plans and work with others as necessary to effect the necessary changes

Measuring Success as a Search Analyst

In order to define success for a search analyst, you need to set some specific objectives for the search analyst(s). Ultimately, given the job description, these translate to measuring how successful the search analyst has been in improving search, but here are some specific suggestions about how you might measure that:

  • Execute a regular survey of users of your search (perhaps annually?) – this can be a very direct way of measuring increased quality, though ensuring you get good coverage of the target audience (and reflect appropriate demographics) may be a challenge. We have used this and results do reflect increases in satisfaction.
  • Provide the ability to rate search results – a more direct way than a survey to measure satisfaction with search, though implementing it and integrating it with the search experience in a way that invites users to provide feedback can be a challenge.
  • Measure overall increase in search usage – this does not require working directly with users of your search, but it also begs the question of whether increasing search usage is really a measure of quality.
  • Measure increase in search usage relative to visits to your site (assuming your search solution is integrated with your intranet, for example) – I mentioned this in a post on advanced metrics as a metric to monitor. I think this can be more useful than just measuring increases in usage; however, it might also reflect changes (good or bad) in navigation as much as changes in search. (A minimal sketch of this ratio appears after this list.)
  • Measure overall coverage of search (total potential targets) – How much content does your search solution make available as potential search results? By itself, increases in this do not equate to an improvement in search but if combined with other metrics that more directly measure quality of results, increases in coverage do translate to a user being more likely to get what they need from search. In other words, if you can assure users that they can gain direct access to more potential results in search while also ensuring that the quality of results returned is at least as good as before, that’s a good thing. On the other hand, if adding in new content pollutes the experience with many less-relevant search results, you are not doing anyone any favors by including them.
  • Measure number of specific enhancements / changes made to improve the quality of results – especially for highly sought content. Assuming you track the specific changes made, a measure of effectiveness could be to track how many changes a search analyst has made over a given time period. Did the search analyst effect 5 changes in a month? 50? Again, the number itself doesn’t directly reflect improvements (some of those changes could have been deleterious to search quality) but it can be an indicator of value.
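
To make the usage-relative-to-visits idea a little more concrete, here is a minimal Python sketch of the calculation. The weekly counts, field names and numbers are all invented for illustration – in practice they would come from your web analytics and search logs.

```python
# Hypothetical weekly counts pulled from web analytics and search logs;
# the field names and numbers are invented for illustration only.
weekly_stats = [
    {"week": "2009-01-05", "site_visits": 42000, "searches": 5100},
    {"week": "2009-01-12", "site_visits": 39500, "searches": 5300},
    {"week": "2009-01-19", "site_visits": 41200, "searches": 5900},
]

for row in weekly_stats:
    # Searches per visit: a rising ratio may mean users rely on search more,
    # but it can also reflect navigation getting worse, as noted above.
    ratio = row["searches"] / row["site_visits"]
    print(f"{row['week']}: {ratio:.3f} searches per visit")
```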

Time Commitment for a Search Analyst

Another common question I’ve received is what percentage of their time a search analyst should expect to spend on this type of work. Some organizations may have search needs large enough to warrant multiple full-time people on this task, but we are not such an organization and I suspect many other organizations are in the same situation. So you might have someone who splits their time among several roles, of which this is just one.

I don’t have a full answer to the question because, ultimately, it will depend on the value your organization places on search. My experience has been that in an organization of approximately 5,000-6,000 users (employees), covering a total corpus of about a million items spread across several dozen sites / applications / repositories, spending about .25 FTE on search analyst tasks seems to provide for steady improvements and progress.

Spending less than that (down to about .1 FTE), I’ve found, results in a “steady state” – no real improvements, but at least the solution does not seem to degrade. Obviously, spending more than that could produce better improvements, but I find that dependence on others (content owners, application owners, etc.) can be a limiting factor in effectiveness – full organizational support for the efforts of the search analyst (giving the search analyst a voice in prioritization of work) can help alleviate that. (A search analyst with a software development background may find this less of an issue because, depending on your organization, they may be less tied to development resources than they would otherwise be, though this also likely raises the FTE commitment.)

The above description is worded as if your organization has a single person focused on search analyst responsibilities. It might also be useful to spread the responsibility among multiple people. One reason would be that your enterprise’s search solution is large enough to warrant a team of people instead of a single person. A second is that it can be useful to have different search analysts focused (perhaps still part-time for each of them) on different content areas. In this second situation, you will want to be careful about how “territorial” search analysts are, especially in the face of significant new content sources (you want to ensure that someone takes on whatever responsibility there might be for that content with regard to ensuring good findability).

What Skills does a Search Analyst Need?

So far I’ve provided a description of the role of a search analyst, suggestions for objectives you can assign to a search analyst and at least an idea of the time commitment you might expect in order to have an effective search analyst. But if you were looking to staff such a position, what kinds of skills should you look for? Here are my thoughts:

  • First, I would expect a search analyst to be a capable business analyst: able to work with business users to elicit, structure and document requirements, and able to understand and document business processes in general.
  • I would also expect a search analyst to be naturally curious and to know how to ask the right questions, especially given the exploratory nature of dealing with a lot of analytical data (as seen in my recent posts about search analytics).
  • A search analyst must be very capable of analyzing data sets. Specifically, I would expect a search analyst to be very proficient in using spreadsheets to view large data collections – filtering, sorting, formulae, pivot tables, etc. – in order to understand the data they’re looking at. Depending on your search solution, I would also expect a search analyst to be proficient with building SQL queries; ideally they would use reports built in a reporting system (and so not have to directly manipulate data using SQL), but I find that the ad hoc / exploratory nature of looking at data makes that hard. (A small sketch of this kind of ad hoc query appears after this list.)
  • I would expect a search analyst to have an understanding of taxonomy in general and, specifically, of your organization’s taxonomy and its management processes. This is important because the taxonomy needs to be an input into their analysis of search data and also because (as highlighted in the potential actions taken from insights from search analytics) a search analyst can uncover many insights that influence your taxonomy.
  • I would also look for a search analyst to understand information architecture and how it influences navigation on your organization’s web sites. As with the taxonomy, I find that the search analyst will often discover insights that can influence your navigation.
  • I would expect a search analyst to have some understanding of basic web technologies – most especially HTML and the use of meta tags within it. XML is also important (perhaps more so, depending on your search engine). Some understanding of JavaScript (at least insofar as how / whether your engine deals with it) can be useful.
  • I would expect that a search analyst should be able to quickly learn details of computer systems – specifically, how to manage and administer your search solution. I would not be hung up on whether your search analyst already knows the specific engine you might be using but that can obviously be useful.
  • This is not a skill, but another important piece of knowledge your search analyst should have is a good understanding of your major content sources and content types. In general, what kinds of things should be expected to be found in what places? What formats? What kinds of processes are behind their maintenance?
  • This is also not a skill per se, but it is important for your search analyst to be connected to content managers and application teams. The connection might be relatively tight (working in a group with them) or loose (association via a community of practitioners in your organization). The reasons for this suggestion include:
    • The ability to easily have two-way communication with content managers enables your search analyst to provide continuous education to content managers about the importance of their impact on findability (education about good content tagging, how content will show in search, etc.) and also enables content managers to reach out to a search analyst when they are trying to proactively improve search of their content (something which does not seem to be as common as I’d like to see within an enterprise setting!).
    • The ability to communicate with development teams can help in similar ways: The search analyst can use that as a way to continually reinforce the need for developers to consider findability when applications are deployed. Also, connectivity with development teams can provide insights to the search analyst so that they can proactively inject themselves in the testing of the applications (or hopefully even in the requirements definition process!) to ensure findability is actually considered.
  • Given that last recommendation, it is also important that a search analyst be able to communicate effectively and also be comfortable in teaching others (formally or informally). I find that education of others about findability is a constant need for a search analyst.
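
To give a flavor of the ad hoc, SQL-style exploration mentioned in the data-analysis bullet above, here is a small Python sketch using an in-memory SQLite database. The search_log table, its columns and the sample rows are hypothetical stand-ins for whatever your search engine actually logs.

```python
import sqlite3

# Hypothetical stand-in for a search engine's query log; a real log would
# carry far more detail (user, session, click-throughs, etc.).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_log (term TEXT, result_count INTEGER, searched_on TEXT)"
)
conn.executemany(
    "INSERT INTO search_log VALUES (?, ?, ?)",
    [
        ("expense report", 35, "2009-01-20"),
        ("expence report", 0, "2009-01-20"),
        ("vpn setup", 0, "2009-01-21"),
        ("vpn setup", 0, "2009-01-22"),
    ],
)

# The kind of ad hoc question a search analyst asks constantly:
# which searches most often return nothing at all?
rows = conn.execute(
    """
    SELECT term, COUNT(*) AS searches
    FROM search_log
    WHERE result_count = 0
    GROUP BY term
    ORDER BY searches DESC
    """
).fetchall()

for term, searches in rows:
    print(f"{term}: {searches} zero-result searches")
```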

If your search needs warrant more than one person focused on improving your enterprise search solution, as much overlap in the above as feasible is good, though you may have team members specializing in some skills while others focus on other areas.

Organizational Location of a Search Analyst

Another important issue to address is where in your overall organization should the search analyst responsibility rest? I don’t have a good answer for this question and am interested in others’ opinions. My own experiences:

  • Originally, this responsibility fell to our search engine engineers. Despite their best efforts, this was destined not to be effective because their focus was primarily on the engine and they didn’t have enough background in things like the content sources, applications or repositories to include, nor connections to content managers or application developers. They primarily just ensured that the engine was running and would make changes reactively when someone contacted them about an issue.
  • We moved this responsibility into our knowledge management group – I was a trigger for this move as I could see that no one else in the organization was going to “step up”.
  • Due to subsequent organizational changes, this responsibility then fell into the IT group.
  • At this point, I would suggest that the best fit in our organization was within the KM group.
    • A search analyst is not a technical resource (a developer or system admin, for example), though the job is very similar to that of business analysts your IT group might have on staff.
    • The real issue I have found with having this responsibility fall into the IT organization is that, within many organizations, IT is responsive to the business rather than an organization that drives business processes or decisions. Much of what the search analyst needs to accomplish would require IT to drive its own priorities, which can present challenges – the search analyst’s voice is not listened to within IT because it’s not coming “from the business”.
    • Also, it can be a challenge for an IT group to position a search analyst in a way that supports success. The internal organization of IT groups varies so widely that I can’t make any specific suggestions here, but I do believe that a search analyst located within your IT group could be successful if closely aligned with a group focused on either architecture or business intelligence.
  • If your organization is structured to have a specific group with primary responsibility for your web properties (internal or external), that group would also be a potential candidate for positioning this responsibility. If that group primarily focuses externally, you would likely find that a search analyst really plays more of an SEO role than being able to focus on your enterprise search solution.

Enough about my own insights – what does anyone else have to share about how you perceive this role? Where does it fit in your organization? What are your objectives for this role?

Categories of Search Requirements

Wednesday, October 1st, 2008

I was recently asked by a former co-worker (Ray Sims) for some suggestions around requirements that he might use as the basis for an evaluation of search engines. Having just gone through such an evaluation myself, and also having posted here about the general methodology I used for the evaluation, I thought I’d follow that up with some thoughts on requirements.

If you find yourself needing to evaluate a search engine, these might be of value – at least in giving you some areas to further detail.

I normally think of requirements for search in two very broad categories – those that are more basically about helping the user doing the search (End User Search Requirements) and those that are more directed at the person (or people) responsible for administering / maintaining the search experience (Administrator Requirements).

End User Search Requirements

  • Search result page customization – Is it straightforward to provide your own UI on top of the search results (for integration into your web site)?
  • Search result integration with other UIs (outside of a web experience) – Specifically, it’s possible you might want to use search results in a non-web-based application – can the engine do that? (If you can provide result pages in different formats, a simple way to do this is to provide an XML result format that an application can pull in via a URL; a minimal sketch of consuming such a feed appears after this list.)
  • Search result summaries for items – Specifically, these should be dynamic. The snippet shown in the results should show something relevant to what the searcher searched on – not just a static piece of text (like a metadata description field). This, by itself, can greatly enhance the perceived quality of results because it makes it easier for a user to make a determination on the quality of an item right from the search results – no need to look at the item (even a highlighted version of it).
  • Highlighting – it should be possible to see a highlighted version of a result (i.e., search terms are highlighted in the display of the document)
  • “Best Bets” (or key match or whatever) – Some don’t like these, but I think it’s important to have some ability to “hand pick” (or nearly hand pick) some results for some terms – also, I think it’s very desirable to be able to say “If a user searches on X, show this item as the top result” regardless of where that item would organically show in the result (or it might not even be really indexable)
  • Relevancy calculation “soundness” – This basically means that the engine generates a good measure of relevancy for searches, and it encompasses most of what differentiates engines. You should understand at a general level what affects the relevancy as computed by the engine. (For many search engines, this is the “magic dust” they bring to the table – so they may not be willing to expose too much about how they do this, but you should ask.)
  • Stemming – The engine should support stemming – if a user searches on “run”, it should automatically match the use of words that share the same stem – “runs”, “running”, “ran”, etc.
  • Similar to stemming, the engine should support synonyms – if I search on “shoe”, it might be useful to include content that matches “boot” or “slipper”, etc.
  • Concept understanding (entity extraction) – Can the engine determine the entities in a piece of content even when they are not explicitly mentioned? A piece of content might be about “Product X”, say, but it may never explicitly mention “Product X”. Some search engines will claim to do this type of analysis.
  • Performance – Obviously, good performance is important and you should understand how it scales. Do you expect a few thousand searches a week? Tens of thousands? Hundreds of thousands? You need to understand your needs and ensure that the engine will meet them.
  • Customization of error / not found presentation – Can you define what happens when no results are found or some type of system error occurs? It can be useful to be able to define a specific behavior when an engine would otherwise return no results (a behavior that might be implemented outside of the engine itself).
  • Related queries – It might be desirable to have something like, “Users who searched on X also commonly searched on Y”
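
To make the XML-over-a-URL idea above a bit more concrete, here is a minimal Python sketch of an application pulling results from such a feed. The endpoint, query parameters and element names are all invented – every engine exposes its own URL format and result schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical XML results endpoint; the URL, parameters and element names
# below are placeholders for whatever your engine actually exposes.
url = "http://search.example.com/results?q=expense+report&format=xml"

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Assume each hit looks like <result><title>...</title><url>...</url></result>
for result in tree.findall(".//result"):
    title = result.findtext("title", default="(no title)")
    link = result.findtext("url", default="")
    print(f"{title} -> {link}")
```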

Administrator Requirements

  • Indexing of web content – Most times, it’s important to be able to index web content – commonly through a crawler, especially if it’s dynamic content.
  • Indexing of repositories – You should understand your repository architecture, which repositories will need to be indexed and how the engine will do so. Some engines provide special hooks to index repositories from major vendors (Open Text, SharePoint, Documentum, etc.). These repositories are often not crawlable using a general web spider / crawling approach.
  • File system indexes – Many companies still have a significant amount of content accessible on good old file servers – understand what types of file systems can be indexed and the protocols the search engine supports (Samba, NFS, etc.).
  • Security of search results – Often, you might want to provide a single search experience that users can use to search any content to which they can navigate, even if that content is in its own repository which follows its own (proprietary) mechanism to secure documents.
    • This is something we have not tackled, but some engines do so. You typically have two approaches – “early binding”, when the security is basically rolled into the index and “late binding” which does security checking as users do searching.
    • Most vendors do the former because it can be very expensive to do a security check on every document that might show up in search results.
    • The primary advantage of late binding is that access changes are reflected immediately. With early binding, if you refresh your index weekly on, say, Saturday and someone grants me access to a document on Monday, I still won’t see it in search until after the next refresh; conversely, people can continue to see items in search results that they no longer have access to.
  • Index scheduling / control – Regardless of the type of index, you should be able to control the schedule of indexing or how fast the indexer might hit your web sites / repositories / file systems. Also, it can be very useful to have different parts of the site refreshed at different rates. You might want press releases refreshed (or at least checked) hourly, while product documentation might only need to be refreshed weekly or monthly.
  • Relevancy control – It should be possible to administratively modify the relevancy for items – up or down. Ideally, this should be based on attributes of the content such as: the server it’s on, the path on the server, the date range of the content, presence of particular meta data, etc.
  • Synonyms – It should be possible to define business-specific synonyms. Some insight from Avi Rappoport (via the SearchCoP) is that you should be careful in the use of generic synonyms – they may cause more problems than they fix (so if an engine provides synonym support, you might want to know whether you get some default synonyms and how you might disable them).
  • Automation / integration – It is nice if the search engine can integrate with, or somehow provide support for, automatic maintenance of some aspects of its configuration. For example, synonyms – you might already have a means to manage those (say, in your taxonomy tool!) and having to manually administer them as a separate work process would probably lead to long-term maintainability issues; in that case, some type of import mechanism would be valuable. Or, as another example, have your relevancy adjustments integrated with your web analytics (so that content that is more popular based on usage goes up in relevancy).
  • Performance (again) – How much content do you expect to index? How fast can that content be indexed by the engine? Does the engine do full re-indexing? Incremental? Real-time?
  • Reporting – You need to have good reporting.
    • Obvious stuff like most common searches (grouped by different spans like day, hour, week, month, etc., and also for time periods you can define – meaning, “Show me most common searches for the last six months grouped by week”), most common “no result” searches, common “error” searches, etc.
    • It would be especially useful to be able to do time analysis across these types of dimensions – most engines don’t provide that, in my experience; you can get a dump for one time period and a separate one for another period and then have to manually compare them. Being able to ask, “How common has this search been in each of the last six months?” helps you understand longer-term trends.
    • Also, it can be very useful to see reports where the search terms are somehow grouped, so that a search for “email” and a search for “e-mail” (to use a very simple example) would show up together – basically some kind of categorization / standardization of the searches. Doing grouping based purely on the individual searches can make it very hard to “see the forest for the trees”. (See the sketch after this list.)
    • Lastly – reports on what people do with search results can be very useful. OK – fine, “Product X” is a top ten search consistently, but what are people selecting when they search on that? Do they not click on anything? Do they click on the same item 90% of the time? Etc.
    • I’m also planning to post separately on more details around search metrics and analytics.  Keep watch!
  • Last but certainly not least – Architectural “fit” – Make sure you understand how well the engine will fit in your data center: which OS(es) it runs on, hardware compatibility, etc. For some engines where you purchase a closed appliance, this may not be relevant, but you should involve your data center people in understanding this area.
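
As a small illustration of the grouping and time-analysis ideas in the reporting bullets above, here is a hypothetical Python sketch. The log rows and the normalization rules are invented for the example – real grouping would need a much richer set of rules (or a proper categorization of searches against your taxonomy).

```python
from collections import defaultdict

# Hypothetical search-log export: (term, date) pairs.
log = [
    ("email", "2008-07-03"), ("e-mail", "2008-07-15"),
    ("email", "2008-08-02"), ("Email ", "2008-08-20"),
    ("vpn", "2008-08-21"),
]

def normalize(term: str) -> str:
    """Very crude normalization so near-duplicate searches group together."""
    t = term.strip().lower()
    return {"e-mail": "email"}.get(t, t)  # map known variants to one form

# Count each normalized term per month so longer-term trends become visible.
counts = defaultdict(int)
for term, date in log:
    month = date[:7]  # "YYYY-MM"
    counts[(normalize(term), month)] += 1

for (term, month), n in sorted(counts.items()):
    print(f"{month}  {term}: {n}")
```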

Evaluating and Selecting a Search Engine

Tuesday, September 30th, 2008

A few months back, I was asked to evaluate my company’s current search solution against another search engine to try to determine if it would be worthwhile to implement a new solution. I’ve done package / tool evaluations in the past, but I felt that there was something a bit different about this one in that I needed to somehow integrate a fairly standard requirements-based evaluation with a measure of the quality of the search results themselves, which is not easily expressed as concrete requirements.

So I set about the task and asked the SearchCoP for suggestions about how to do an evaluation of the search results in a meaningful and supportable way. I received several useful responses, including some suggestions from Avi Rappoport, about a methodology for identifying a good representative set of search terms to use in an evaluation.

With my own experiences and those of the SearchCoP in hand, I came up with a process that I thought I would share here.

Two Components to the Evaluation

I split the assessment into two distinct parts. The first was a traditional “requirements-based” assessment, which allowed me to capture support for a number of functional and architectural needs I could identify. Some examples of such requirements were:

  • The ability to support multiple file systems;
  • The ability to control the web crawler (independent of robots.txt or robots tags embedded in pages);
  • The power and flexibility of the administration interface, etc.

The second part of the assessment was to measure the quality of the search results.

I’ll provide more details below for each part of the assessment, but the key thing for this assessment was to have a (somewhat) quantitative way to measure the overall picture of the effectiveness and power of the search engines. It might be possible to even quantitatively combine the measures of these two components, though I did not do so in this case.

Requirements Assessment

For the first part, I used a simplified quality function deployment (QFD) matrix – I identified the various requirements to consider and assigned them a weight (level of importance); based on some previous experiences, I forced the weights to be either a 10 (very important – probably “mandatory” in a semantic sense), a 5 (desirable but not absolutely necessary) or a 1 (nice to have) – this provides a better spread in the final outcome, I believe.

Then I reviewed the search engines against those requirements and assigned each search engine a “score” which, again, was measured as a 10 (met out of the box), a 5 (met with some level of configuration), a 1 (met with some customization – i.e., probably some type of scripting or similar, but not configuration through an admin UI) or a 0 (does not meet and cannot meet).

The overall “score” for an engine was then measured as the sum of the product of the score and weight for each requirement.

This simplistic approach can have the effect of giving too much weight to certain areas of requirements in total. Because each requirement is given a weight, if there are areas of requirements that have a lot of detail in your particular case, you can give that area too much overall weight simply because of the amount of detail. In other words, if you have a total of, say, 50 requirements and 30 of them are in one area (say you have specified 30 different file formats you need to support – each as a different requirement), then a significant percentage of your overall score will be contingent on that area. In some cases, that is OK but in many, it is not.

In order to work around this, I took the following approach:

  • Grouped requirements into a set of categories;
  • The categories should reflect natural cohesiveness of the requirements but should also be defined in a way that each category is roughly equal in importance to other categories;
  • Compute the total possible score for each category (which in my case was 10 × the total weight of the requirements in the category);
  • Compute the relative score of that category for a search engine by summing the product of that engine’s score and the weight of the requirements for that category; the relative score is that engine’s score divided by the total possible score for that category.
  • Now average the relative scores across the categories and multiply by 100 (to get a number between 0 and 100).

This approach gives you a score for each engine between 0 and 100 and also gives each category a roughly equal effect on the total score.
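
Here is a small Python sketch of the arithmetic described above – both the simple sum-of-products score and the category-normalized version. The requirements, weights and scores are invented purely for illustration.

```python
# Each requirement: (category, weight, engine's score).
# Weights are 10 / 5 / 1 and scores are 10 / 5 / 1 / 0 as described above;
# the specific requirements and values are invented for illustration.
requirements = [
    ("End user",      10, 10),  # e.g. dynamic result summaries
    ("End user",       5,  5),  # e.g. best bets
    ("Administrator", 10,  5),  # e.g. repository indexing
    ("Administrator",  1,  0),  # e.g. synonym import
]

# Simple overall score: sum of (weight * score) across all requirements.
simple_score = sum(weight * score for _, weight, score in requirements)

# Category-normalized score: each category contributes equally to the total.
categories = {}
for category, weight, score in requirements:
    possible, actual = categories.get(category, (0, 0))
    categories[category] = (possible + 10 * weight, actual + weight * score)

relative = [actual / possible for possible, actual in categories.values()]
normalized_score = 100 * sum(relative) / len(relative)  # average of relative scores

print(f"Simple score: {simple_score}")
print(f"Category-normalized score (0-100): {normalized_score:.1f}")
```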

If you are looking for some insights on categories of requirements you might want to include in your evaluation, I provide some of my thoughts in a subsequent post.

Search Results Quality

To measure the quality of search results, I took Avi’s insights from the SearchCoP and identified a set of specific searches that I wanted to measure. I identified the candidate searches by looking at the log files for the existing search solution on the site and pulling out a few searches that fell into each category Avi identified. The categories included:

  • Simple queries
  • Complex queries
  • Common queries
  • Spelling, typing and vocabulary errors
  • Queries that force edge-case matching issues, including:
    • Many matches
    • Few matches
    • No matches

Going into this, I assumed I did not necessarily know the “right” targets for these searches, so I enlisted some volunteers from a group of knowledgeable employees (content managers on the web site) to complete a survey I put together. The survey included a section where the participant had to execute each search against each search engine (the survey provided a link that executed the search directly, so participants did not have to go to a search screen somewhere and enter the terms themselves – this kept things somewhat simpler). The participants were then asked to score the quality of the results for each search engine (on a scale of 1-5).

The survey also included some other questions about presentation of results, performance, etc. (even though we did not customize search result templates or tweak anything in the searches, we wanted to get a general sense of usability) and also included a section where users could define and rate their own searches.

The results from the survey were then analyzed to get an overall measure of quality of results across this candidate set of searches for each search engine – basically doing some aggregation of the different searches into average scores or similar.
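
As a hypothetical sketch of that aggregation: the structure and ratings below are invented, but the idea is simply to average each engine’s 1-5 ratings across all searches and participants to get a comparable quality-of-results number.

```python
from statistics import mean

# Hypothetical survey responses: engine -> list of 1-5 ratings,
# one rating per (participant, search) combination.
ratings = {
    "Engine A": [4, 5, 3, 4, 2, 5, 4],
    "Engine B": [3, 3, 4, 2, 3, 4, 3],
}

# The average rating per engine across all searches and participants gives a
# rough overall measure of search result quality for comparison.
for engine, scores in ratings.items():
    print(f"{engine}: average rating {mean(scores):.2f} over {len(scores)} responses")
```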

Outcome of the Assessment

With the engines we were looking at, the results were that one was better on the administration / architectural requirements and the other was better on the search results – which makes for an interesting decision, I think.

The key takeaway for me from this process is that it is at least quantitative – one can argue over the set of requirements to include, or the weight of any particular requirement or the score of an engine on a particular requirement. However, the discussion can be held at that level instead of a more qualitative level (AKA “gut feel”).

Additionally, for search engines, taking a two-part approach ensures that each of these very important factors is included and reflected in the final outcome.

Issues with this Approach

In the case of my own execution of this approach, I know there are some issues (though the general methodology is sound, I believe), including (in no particular order):

  • I defined the set of requirements (ideally, I would have liked input from others, but I’ve basically been a one-man show and I don’t think others would have had much input or the time to provide it).
  • I defined the weights for requirements (see above).
  • I assigned the score for the requirements (again, see above).
  • I did not have hands-on with each engine under consideration and had to lean a lot on documentation, demos and discussions with vendors.
  • All summed up – I think the exact scores could be questioned, but given that I was the only resource, it worked reasonably well.

As for the survey / search results evaluation:

  • I would have liked a larger population of participants, including people who did not know the site
  • I would have liked a larger population of queries to be included, but I felt the number was already pretty large (about 40 pre-defined ones, plus the ability to add 10 more user-defined).
  • I did not mask which engine produced which results. As Walter Underwood mentions (he referenced this post from the SearchCoP thread), that can cause some significant issues with reliability of measures.