Lee Romero

On Content, Collaboration and Findability
October 15th, 2008

People Search and Enterprise Search, Part 2 – A third generation solution

In my last post, I wrote about what I termed the first generation and second generation solutions to people search in the enterprise. This time, I will describe what I call a “third generation” solution to the problem, one that will integrate people search with your enterprise search solution.

This is the stage of people search in use within my current employer’s enterprise.

What is the third generation?

What I refer to as a third generation solution for people search is one where an employee’s profile (their directory entry, i.e., the set of information about a particular employee) becomes a viable and useful target within your enterprise search solution. That is, when a user performs a search using the pervasive “search box” (you do have one, right?), they should be able to expect to find their fellow workers in the results (depending, obviously, on the particular terms used in the search) along with any content that matches those terms.

You remove the need for a searcher to know they need to look in another place (another application, i.e., the company’s yellow pages) and, instead, reinforce the primacy of that single search experience that brings everything together that a worker needs to do their job.

You also offer the full power of your enterprise search engine:

  • Full text search – no need to specifically search within a field, though most engines will offer a way to support that as well if you want to offer that as an option;
  • The power of the search engine to work on multi-word searches to boost relevancy – so a search on just a last name might include a worker’s profile in the search results but one that includes both a first and last name (or user ID or location or other keywords that might appear in the worker’s profile) likely ensures that the person shows in the first page of results amidst other content that matches;
  • The power of synonyms – so you can define synonyms for names in your engine and get matches for “Rob Smith” when a user searches on “Robert Smith” or “Bob Smith”;
  • Spelling corrections – Your engine likely has this functionality, so it can even offer up corrections automatically if someone misspells a name.
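
To make the synonym idea concrete, here is a minimal sketch of how name synonyms can widen a query. It is only an illustration of the concept, not how QuickFinder (or any particular engine) implements synonyms; the function name `expand_name_query` and the synonym group are made up for the example.

```python
from itertools import product


def expand_name_query(query, synonym_groups):
    """Return every variant of `query` obtained by swapping in name synonyms.

    `synonym_groups` is a list of sets; names in the same set are treated as
    interchangeable. This is a hypothetical helper for illustration only.
    """
    options = []
    for token in query.lower().split():
        # Use the synonym group containing this token, or the token itself.
        group = next((g for g in synonym_groups if token in g), {token})
        options.append(sorted(group))
    # Combine one choice per token into complete query variants.
    return [" ".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    groups = [{"robert", "rob", "bob"}]
    for variant in expand_name_query("Robert Smith", groups):
        print(variant)
```

A real engine would normally apply this kind of expansion at query or index time through its own synonym configuration; the point here is just that one search on “Robert Smith” can match profiles stored under “Rob” or “Bob”.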

Below, you will find a discussion of the implementation process we used and the problems we encountered. It might be of use to you if you attempt this type of thing.

Before getting to that, though, I would like to discuss what I believe to be the remaining issue with a third generation solution in order to set up my follow-up post on this topic, which will describe additional ideas for solving the “people finder” problem within an enterprise.

The primary issue with the current solution we have (or any similar solution based strictly on information from a corporate directory) is that the profile of a worker consists only of administrative information. That is, you can find someone based on their name, title, department, address, email, etc., etc., etc., but you can not do anything useful to find someone based on much more useful attributes – what they actually do, what their skills or competencies are or what their interests might be. More on this topic in my next post!

The implementation of our third generation solution (read on for the gory details)

Read on from here for some insights on the challenges we faced in our implementation of this solution. It gets pretty detailed from here on out, so you’ve been warned!

First up – How to ensure that we get correct coverage of the content set? As I’ve written about before, our search solution is based on Novell’s QuickFinder – which is a good, though not particularly sophisticated, search engine. It offers two types of indexes – a crawled index and a file system index. Given that our targets for this were web pages dynamically generated by eGuide, the only feasible option was a crawled index. Simple enough, right?

My first attempt to solve this was to simply point the indexer at the eGuide application and let it go. However, because eGuide has been built assuming your only (or at least primary) experience is going to be using it to do searches, there’s nothing of any real use that a crawler will find to index (it’s all hidden away behind HTML forms). All a crawler will find is the “home page” of the application and possibly a few additional informational pages that are linked to directly from the home page.

Commonly, I have found that it’s possible to give a crawler a URL that uses an HTTP GET to do the same search a form would do – many applications will treat a GET that specifies parameters in a query string the same as a POST that passes form-based input variables in the body of the request. I tried this approach with eGuide but then ran into two additional issues:

  • The results come in limited blocks of results (100 at most) and,
  • Within the results pages and profile pages, there are links to many, many, useless pages (sorting the results on different columns, “printable” versions of an employee’s profile, etc.)

Both of these issues could be resolved with some changes in the eGuide application itself, but that was not feasible, due to resource constraints.
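
For readers who have not tried the GET-for-POST trick described above, a minimal sketch follows. The URL and the field names (`lastname`, `max`) are invented for the example and are not eGuide’s real parameters; whether an application accepts a GET in place of a form POST has to be checked case by case.

```python
from urllib.parse import urlencode


def form_to_get_url(action_url, fields):
    """Build a crawlable GET URL that mimics submitting a form.

    `action_url` is the form's action and `fields` maps form input names to
    values. Both are placeholders here, not taken from any real application.
    """
    separator = "&" if "?" in action_url else "?"
    return action_url + separator + urlencode(fields)


if __name__ == "__main__":
    seed = form_to_get_url(
        "https://intranet.example.com/eGuide/search",  # hypothetical URL
        {"lastname": "romero", "max": 100},            # hypothetical fields
    )
    print(seed)
```

The resulting URL can be handed to a crawler as a starting point, which is exactly where the two issues listed above (paged results and many useless links) show up.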

The solution we came up with was a simple web application which was directly integrated with the directory and which would do one thing: generate a single HTML file that contained a link to each and every employee’s profile – effectively a simple index page with a bunch of links. Easy enough. Then we defined the index for this to start at that page and go one level deep from it and the problem is resolved – we get exactly the set of profiles and nothing more.
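
As a rough illustration of that index page (not the actual application we built), the sketch below turns a list of employee IDs into a single HTML page of links. The function name, the URL template and the IDs are all assumptions; in practice the IDs would come from a directory query, which is left out here.

```python
from html import escape
from urllib.parse import quote


def build_index_page(employee_ids, profile_url_template):
    """Return an HTML page containing one link per employee profile.

    `profile_url_template` must contain an `{id}` placeholder. The template
    and the IDs are hypothetical; a real version would read IDs from the
    corporate directory.
    """
    items = []
    for emp_id in employee_ids:
        # URL-encode the ID, then HTML-escape the URL and the link text.
        url = profile_url_template.format(id=quote(emp_id, safe=""))
        items.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(emp_id)}</a></li>'
        )
    links = "\n".join(items)
    return (
        "<!DOCTYPE html>\n<html>\n"
        "<head><title>Employee profile index</title></head>\n"
        f"<body>\n<ul>\n{links}\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    page = build_index_page(
        ["jdoe", "asmith"],
        "https://intranet.example.com/eGuide/profile?id={id}",
    )
    print(page)
```

Pointing the crawler at a page like this and limiting it to one level of depth means it visits each profile once and never follows the sorting or “printable” links inside the profile pages.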

Now onto the second principle of any search solution – identity. So we managed to get everything indexed that we wanted to and could try searches to find people (and they worked)! However… We then found that eGuide (or, more accurately, the templates we were using for eGuide) suffered from one of the many problems you encounter with web applications: Every single page had the same text in the <title> tag – “Novell eGuide”. So you can perform searches and get the correct items showing in the results page, but, because QuickFinder uses the <title> as a primary identifier in search results, you end up seeing 10 items all titled, “Novell eGuide”! Not so very useful.

Now onto the third principle of enterprise search – relevance (you’ll see why I don’t jump right to the solution for the identity issue in a moment). So we now have all employees as viable targets in the enterprise search and (assuming we fix the issue with identity mentioned above), we then run into the issue that, by itself, the profile of an employee may actually not be that relevant even when someone searches on that person’s name. Why? Because it is very possible (likely) that that person’s name is on a number of other pages or embedded in the metadata of documents that are also part of the enterprise search. So their profile may be among the results but is unlikely to rank high enough to appear on the first page.

The solution to both of these issues turned out to be some very simple changes to the eGuide template. First, stick the employee’s name into the <title> tag – now, when someone’s profile shows up in the enterprise search, it shows as “Novell eGuide: <person’s name>”. Very nice. This has the additional benefit (with our search engine) of also boosting relevancy of profiles for searches on employee names, because words in titles significantly boost relevance of the content for searches on those words.

We also made two further enhancements. First, we included a “keywords” <meta> tag in the template that includes the values of an employee’s name, title, department, etc. as keywords. Again, this boosts relevancy for searches on those keywords. Second, we added a “description” <meta> tag in the template which is something general like “eGuide Profile for <person’s name>” – with our engine, the “description” <meta> tag is used as part of the “snippet” for a result in the results page.
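
The template changes can be pictured with the short sketch below. It is not the eGuide template itself (eGuide has its own template mechanism); it is a stand-alone Python function, with a made-up name and made-up sample values, that produces the kind of <head> section described above.

```python
from html import escape


def render_profile_head(name, title, department):
    """Return a <head> block for one employee profile page.

    Hypothetical helper: it mirrors the three changes described in the text
    (name in <title>, a "keywords" <meta> tag, a "description" <meta> tag).
    """
    keywords = ", ".join(v for v in (name, title, department) if v)
    description = f"eGuide Profile for {name}"
    return "\n".join([
        "<head>",
        f"<title>Novell eGuide: {escape(name)}</title>",
        f'<meta name="keywords" content="{escape(keywords, quote=True)}">',
        f'<meta name="description" content="{escape(description, quote=True)}">',
        "</head>",
    ])


if __name__ == "__main__":
    # Sample values are invented for the example.
    print(render_profile_head("Jane Doe", "Analyst", "Finance"))
```

How much weight a search engine gives the title and keywords, and whether it uses the description as the snippet, depends on the engine; the behavior described in this post is what we saw with ours.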

With these changes, we finally had excellent coverage of employee profiles, excellent identity of the target items and good relevance of the items. A success all around.

The last issue we had to deal with revolves around our own infrastructure and corporate policies and security. Novell has a fairly sophisticated authentication and provisioning infrastructure. So it’s possible that a worker can have general access to our intranet and also to search but not have access to eGuide (a contractor or similar type of worker). Control for this is provided by Access Manager but that works based on ACL rules defined on paths on the intranet. So if a worker does not have access to eGuide, they can not access the specific path on the intranet through which that application is available.

Search, on the other hand, cuts across all paths. Also, our search engine does not integrate (at least we have not integrated it) with Access Manager to provide either early binding or late binding on security of search results. So the question is, how do we make these valuable search results appear in our enterprise search without facing the possibility of allowing access to information that someone shouldn’t have?

The solution we came up with was a compromise: From a policy perspective, the business was OK with people (well, their profiles) showing up as targets in search results. But, we needed to ensure that only their name showed – no other details should be visible. As mentioned above, QuickFinder uses the “description” <meta> tag as part of the snippet shown for results, but it will also pull text from the page to generate a snippet – possibly showing more directly in search results than we should. We achieved the compromise by ensuring that people will only show in our “Best Bets” section of a results page – and items in “Best Bets” only show the title and an icon indicating the item type. No more details are visible.
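
The effect of that compromise can be sketched as a small filtering step. This is purely illustrative and assumes a results list that can be post-processed in code, which is not how our “Best Bets” were actually configured; the `Hit` class, the `"person"` type label and the sample data are all invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    url: str
    item_type: str      # hypothetical labels such as "person", "html", "doc"
    snippet: str = ""


def split_results(hits):
    """Route people to a title-and-icon-only list; leave other hits as-is."""
    best_bets, regular = [], []
    for hit in hits:
        if hit.item_type == "person":
            # Keep only the title, the link and an icon type: no snippet.
            best_bets.append(
                {"title": hit.title, "url": hit.url, "icon": hit.item_type}
            )
        else:
            regular.append(hit)
    return best_bets, regular


if __name__ == "__main__":
    hits = [
        Hit("Novell eGuide: Jane Doe", "https://intranet.example.com/p/1",
            "person", "Phone, address and other details"),
        Hit("Team handbook", "https://intranet.example.com/handbook",
            "html", "How the team works"),
    ]
    best_bets, regular = split_results(hits)
    print(best_bets)
    print([h.title for h in regular])
```

The sketch only covers people; other content can of course also be promoted into the same section.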

Below is a partial screen showing how a search on my own last name displays in the results (I’ve blurred out the names of other people that show up but left in a piece of content that is also a “best bet” on my name – in this case, the home page for a community of practice of which I am a community leader). You’ll note that it shows the name (and only the name) of the person and also displays a specific icon next to the results so that they “stand out” a bit more in the results list as being “people” (as opposed to a Word document or an HTML page, etc.)

Partial Search Results Screen when searching on romero
