Lee Romero

On Content, Collaboration and Findability
October 13th, 2008

People know where to find that, though!

The title of this post – “People know where to find that, though!” is a very common phrase I hear as the search analyst and the primary search advocate at my company. Another version would be, “Why would someone expect to find that in our enterprise search?”

Why do I hear this so often? I assume that many organizations, like my own, have many custom web applications available on their intranet and even their public site. It is because of that prevalence, combined with a lack of communication between the Business and the Application team, that I hear these phrases so often.

I have (unfortunately!) lost count of the number of times a new web-based application goes into production without anyone even considering the findability of the application and its content (data) within the context of our enterprise search.

Typically, the conversation seems to go something like this:

  • Business: “We need an application that does X, Y and Z and is available on our web site.”
  • Application team: “OK – let’s get the requirements laid out and build the site. You need it to do X, Y and Z. So we will build a web application that has page archetypes A, B and C.”
  • Application team then builds the application, probably building in some kind of local search function – so that someone can find data once they are within the application.
  • The Business accepts the usability of the application and it goes into production

What did we completely miss in this discussion? Well, no one in the above process (unfortunately) has explicitly asked the question, “Does the content (data) in this site need to be exposed via our enterprise search?” Nor has anyone even asked the more basic question, “Should someone be able to find this application [the "home page" of the application in the context of a web application] via the enterprise search?”

  • Typically, the Business makes the (reasonable) assumption that goes something like, “Hey – I can find this application and navigate through its content via a web browser, so it will naturally work well with our enterprise search and I will easily be able to find it, right?!”
  • On the other hand, the Application Team has likely made 2 assumptions: 1) the Business did not explicitly ask for any kind of visibility in the enterprise search solution, so they don’t expect that, and 2) they’ve (likely) provided a local search function, so that would be completely sufficient as a search.

I’ve seen this scenario play out many, many times in just the last few years here. What often happens next depends on the application but includes many of the following symptoms:

  • The page archetypes designed by the Application Team will have the same (static) <title> tag in every instance of the page, regardless of the data displayed (generally, the data would be different based on query string parameters).
    • The effect? A web-crawler-based search engine (which we use) likely uses the <title> tag as an identifier for content and every instance of each page type has the same title, resulting in a whole lot of pretty useless (undifferentiated) search results. Yuck.
  • The page archetypes have either no or maybe redundant other metadata – keywords, description, content-date, author, etc.
    • The effect? The crawler has no differentiation based on <titles> and no additional hints from metadata. That is, lousy relevance.
  • The application has a variety of navigation or data manipulation capabilities (say, sorting data) based on standard HTML links.
    • The effect? The crawler happily follows all of the links – possibly (redundant) indexing the same data many, many times simply sorted on different columns.
    • Another effect? The dreaded calendar affect – the crawler will basically never stop finding new links because there’s always another page.
    • In either case, we see poor coverage of the content.

The overall effect is likely that the application does not work well with the enterprise search, or possibly that the application is that the application does not hold up to the pressure of the crawler hitting its pages much faster than anticipated (so I end up having to configure the crawler to avoid the application) and ending with yet another set of content that’s basically invisible in search.

Bringing this back around to the title – the response I often get when inquiring about a newly released application is something like, “People will know how to find that content – it’s in this application! Why would this need to be in the enterprise search?”

When I then ask, “Well, how do people know that they even need to navigate to or look in this application?” I’ll get a (virtual) shuffling of feet and shoulder shrugs.

All because of a perpetual lack of asking a few basic questions during a requirements gather stage of a project or (another way to look at it) lack of standards or policies which have “teeth” about the design and development of web application. The unfortunate thing is that, in my experience, if you ask the questions early, it’s typically on the scale of a few hours of a developer’s time to make the application work at least reasonably well with any crawler-based search engine. Unfortunately, because I often don’t find out about an application until after it’s in production, it then becomes a significant obstacle to get any changes made like this.

I’ll write more in a future post about the standards I have worked to establish (which are making some headway into adoption, finally!) to avoid this.

Edit: I’ve now posted the standards as mentioned above – you can find them in my post Standards to Improve Findability in Enterprise Applications.

Leave a Reply