“the future is search enabled applications, not enterprise search”
I’m somewhat familiar with Stephen (I’ve seen him speak at a couple of conferences and also have followed his writing on his blog for some time), but I had actually not seen this declaration in the past (though Stephen says he’s accused of saying it too much).
In any event – I find this an interesting claim and I think I would agree with the sentiment but I also think that it depends on how you look at it. As I wrote previously in trying to lay out what I thought enterprise search is, I think that the key aspects of an enterprise search are that it’s available to all members of the enterprise and that it covers all relevant content.
Down in the details, I do not believe it matters whether access to the enterprise search is embedded in numerous locations or in one location. In fact, as I wrote previously, embedding access through multiple points is probably ideal – let workers access it within the environment in which they work, regardless of what tool(s) they normally use to do their job.
On the other hand, if the expectation is that you can embed search in individual applications and that searching only within each application is enough, I do not think that is sufficient now or that it will be in the future. The information needs of any organization are diverse enough that no one application can realistically handle all of them – email, document management, CRM, support knowledge bases, intranets, policies, etc.
In my last post, I described the goals I have tried to achieve with my proof of concept people search function. Here I will describe the design and implementation of this proof of concept.
Given the goals above, here’s the general outline of the design for this solution:
Initially the web application directly queried the various systems used as sources when generating a profile for a worker. That is not scalable and also limits the amount of processing you can do, so I designed a simple SQL database to contain the data for this (implemented in MySQL). This database is essentially a data mart of worker data. The primary tables are:
With the implementation of this database, I also implemented a synchronization tool that updates the data in the tables from the source systems for the various types of activities.
By automatically pulling data from these source systems (which workers use in their regular day-to-day work), you remove the need for the workers to maintain data.
Now, how should the profile page for a worker be presented?
Initially, I put together a design that did two things: 1) provided a typical employee directory style layout of a worker’s administrative details and 2) provided a list of all of the activities for a worker, grouped by activity source. In other words, you would see a list of all of the Wiki articles edited by the worker, a list of mailing list memberships, a list of community memberships, project team memberships, task assignments, etc. Each activity source’s list would be displayed separately (in a simple bulleted list). (Before this goes into production, I have always assumed I would ask for some design help from our electronic marketing group to give it a more professional look, but I thought the bulleted list worked perfectly well functionally.)
This proved simple and effective and also enabled the profile page to provide direct links to those activities that are addressable via a link (for example, my profile page could link directly to a Wiki article I’ve edited, to each discussion post, etc.).
However, this approach suffered from at least two problems: 1) it lacked an immediately obvious visual presentation of a worker’s attributes, and 2) it exposed every detailed activity of a worker to anyone who viewed the profile (I found when I demoed this to people, some had the immediate reaction of, “Wow – anyone can see all of these details? I’m not sure I like that!” – a reaction that surprised me given that any of the details are generally visible to anyone who wants to look, but go figure).
After looking for alternatives, I found that the keywords for a worker (when combined with their weights) provided good input for a tag cloud – which is what I ended up using as the default presentation of a worker’s keywords (visible to everyone). This helps to highlight what someone is “about”, presents a generally attractive visualization of the data, and, if the default view of a worker displays this tag cloud (and the worker’s administrative data) and does not show all of the details, it alleviates the concern mentioned above.
I have found the implementation of the tag cloud to be the trigger that pulls people into this tool – it helps satisfy my goal #5 because, for most people who have looked at this, it provides immediate validation when they see words they expect to see in their own tag cloud.
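To make the tag cloud idea concrete, here is a minimal sketch of turning weighted keywords into HTML. It is an illustration rather than the code behind the proof of concept: the function name `tag_cloud_html`, the pixel range, the limit of 25 keywords, and the use of a logarithmic scale are all assumptions made for this example.

```python
import math
from html import escape


def tag_cloud_html(keyword_weights, min_px=12, max_px=32, limit=25):
    """Render {keyword: weight} as a simple HTML tag cloud.

    Hypothetical helper: the heaviest keyword gets max_px, the lightest
    of the keywords shown gets min_px, and the rest are scaled between.
    """
    if not keyword_weights:
        return '<p class="tag-cloud"></p>'
    # Keep only the heaviest keywords so the cloud stays readable.
    top = sorted(keyword_weights.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    # A log scale keeps one dominant keyword from dwarfing all the others.
    logs = {kw: math.log(1 + weight) for kw, weight in top}
    lo, hi = min(logs.values()), max(logs.values())
    spans = []
    for kw in sorted(logs, key=str.lower):  # alphabetical order for display
        scale = 0.0 if hi == lo else (logs[kw] - lo) / (hi - lo)
        size = round(min_px + scale * (max_px - min_px))
        spans.append(f'<span style="font-size:{size}px">{escape(kw)}</span>')
    return '<p class="tag-cloud">' + " ".join(spans) + "</p>"
```

Calling `tag_cloud_html({"search": 40, "taxonomy": 10, "wiki": 1})` would render “search” largest and “wiki” smallest, listed alphabetically.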
Here’s a shot of what part of my profile page looks like (partially obscured):
I wanted to keep the initial proof of concept simple in order to try to test different ways of using the data from the activity sources. With that in mind, here are some details on how I’ve done this so far:
Some additional functions I have layered on top of the basic profile / search mechanism that I believe will make this a valuable solution:
The proof of concept has been very interesting to work through and has presented me with some (subjective) proof of the value of this approach, as simple as it is. That being said, there are some issues and additional areas I hope are explored in the future:
I have previously described what I termed the various generations of solutions to the common challenge of workers finding and connecting with co-workers within an enterprise. My most recent post described the fourth generation solution – which enables users to search and connect using much more than simple administrative terms (name, email, address, etc.) for the search.
Over my next couple of posts, I will provide a write-up of a proof of concept implementation I’ve assembled that meets a lot of the need for this with what I believe to be relatively minimal investment.
The following represent the goals I’ve set for myself in this proof of concept:
Also, I wanted to say that part of the inspiration for this proof of concept came from a session I attended at Enterprise Search Summit 2007 as presented by Trent Parkhill. In his session, he described a mechanism where submissions to a company’s repository would be tagged with the names of participants in the project that produced the document as a deliverable. Then, when users were searching for content, there was a secondary search that produced a list of people associated with the terms and / or documents found by the user’s search. I’ve kind of turned that around and treated the people as being tagged by the keywords of the items they produce.
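The “people tagged by keywords” idea can be sketched in a few lines. This is only an illustration under assumed data shapes: the item dictionaries, the field names `people` and `keywords`, and the functions `tag_people` and `find_people` are hypothetical and are not taken from the proof of concept or from Trent Parkhill’s session.

```python
from collections import Counter, defaultdict


def tag_people(items):
    """Build {person: Counter of keywords} from the items people worked on.

    Each item is assumed to be a dict with a 'people' list and a
    'keywords' list (a hypothetical shape chosen for this sketch).
    """
    profiles = defaultdict(Counter)
    for item in items:
        for person in item["people"]:
            profiles[person].update(k.lower() for k in item["keywords"])
    return profiles


def find_people(profiles, term, limit=10):
    """Return (person, weight) pairs for a term, heaviest first."""
    term = term.lower()
    hits = [(person, kws[term]) for person, kws in profiles.items() if kws[term] > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))[:limit]


items = [
    {"people": ["alice", "bob"], "keywords": ["Search", "taxonomy"]},
    {"people": ["alice"], "keywords": ["search", "crawler"]},
]
profiles = tag_people(items)
print(find_people(profiles, "search"))
```

Here the weight is simply the number of items carrying the keyword; a real system would likely weight by recency or by the type of activity.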
In my next post, I will describe the overall design of my proof of concept.
However, I would go even further and suggest that the search industry (enterprise search as well as internet search engines) would also benefit if it were to define and adopt a standard response syntax for results (at least a response syntax that could be provided as an option). Obviously, for most users a straightforward HTML presentation is desirable as when they interact with an engine through their browser, they want to be able to view the results in their browser.
However, an ability to request results from an arbitrary engine in a standard format would be a great step forward – it would vastly simplify aggregation of results for federated search and more generally it could present the ability to programmatically interact with multiple engines for a variety of other purposes.
I know of one attempt that seems to drive to this – OpenSearch (which is associated with A9 – Amazon’s search engine) – a set of elements that can be used as extensions to an RSS format. Are there others? How widely known (and adopted?) is OpenSearch as a format?
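As a rough illustration of what a standard response format buys you, the sketch below reads an OpenSearch-style RSS response with Python’s standard library. The sample document, the example.com URLs, and the function name `parse_opensearch_rss` are made up for this example; the `totalResults`, `startIndex` and `itemsPerPage` elements and their namespace come from the OpenSearch 1.1 response format.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical response from an engine that supports OpenSearch over RSS.
SAMPLE_RESPONSE = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example engine: results for shadow page</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item>
      <title>First hit</title>
      <link>https://search.example.com/doc/1</link>
      <description>Summary of the first hit.</description>
    </item>
    <item>
      <title>Second hit</title>
      <link>https://search.example.com/doc/2</link>
      <description>Summary of the second hit.</description>
    </item>
  </channel>
</rss>"""


def _int_or_none(text):
    """Convert an element's text to int, tolerating missing elements."""
    return int(text) if text and text.strip().isdigit() else None


def parse_opensearch_rss(xml_text):
    """Return paging metadata and results from an OpenSearch RSS response."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS document with a <channel> element")

    def os_field(name):
        return _int_or_none(channel.findtext(f"{{{OPENSEARCH_NS}}}{name}"))

    return {
        "total_results": os_field("totalResults"),
        "start_index": os_field("startIndex"),
        "items_per_page": os_field("itemsPerPage"),
        "results": [
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "description": item.findtext("description", default=""),
            }
            for item in channel.findall("item")
        ],
    }


parsed = parse_opensearch_rss(SAMPLE_RESPONSE)
```

Because every engine would return the same structure, a federated search layer could call one parser like this for each engine and merge the resulting lists.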
Or, in other words, “How do you apply the application standards to improve findability to applications built by third-party providers who do not follow your standards?”
I’ve previously written about the standards I’ve put together for (web-based) applications that help ensure good findability for content / data within that application. These standards are generally relatively easy to apply to custom applications (though it can still be challenging to get involved with the design and development of those applications at the right time to keep the time investment minimal, as I’ve also previously written about).
However, it can be particularly challenging to apply these standards to third-party applications – for example, your CRM application, your learning management system, or your HR system. Applying the existing standards could take a couple of different forms:
The rest of this post will discuss a solution for option #3 above – how you can implement a different solution. Note that some search engines will provide pre-built functionality to enable search within many of the more common third party solutions – those are great and useful, but what I will present here is a solution that can be implemented independent of the search engine (as long as the search engine has a crawler-based indexing function) and which is relatively minimal in investment.
So, you have a third party application and, for whatever reason, it does not adhere to your application standards for findability. Perhaps it fails the coverage principle and it’s not possible to adequately find the useful content without getting many, many useless items; or perhaps it’s the identity principle and, while you can find all of the desirable targets, they have redundant titles; or it might even be that the application fails the relevance principle and you can index the high value targets and they show up with good names in results but they do not show up as relevant for keywords which you would expect. Likely, it’s a combination of all three of these issues.
The core idea in this solution is that you will need a helper application that creates what I call “shadow pages” of the high value targets you want to include in your enterprise search.
Note: I adopted the use of the term “shadow page” based on some informal discussions with co-workers on this topic – I am aware that others use this term in similar ways (though I don’t think it means the exact same thing) and also am aware that some search engines address what they call shadow domains and discourage their inclusion in their search results. If there is a preferred term for the idea described here – please let me know!
What is a shadow page? For my purposes here, I define a shadow page as:
To make this solution work, there are a couple of minimal assumptions about the application. A caveat: I recognize that, while I consider these relatively simple assumptions, it is very likely that some applications will still not be able to meet them and so cannot be exposed via your enterprise search with this type of solution.
Given the description of a shadow page and the assumptions about what is necessary to support it, it is probably obvious how they are used and how they are constructed, but here’s a description:
First – you would use the query that gives you a list of targets (item #2 from the assumptions) from your source application to generate an index page which you can give your indexer as a starting point. This index page would have one link on it for each desirable target’s shadow page. This index page would also have “robots” <meta> tags of “noindex,follow” to ensure that the index page itself is not included as a potential target.
Second – The shadow page for each target (which the crawler reaches thanks to the index page) is dynamically built from the query of the application given the identity of the desirable search target (item #3 from the assumptions). The business rules defining how the desirable target should behave in search help define the necessary query, but the query would need to contain at minimum some of the following data: the name of the target, a description or summary of the target, some keywords that describe the target, a value which will help define the true URL of the actual target (per assumption #1, there must be a way to directly address each target).
The shadow page would be built something like the following:
The overall effect of this is that the search engine will index the shadow page, which has been constructed to ensure good adherence to the principles of enterprise search, and to a searcher, it will behave like a good search target but when the user clicks on it from a search result, the user ends up looking at the actual desired target. The only clue the user might have is that the URL of the target in the search results is not what they end up looking at in their browser’s address bar.
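The index page from the first step above can be sketched as follows. This is a minimal illustration, not the helper application itself: the function name `build_index_page`, the shadow page URL prefix, and the use of the target’s identifier as link text are assumptions, and the list of identifiers stands in for whatever your source application’s query returns.

```python
from html import escape
from urllib.parse import quote


def build_index_page(target_ids, shadow_url_prefix):
    """Return HTML for a crawler start page with one link per shadow page.

    shadow_url_prefix is a hypothetical URL on the helper application,
    e.g. "https://intranet.example.com/shadow?id=".
    """
    items = []
    for target_id in target_ids:
        href = shadow_url_prefix + quote(str(target_id), safe="")
        items.append(f'<li><a href="{escape(href)}">{escape(str(target_id))}</a></li>')
    return (
        "<html>\n<head>\n"
        "<title>Shadow page index</title>\n"
        # Keep the index page itself out of results, but follow its links.
        '<meta name="robots" content="noindex,follow">\n'
        "</head>\n<body>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body>\n</html>"
    )


page = build_index_page([101, "A/7"], "https://intranet.example.com/shadow?id=")
```

You would then give the URL that serves this page to the crawler as its starting point.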
The following provides a simple example of the source (in HTML – sorry for those who might not be able to read it) for a shadow page (the parts that change from page to page are in bold):
<body>
  <div style="display:none;">
    <h1>title of target</h1>
    description of target and keywords of target
  </div>
</body>
</html>
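For readers who would rather see the shadow page generated than read raw HTML, here is a minimal sketch of a builder. It is an assumption-laden illustration: the function name `build_shadow_page` is hypothetical, and the JavaScript redirect is just one way to send the user on to the real target (it assumes your crawler does not execute scripts, which you would need to verify for your engine).

```python
import json
from html import escape


def build_shadow_page(title, description, keywords, target_url):
    """Return shadow page HTML for one target (hypothetical helper).

    The title, description and keywords give the crawler what it needs;
    the script sends a human visitor on to the actual target.
    """
    # json.dumps yields a safely quoted JavaScript string literal; the
    # replace guards against a stray "</script>" inside the URL.
    js_url = json.dumps(target_url).replace("<", "\\u003c")
    kw = ", ".join(keywords)
    return f"""<html>
<head>
<title>{escape(title)}</title>
<meta name="description" content="{escape(description)}">
<meta name="keywords" content="{escape(kw)}">
<script>window.location.replace({js_url});</script>
</head>
<body>
<div style="display:none;">
<h1>{escape(title)}</h1>
{escape(description)} {escape(kw)}
</div>
</body>
</html>"""


shadow = build_shadow_page(
    "Q3 <Plan>",
    "Plan & budget for Q3",
    ["crm", "plan"],
    "https://crm.example.com/item?id=42",
)
```

The values passed in would come from the per-target query against the source application (name, summary, keywords, and whatever is needed to build the real URL).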
A few things that are immediately obvious advantages of this approach:
There are also a number of issues that I need to highlight with this approach – unfortunately, it’s not perfect!
There you have it – a solution to the exposure of your high value targets from your enterprise applications that is independent of your search engine and can provide you (the search administrator) with a good level of control over how content appears to your search engine, while ensuring that what is included highly adheres to my principles of enterprise search.
I’ve previously written about the three principles of enterprise search and also about the specific business process challenges I’ve run into again and again with web applications in terms of findability.
Here, I will provide some insights on the specific standards I’ve established to improve findability, primarily within web applications.
When an application is being specified, the application team must ensure that they discuss the following question with business users – What are the business objects within this application and which of those should be visible through enterprise search?
The first question is pretty standard and likely forms the basis for any kind of UML or entity relationship diagram that would be part of a design process for the application. The second part is often not asked but it forms the basis for what will eventually be the specific targets that will show in search results through the enterprise search.
Given the identification of which objects should be visible in search results, you can then easily start to plan out how they might show up, how the search engine will encounter them, whether the application might best provide a dynamic index page of links to the entities or support a standard crawl or perhaps even a direct index of the database(s) behind the application.
Basically, the standard here is that the application must provide a means to ensure that a search engine can find all of the objects that need to be visible and also to ensure that the search engine does not include things that it should not.
Some specific things that are included here:
With the standard for Coverage defined, we can be comfortable with knowing that the right things are going to show in search and the wrong things will not show up. How useful will they be as search results, though? If a searcher sees an item in a results list, will they be able to know that it’s what they’re looking for? So we need to ensure that the application addresses the identity principle.
The standard here is that the pages (ASP pages, JSP files, etc) that comprise the desirable targets for search must be designed to address the identity principle – specifically:
Now we know that the search includes what it should and we also know that when those items show in search, they will be identifiable for what they are. How do we ensure that the items show up in search for searches for which they are relevant, though?
The standards to address the relevance issue are:
For a good review of the <meta> tags in HTML pages, you can look at:
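To check pages against the identity and relevance standards, a small audit script can help. The sketch below is an assumption about what such a check might look like, not part of the standards themselves: the class `HeadFields`, the function `audit`, and the three problems it reports (missing title, missing description or keywords <meta> tags, duplicate titles) are choices made for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadFields(HTMLParser):
    """Collect the <title> text and named <meta> tags from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(pages):
    """pages maps URL to HTML text; returns a list of (url, problem)."""
    problems = []
    by_title = defaultdict(list)
    for url, html_text in pages.items():
        parser = HeadFields()
        parser.feed(html_text)
        title = parser.title.strip()
        if title:
            by_title[title.lower()].append(url)
        else:
            problems.append((url, "missing title"))
        for name in ("description", "keywords"):
            if not parser.meta.get(name, "").strip():
                problems.append((url, f"missing meta {name}"))
    # Pages sharing a title cannot be told apart in a results list.
    for urls in by_title.values():
        if len(urls) > 1:
            problems.extend((url, "duplicate title") for url in urls)
    return problems
```

Feeding it the HTML of each desirable target (however you fetch it) gives a quick list of pages that would be hard to identify or unlikely to rank for their own keywords.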
So we get to the exciting conclusion of my essays on the inclusion of employees in enterprise search. If you’ve read this far, you know how I have characterized the first and second generation solutions and also provided a description of a third generation solution (which included some details on how we implemented it).
Here I will describe what I think of as a fourth generation solution to people finding within the enterprise. As I mentioned in the description of the third generation solution, one major omission still at this point is that the only way you can find people is through searches on administrative information – things like their name, address, phone number, user ID, email, etc.
This is useful when you have an idea of the person you’re looking for or at least the organization in which they might work. What do you do when you don’t know the person and may not even know the organization in which they work? You might know the particular skills or competencies they have but that may be it. This problem is particularly acute in larger organizations or organizations that are physically very distributed.
The core idea with this type of solution is to provide the ability to find and work with people based on aspects beyond the administrative – the skills of the people, their interests, perhaps the network of people with which they interact, and more. While this might be a simplification, I think of this as expertise location, though that, perhaps, most cleanly fits into the first use case described below.
Some common use cases for this type of capability include:
This capability is something that has often been discussed and requested at my current employer, but which no one has really been willing to sponsor. That being said, I know there are several vendors with solutions in this space, including (at least – please share if you know of others):
A common aspect of these is that they attempt to automate the process of expertise discovery (and perhaps succeed). I’ve seen systems where employees have to maintain their own skill sets. The problem with these is that the business process to maintain the data does not seem to really embed itself into a company – inevitably, the data gets out of date and is ill-maintained and so the system does not work.
I can not vouch for the accuracy of these systems but I firmly believe that if people search in the enterprise is going to meet the promise of enabling people to find each other and connect based on of-the-moment needs (skills, interests, areas of work, etc), it will be based on this type of capability – automatically discovering those aspects of a worker based on their work products, their project teams, their work assignments, etc.
I imagine that within the not too distant future, as we see more “web 2.0” functionality merge into the enterprise, this type of capability will become expected and welcome – it will be exciting to see how people will work together then.
This brings to a close my discussion of the various types of people search within the enterprise. I hope you’ve found this of interest. Please feel free to let me know if you think I have any omissions or misstatements in here – I’m happy to correct and/or fill in.
I plan another few posts that discuss a proof of concept I have put together based around the ideas of this fourth generation solution – look for those soon!
In my last post, I wrote about what I termed the first generation and second generation solutions to people search in the enterprise. This time, I will describe what I call a “third generation” solution to the problem that will integrate people search with your enterprise search solution.
This is the stage of people search in use within my current employer’s enterprise.
What I refer to as a third generation solution for people search is one where an employee’s profile (their directory entry, i.e., the set of information about a particular employee) becomes a viable and useful target within your enterprise search solution. That is, when a user performs a search using the pervasive “search box” (you do have one, right?), they should be able to expect to find their fellow workers in the results (obviously, depending on the particular terms used to do the search) along with any content that matches that.
You remove the need for a searcher to know they need to look in another place (another application, i.e., the company’s yellow pages) and, instead, reinforce the primacy of that single search experience that brings everything together that a worker needs to do their job.
You also offer the full power of your enterprise search engine:
Below, you will find a discussion of the implementation process we used and the problems we encountered. It might be of use to you if you attempt this type of thing.
Before getting to that, though, I would like to discuss what I believe to be the remaining issue with a third generation solution in order to set up my follow-up post on this topic, which will describe additional ideas for solving the “people finder” problem within an enterprise.
The primary issue with the current solution we have (or any similar solution based strictly on information from a corporate directory) is that the profile of a worker consists only of administrative information. That is, you can find someone based on their name, title, department, address, email, etc., etc., etc., but you can not do anything useful to find someone based on much more useful attributes – what they actually do, what their skills or competencies are or what their interests might be. More on this topic in my next post!
Read on from here for some insights on the challenges we faced in our implementation of this solution. It gets pretty detailed from here on out, so you’ve been warned!
This post is the first of a brief series of posts I plan to write about the integration of “people search” (employee directory) with your enterprise search solution. In a sense, this treats “people” as just another piece of content within your search, though they represent a very valuable type of content.
This post will be an introduction and describe both a first and second generation solution to this problem. In subsequent posts, I plan to describe a solution that takes this solution forward one step (simplifying things for your users among other things) and then into some research that I believe shows a lot of promise and which you might be able to take advantage of within your own enterprise search solution.
Finding contact information for your co-workers is such a common need that people have, forever, maintained phone lists – commonly just as word processing documents or spreadsheets – and also org charts, probably in a presentation file format of some type. I think of this approach as a first generation solution to the people search problem.
Its challenges are numerous, including:
As computer technology has evolved and companies implemented corporate directories for authentication purposes (Active Directory, LDAP, eDirectory, etc.), it has become common to maintain your phone book as a purely online system based on your corporate directory. What does such a solution look like and what are its challenges?
I think it’s quite common now that companies will have an online (available via their intranet) employee directory that you can search using some (local, specific to the directory) search tools. These support obvious things like fielded searches on name, title, phone number, etc. My current employer has sold a product named eGuide for quite some time that provides exactly this type of capability.
eGuide is basically a web interface for exposing parts of your corporate Directory for search and also for viewing the org chart of a company (as reflected in the Directory).
We have had this implemented on our intranet for many years now. It has been (and continues to be) one of the more commonly used applications on our intranet.
The problems with this second generation solution, though, triggered me to try to provide a better solution a few years ago using our enterprise search. What are the problems with this approach? Here are the issues that triggered a different (better?) solution:
So there’s a brief description of what I would characterize as a first generation solution and a second generation solution along with highlights of some issues with each.
Up next, I’ll describe the next step forward in the solution to this issue – integrating people into your enterprise search solution.
The title of this post – “People know where to find that, though!” is a very common phrase I hear as the search analyst and the primary search advocate at my company. Another version would be, “Why would someone expect to find that in our enterprise search?”
Why do I hear this so often? I assume that many organizations, like my own, have many custom web applications available on their intranet and even their public site. It is because of that prevalence, combined with a lack of communication between the Business and the Application team, that I hear these phrases so often.
I have (unfortunately!) lost count of the number of times a new web-based application goes into production without anyone even considering the findability of the application and its content (data) within the context of our enterprise search.
Typically, the conversation seems to go something like this:
What did we completely miss in this discussion? Well, no one in the above process (unfortunately) has explicitly asked the question, “Does the content (data) in this site need to be exposed via our enterprise search?” Nor has anyone even asked the more basic question, “Should someone be able to find this application [the "home page" of the application in the context of a web application] via the enterprise search?”
I’ve seen this scenario play out many, many times in just the last few years here. What often happens next depends on the application but includes many of the following symptoms:
The overall effect is likely that the application does not work well with the enterprise search, or possibly that the application does not hold up to the pressure of the crawler hitting its pages much faster than anticipated (so I end up having to configure the crawler to avoid the application), ending with yet another set of content that’s basically invisible in search.
Bringing this back around to the title – the response I often get when inquiring about a newly released application is something like, “People will know how to find that content – it’s in this application! Why would this need to be in the enterprise search?”
When I then ask, “Well, how do people know that they even need to navigate to or look in this application?” I’ll get a (virtual) shuffling of feet and shoulder shrugs.
All because of a perpetual failure to ask a few basic questions during the requirements gathering stage of a project or (another way to look at it) a lack of standards or policies which have “teeth” about the design and development of web applications. The unfortunate thing is that, in my experience, if you ask the questions early, it typically takes on the scale of a few hours of a developer’s time to make the application work at least reasonably well with any crawler-based search engine. Because I often don’t find out about an application until after it’s in production, it then becomes a significant obstacle to get any changes like this made.
I’ll write more in a future post about the standards I have worked to establish (which are making some headway into adoption, finally!) to avoid this.
Edit: I’ve now posted the standards as mentioned above – you can find them in my post Standards to Improve Findability in Enterprise Applications.