<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lee Romero &#187; enterprise search</title>
	<atom:link href="http://blog.leeromero.org/tag/enterprise-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.leeromero.org</link>
	<description>On Content, Collaboration and Findability</description>
	<lastBuildDate>Tue, 23 Feb 2010 02:58:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Enterprise Search and Third-Party Applications</title>
		<link>http://blog.leeromero.org/2008/10/28/enterprise-search-and-third-party-applications/</link>
		<comments>http://blog.leeromero.org/2008/10/28/enterprise-search-and-third-party-applications/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 01:07:32 +0000</pubDate>
		<dc:creator>Lee Romero</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[application standards]]></category>
		<category><![CDATA[enterprise search]]></category>

		<guid isPermaLink="false">http://blog.leeromero.org/?p=82</guid>
		<description><![CDATA[Or, in other words, &#8220;How do you apply the application standards to improve findability to applications built by third-party providers who do not follow your standards?&#8221;
I&#8217;ve previously written about the standards I&#8217;ve put together for (web-based) applications that help ensure good findability for content / data within that application.  These standards are generally relatively [...]]]></description>
			<content:encoded><![CDATA[<p>Or, in other words, &#8220;How do you apply the <a title="Standards to Improve Findability in Enterprise Applications" href="http://blog.leeromero.org/2008/10/23/standards-to-improve-findability-in-enterprise-applications/">application standards to improve findability</a> to applications built by third-party providers who do not follow your standards?&#8221;</p>
<p>I&#8217;ve previously written about the standards I&#8217;ve put together for (web-based) applications that help ensure good findability for content / data within that application.  These standards are generally relatively easy to apply to custom applications (though it can still be challenging to get involved with the design and development of those applications at the right time to keep the time investment minimal, as I&#8217;ve also <a title="People know where to find that, though!" href="http://blog.leeromero.org/2008/10/13/people-know-where-to-find-that-though/">previously written</a> about).</p>
<p>However, <strong>it can be particularly challenging to apply these standards to third-party applications</strong> &#8211; For example, your CRM application, your learning management system, or your HR system, etc.  Applying the existing standards could take a couple of different forms:</p>
<ol>
<li>Ideally, when your organization goes through the selection process for such an application, <strong>your application standards are explicitly included in the selection criteria</strong> and used to ensure you select a solution that will conform to your standards</li>
<li>More commonly, you will identify gaps in compliance with the standards (perhaps during selection, perhaps later during implementation) and you might need to <strong>implement some type of customization within the application to provide compliance</strong>.</li>
<li>Failing that, you identify gaps in compliance with the standards during selection or later, but <strong>you find you cannot customize the application and you need a different solution</strong>.</li>
</ol>
<p>The rest of this post will discuss a solution for option #3 above &#8211; how you can implement a different solution.  Note that some search engines will provide pre-built functionality to enable search within many of the more common third party solutions &#8211; those are great and useful, but what I will present here is a solution that can be implemented independent of the search engine (as long as the search engine has a crawler-based indexing function) and which is relatively minimal in investment.</p>
<h4>Solving the third-party application conundrum for Enterprise Search</h4>
<p>So, <strong>you have a third party application</strong> and, for whatever reason, <strong>it does not adhere to your application standards for findability</strong>.  Perhaps it fails the <a title="The Coverage Search Principle" href="http://blog.leeromero.org/2008/01/08/the-3-principles-of-enterprise-search-part-1-coverage/">coverage principle</a> and it&#8217;s not possible to adequately find the useful content without getting many, many useless items; or perhaps it fails the <a title="Identity Search Principle" href="http://blog.leeromero.org/2008/01/08/the-3-principles-of-enterprise-search-identity/">identity principle</a> and, while you can find all of the desirable targets, they have redundant titles; or it might even be that the application fails the <a title="The Relevance Search Principle" href="http://blog.leeromero.org/2008/01/10/the-3-principles-of-enterprise-search-part-3-relevance/">relevance principle</a> and you can index the high value targets and they show up with good names in results but they do not show up as relevant for keywords which you would expect.  <strong>Likely, it&#8217;s a combination of all three of these issues</strong>.</p>
<p>The core idea in this solution is that you will need a helper application that creates what I call <strong>&#8220;shadow pages&#8221;</strong> of the high value targets you want to include in your enterprise search.</p>
<p style="padding-left: 30px;">Note: I adopted the use of the term &#8220;shadow page&#8221; based on some informal discussions with co-workers on this topic &#8211; I am aware that <a title="IBM Patent: URL mapping with shadow page support" href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;Sect2=HITOFF&amp;d=PG01&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&amp;r=1&amp;f=G&amp;l=50&amp;s1=%2220060070022%22.PGNR.&amp;OS=DN/20060070022&amp;RS=DN/20060070022" target="_blank">others</a> use this term in similar ways (though I don&#8217;t think it means the exact same thing) and also am aware that some <a title="Shadow Domain commentary from Google" href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35291" target="_blank">search engines</a> address what they call shadow domains and discourage their inclusion in their search results.  If there is a preferred term for the idea described here &#8211; please let me know!</p>
<p>What is a <strong>shadow page</strong>? For my purposes here, I define a shadow page as:</p>
<ul>
<li>A page which <strong>uniquely corresponds to a single desirable search target</strong>;</li>
<li>A page that has a <strong>distinct, unique URL</strong>;</li>
<li>A page that has <strong>a &lt;title&gt; and description</strong> that reflects the search target of which it is a shadow, and that title is distinct and provides a searcher who sees it in a search results page with insight about what the item is;</li>
<li>A page that has <strong>good metadata</strong> (keywords or other fields) that describe the target using terminology a searcher would use;</li>
<li>A page which <strong>contains text</strong> (likely hidden) <strong>that also reflects all of the above</strong> as well to enhance relevance for the words in the title, keywords, etc.;</li>
<li>A page which, when accessed, will <strong>automatically redirect a user to the page of which the page is a shadow</strong>.</li>
</ul>
<p>To make this solution work, there are a few minimal assumptions about the application.  A caveat: I recognize that, while I consider these relatively simple assumptions, it is very likely that some applications will still not be able to meet them and so cannot be exposed via your enterprise search with this type of solution.</p>
<ol>
<li>Each desirable <strong>search target must be addressable by a unique URL</strong>;</li>
<li>It should be possible to <strong>define a query which will give you a list of the desirable targets</strong> in the application; this query could be an SQL query run against a database or possibly a web services method call that returns a result in XML (or probably other formats but these are the most common in my experience);</li>
<li>Given the identity (say, a primary key if you&#8217;re using a SQL database of some type) of a desirable search target, you <strong>must be able to also query the application for additional information about the search target</strong>.</li>
</ol>
<h4>Building a Shadow Page</h4>
<p>Given the description of a shadow page and the assumptions about what is necessary to support it, it is probably obvious how they are used and how they are constructed, but here&#8217;s a description:</p>
<p>First &#8211; you would use the query that gives you a list of targets (item #2 from the assumptions) from your source application to <strong>generate an index page</strong> which you can give your indexer as a starting point.  This index page would have one link on it for each desirable target&#8217;s shadow page.  This index page would also have &#8220;robots&#8221; &lt;meta&gt; tags of &#8220;noindex,follow&#8221; to ensure that the index page itself is not included as a potential target.</p>
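<p>As a rough illustration of that first step, here is a minimal Python sketch of a helper that turns a list of target identifiers into such an index page.  The function name, the <code>?id=</code> URL scheme and the example host are assumptions made for illustration; the identifiers are assumed to come from whatever query satisfies assumption #2.</p>

```python
from html import escape
from urllib.parse import quote


def build_index_page(target_ids, shadow_base_url):
    """Return HTML for a crawler start page with one link per shadow page.

    target_ids: identifiers returned by the (hypothetical) target-list query.
    shadow_base_url: hypothetical URL of the helper that serves shadow pages.
    """
    links = []
    for target_id in target_ids:
        # URL-encode the identifier, then HTML-escape the whole attribute value.
        href = shadow_base_url + "?id=" + quote(str(target_id), safe="")
        links.append('<a href="{}">{}</a><br>'.format(
            escape(href, quote=True), escape(str(target_id))))
    return "\n".join([
        "<html>",
        "<head>",
        "<title>Shadow page index</title>",
        # noindex keeps this page out of results; follow lets the crawler use the links.
        '<meta name="robots" content="noindex, follow">',
        "</head>",
        "<body>",
        *links,
        "</body>",
        "</html>",
    ])
```

<p>Calling <code>build_index_page([101, 102], "http://helper.example.com/shadow")</code> would give the crawler a page with two links and nothing else to index.</p>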
<p>Second &#8211; The <strong>shadow page for each target</strong> (which the crawler reaches thanks to the index page) is dynamically built from the query of the application given the identity of the desirable search target (item #3 from the assumptions).  The business rules defining how the desirable target should behave in search help define the necessary query, but the query would need to contain at minimum some of the following data: the name of the target, a description or summary of the target, some keywords that describe the target, a value which will help define the true URL of the actual target (per assumption #1, there must be a way to directly address each target).</p>
<p>The shadow page would be built something like the following:</p>
<ul>
<li>The &lt;title&gt; tag would be the name of the target from the query (perhaps plus an application name to provide context)</li>
<li>The &#8220;description&#8221; &lt;meta&gt; tag would be the description or summary of the target from the query, perhaps plus a few static keywords that help ensure the presence of additional insight about the target.   For example, if the target represents a learning activity, the additional static text might indicate that.</li>
<li>The &#8220;keywords&#8221; &lt;meta&gt; tag would include the keywords from the query, plus some static keywords to ensure good coverage.  To follow the previous example, it might be appropriate to include words like &#8220;learning&#8221;, &#8220;training&#8221;, &#8220;class&#8221;, etc. in a target that is a learning activity to ensure that, if the keywords for the specific target do not include those words, searchers can still find the shadow page target in search.</li>
<li>The &lt;body&gt; of the page can be built to include all of the above text &#8211; from my experience, wrapping the body in a CSS style that visually hides the text keeps the text from actually appearing in a browser.</li>
<li>Lastly, the shadow page has a bit of JavaScript in it that redirects a browser to the actual target &#8211; this is why the target must be addressable via a URL and why the query needs to provide the information necessary to create that URL.  Most engines will not be able to execute the JavaScript (I know of none that can), so they will not know that the page is really a redirect to the desired target.</li>
</ul>
<p>The overall effect of this is that the search engine will index the shadow page, which has been constructed to ensure good adherence to the principles of enterprise search, and to a searcher, it will behave like a good search target but when the user clicks on it from a search result, the user ends up looking at the actual desired target.  The only clue the user might have is that the URL of the target in the search results is not what they end up looking at in their browser&#8217;s address bar.</p>
<p>The following provides a simple example of the source (in HTML &#8211; sorry for those who might not be able to read it) for a shadow page (the parts that change from page to page are in bold):</p>
<pre style="padding-left: 30px;">&lt;html&gt;
&lt;head&gt;
&lt;TITLE&gt;<strong>title of target</strong>&lt;/TITLE&gt;
&lt;meta name="robots" content="index, nofollow"&gt;
&lt;meta name="keywords" content="<strong>keywords for target</strong>"&gt;
&lt;meta name="description" content="<strong>description of target</strong>"&gt;
&lt;script type="text/javascript"&gt;
document.location.href="<strong>URL of actual target</strong>";
&lt;/script&gt;
&lt;/head&gt;</pre>
<pre style="padding-left: 30px;">&lt;body&gt;
&lt;div style="display:none;"&gt;
&lt;h1&gt;<strong>title of target</strong>&lt;/h1&gt;
<strong>description of target</strong> and <strong>keywords of target</strong>
&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;</pre>
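<p>The template above could be filled in by a small helper along these lines.  This is only a sketch in Python: the function name and argument names are my own assumptions, and the four values are assumed to come from the per-target query described in assumption #3.</p>

```python
from html import escape
import json


def build_shadow_page(title, description, keywords, target_url):
    """Fill the shadow page template for a single search target.

    keywords is a list of strings; target_url is the real URL of the target.
    All argument names are illustrative, not part of any real application.
    """
    t = escape(title, quote=True)
    d = escape(description, quote=True)
    k = escape(", ".join(keywords), quote=True)
    # json.dumps produces a quoted, escaped JavaScript string literal;
    # the replace keeps a stray "</script>" in the data from closing the tag.
    js_url = json.dumps(target_url).replace("</", "<\\/")
    return f"""<html>
<head>
<title>{t}</title>
<meta name="robots" content="index, nofollow">
<meta name="keywords" content="{k}">
<meta name="description" content="{d}">
<script type="text/javascript">
document.location.href={js_url};
</script>
</head>
<body>
<div style="display:none;">
<h1>{t}</h1>
{d} {k}
</div>
</body>
</html>"""
```

<p>Escaping the values matters here because titles and descriptions pulled from an application database can easily contain characters like &amp; or quotes that would otherwise break the page.</p>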
<h4>Advantages of this Solution</h4>
<p>A few things that are immediately obvious advantages of this approach:</p>
<ol>
<li>First and foremost, with this approach, you can <strong>provide searchers with the ability to find content which otherwise would be locked away</strong> and not available via your enterprise search!</li>
<li>You can <strong>easily control the targets that are available via your enterprise search</strong> within the application (potentially much easier than trying to figure out the right combination of robots tags or inclusion / exclusion settings for your indexer).</li>
<li>You can <strong>very tightly control how a target looks to the search engine</strong> (including integration with your taxonomy to provide elaborated keywords, synonyms, etc)</li>
</ol>
<h4>Problems with this Solution</h4>
<p>There are also a number of issues that I need to highlight with this approach &#8211; unfortunately, it&#8217;s not perfect!</p>
<ol>
<li>The most obvious issue is that this <strong>depends on the ability to query</strong> for a set of targets against a database or web service of some sort.
<ol>
<li>Most applications will be technically able to support this, but in many organizations, this could present too great a risk from a <strong>data security</strong> perspective (the judicious use of database views and proper management of read rights on the database should solve this, however!)</li>
<li>This <strong>potentially creates too high a level of dependence</strong> between your search solution and the inner workings of the application &#8211; an upgrade of the application could change the data schema enough to break this approach.  Again, I think that the use of database views can solve this (by abstracting away the details of the implementation into a single view which can be changed as necessary through any upgrade).</li>
</ol>
</li>
<li><strong>Some applications may simply not offer a &#8220;deep linking&#8221; ability into high value content</strong> &#8211; there is no way to uniquely address a content item without the context of the application.  This solution cannot be applied to such applications.  (My opinion is that such applications are poorly designed, but that&#8217;s another matter entirely!)</li>
<li>This solution <strong>depends on JavaScript</strong> to forward the user from the shadow page to the actual target.  If your user population has a large percentage of people who do not use JavaScript, this solution fails them utterly.</li>
<li>This solution depends on your search engine <strong>not following the JavaScript</strong> and <strong>not otherwise determining that the shadow page is a very low quality target</strong> (perhaps by examining the styles on the text and determining the text is not visible).  If you have a search engine that is this smart, hopefully you have a way to configure it to ignore this for at least some areas or page types.</li>
<li>Another major issue is that this solution largely <strong>circumvents a search engine&#8217;s built in ability to do item-by-item security</strong> as the target to the search engine is the shadow page.  I think the key here is to not use this solution for content that requires this level of security.</li>
</ol>
<h4>Conclusion</h4>
<p>There you have it &#8211; a solution to the exposure of your high value targets from your enterprise applications that is independent of your search engine and can provide you (the search administrator) with a good level of control over how content appears to your search engine, while ensuring that what is included adheres closely to my principles of enterprise search.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.leeromero.org/2008/10/28/enterprise-search-and-third-party-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>People Search and Enterprise Search, Part 3 &#8211; The Fourth Generation</title>
		<link>http://blog.leeromero.org/2008/10/20/people-search-and-enteprise-search-part-3/</link>
		<comments>http://blog.leeromero.org/2008/10/20/people-search-and-enteprise-search-part-3/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 01:23:45 +0000</pubDate>
		<dc:creator>Lee Romero</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[people search]]></category>

		<guid isPermaLink="false">http://blog.leeromero.org/?p=62</guid>
		<description><![CDATA[So we get to the exciting conclusion of my essays on the inclusion of employees in enterprise search.  If you&#8217;ve read this far, you know how I have characterized the first and second generation solutions and also provided a description of a third generation solution (which included some details on how we implemented it).
Here [...]]]></description>
			<content:encoded><![CDATA[<p>So we get to the exciting conclusion of my essays on the inclusion of employees in enterprise search.  If you&#8217;ve read this far, you know how I have characterized the <a title="People Search and Enterprise Search" href="http://blog.leeromero.org/2008/10/14/people-search-and-enterprise-search/">first and second generation</a> solutions and also provided a description of a <a title="People Search and Enteprise Search, Part 2 - A third generation solution" href="http://blog.leeromero.org/2008/10/15/people-search-and-enteprise-search-part-2/">third generation solution</a> (which included some details on how we implemented it).</p>
<p>Here I will describe what I think of as a <strong>fourth generation solution</strong> to people finding within the enterprise.  As I mentioned in the description of the third generation solution, one major omission at this point is that you can only find people by searching on administrative information &#8211; things like their name, address, phone number, user ID, email, etc.</p>
<p>This is useful when you have an idea of the person you&#8217;re looking for or at least the organization in which they might work.  <strong>What do you do when you don&#8217;t know the person and may not even know the organization in which they work?</strong> You might know the particular skills or competencies they have but that may be it.  This problem is particularly acute in larger organizations or organizations that are physically very distributed.</p>
<p>The core idea with this type of solution is to provide the ability to find and work with people based on aspects beyond the administrative &#8211; the skills of the people, their interests, perhaps the network of people with which they interact, and more.  While this might be a simplification, I think of this as <strong>expertise location</strong>, though that, perhaps, most cleanly fits into the first use case described below.</p>
<p>Some common use cases for this type of capability include:</p>
<ul>
<li><strong>Peer-to-peer connections</strong> &#8211; an employee is trying to solve a particular problem and they suspect someone in the company may have some skills that would enable them to solve the problem more quickly.  Searching using those skills as keywords would enable them to directly contact relevant employees.</li>
<li><strong>Resource planning</strong> &#8211; a consulting organization needs to staff a particular project and needs to find specific people with a particular skill set.</li>
<li><strong>Skill assessment</strong> &#8211; an organization needs to be able to ascertain the overall competency of their employees in particular skill sets to identify potential training programs to make available.</li>
</ul>
<p>This capability is something that has often been discussed and requested at my current <a title="Novell" href="http://www.novell.com" target="_blank">employer</a>, but which no one has really been willing to sponsor.  That being said, I know there are several <strong>vendors with solutions in this space</strong>, including (at least &#8211; please share if you know of others):</p>
<ul>
<li><a title="Connectbeam" href="http://www.connectbeam.com/" target="_blank">Connectbeam</a> &#8211; A company I first found out about at KM World 2007.  They had some interesting technology on display that combines expertise location with the ability to visualize and explore social networks based on that expertise.  Their product could digest content from a number of systems to automatically discern expertise.</li>
<li><a title="ActiveNet" href="http://www.tacit.com/products/activenet/technology.html" target="_blank">ActiveNet</a> &#8211; A product from <a title="Tacit Software" href="http://www.tacit.com/" target="_blank">Tacit Software</a>, which (at a high level) is similar to Connectbeam.  An interesting twist to this product is that it leaves the individuals whose expertise is managed in the system in control of how visible they are to others.  In the discussions I&#8217;ve had with this company about the product, I&#8217;ve always had the impression that, in part, this provides a kind of virtual mailing list functionality where you can contact others (those with the necessary expertise) by sending an email without knowing who it&#8217;s going to.  Those who receive it can either act on it or not and, as the sender, you only know who replies.</li>
<li>Another product about which I only know a bit is from a company named <a title="Trampoline Systems" href="http://www.trampolinesystems.com/" target="_blank">Trampoline Systems</a>.  I heard about them as I was doing some research on how to tune a prototype system of my own and understand that their <a title="Sonar Platform" href="http://www.trampolinesystems.com/products" target="_blank">Sonar platform</a> provides similar functionality.</li>
<li>[Edit: Added this on 03 November, 2008] I have also found that <a title="Recommind" href="http://www.recommind.com" target="_blank">Recommind</a> provides expertise location functionality &#8211; you can read more about it <a title="Recommind Expertise Location" href="http://www.recommind.com/expertise_location.html" target="_blank">here</a>.</li>
<li>[Edit: Added this on 03 November, 2008] I also understand that the <a title="Inquira" href="http://www.inquira.com" target="_blank">Inquira</a> search product provides expertise location, though it&#8217;s not entirely clear to me from what I can find about this tool how it does this.</li>
</ul>
<p>A common aspect of these is that they attempt, perhaps successfully, <strong>to automate the process of expertise discovery</strong>.  I&#8217;ve seen systems where an employee has to maintain their own skill set and the problem with these is that the business process to maintain the data does not seem to really embed itself into a company &#8211; inevitably, the data gets out of date and is ill-maintained and so the system does not work.</p>
<p>I can not vouch for the accuracy of these systems but I firmly believe that if people search in the enterprise is going to meet the promise of enabling people to find each other and connect based on of-the-moment needs (skills, interests, areas of work, etc), it will be based on this type of capability &#8211; automatically discovering those aspects of a worker based on their work products, their project teams, their work assignments, etc.</p>
<p>I imagine that within the not too distant future, as we see more &#8220;web 2.0&#8221; functionality merge into the enterprise, this type of capability will become expected and welcome &#8211; it will be exciting to see how people will work together then.</p>
<p>This brings to a close my discussion of the various types of people search within the enterprise. I hope you&#8217;ve found this of interest.  Please feel free to let me know if you think I have any omissions or misstatements in here &#8211; I&#8217;m happy to correct and/or fill in.</p>
<p>I plan another few posts that discuss a proof of concept I have put together based around the ideas of this fourth generation solution &#8211; look for those soon!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.leeromero.org/2008/10/20/people-search-and-enteprise-search-part-3/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>People Search and Enterprise Search, Part 2 &#8211; A third generation solution</title>
		<link>http://blog.leeromero.org/2008/10/15/people-search-and-enteprise-search-part-2/</link>
		<comments>http://blog.leeromero.org/2008/10/15/people-search-and-enteprise-search-part-2/#comments</comments>
		<pubDate>Wed, 15 Oct 2008 20:40:56 +0000</pubDate>
		<dc:creator>Lee Romero</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[people search]]></category>

		<guid isPermaLink="false">http://blog.leeromero.org/?p=43</guid>
		<description><![CDATA[In my last post, I wrote about what I termed the first generation and second generation solutions to people search in the enterprise.  This time, I will describe what I call a &#8220;third generation&#8221; solution to the problem that will integrate people search with your enterprise search solution.
This is the stage of people search in [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://blog.leeromero.org/2008/10/14/people-search-and-enterprise-search/">last post</a>, I wrote about what I termed the first generation and second generation solutions to people search in the enterprise.  This time, I will describe what I call a &#8220;third generation&#8221; solution to the problem that will integrate people search with your enterprise search solution.</p>
<p>This is the stage of people search in use within my current employer&#8217;s enterprise.</p>
<h4>What is the third generation?</h4>
<p>What I refer to as a <em>third generation</em> solution for people search is one where <strong>an employee&#8217;s profile </strong>(their directory entry, i.e., the set of information about a particular employee) <strong>becomes a viable and useful target within your enterprise search solution</strong>.  That is, when a user performs a search using the pervasive &#8220;search box&#8221; (you do have one, right?), they should be able to expect to find their fellow workers in the results (obviously, depending on the particular terms used to do the search) along with any content that matches that.</p>
<p>You remove the need for a searcher to know they need to look in another place (another application, i.e., the company&#8217;s yellow pages) and, instead, reinforce the primacy of that single search experience that brings everything together that a worker needs to do their job.</p>
<p>You also offer the full power of your enterprise search engine:</p>
<ul>
<li>Full text search &#8211; no need to specifically search within a field, though most engines will offer a way to support that as well if you want to offer that as an option;</li>
<li>The power of the search engine to work on multi-word searches to boost relevancy &#8211; so a search on just a last name might include a worker&#8217;s profile in the search results but one that includes both a first and last name (or user ID or location or other keywords that might appear in the worker&#8217;s profile) likely ensures that the person shows in the first page of results amidst other content that match;</li>
<li>The power of synonyms &#8211; so you can define synonyms for names in your engine and get matches for &#8220;Rob Smith&#8221; when a user searches on &#8220;Robert Smith&#8221; or &#8220;Bob Smith&#8221;;</li>
<li>Spelling corrections &#8211; Your engine likely has this functionality, so it can automatically offer up corrections if someone misspells a name, even.</li>
</ul>
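<p>To make the synonym point concrete, here is a small Python sketch of what name-synonym expansion amounts to.  Most engines handle this through their own synonym configuration rather than code; the table and function below are hypothetical and only illustrate the effect.</p>

```python
# Hypothetical nickname table; a real engine would read this from its
# own synonym configuration.
NAME_SYNONYMS = {
    "rob": {"robert", "bob"},
    "robert": {"rob", "bob"},
    "bob": {"rob", "robert"},
}


def expand_query(query):
    """Return the query plus variants with one word swapped for a synonym."""
    words = query.lower().split()
    variants = {" ".join(words)}
    for i, word in enumerate(words):
        for synonym in NAME_SYNONYMS.get(word, ()):
            variants.add(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants
```

<p>With this table, a search on &#8220;Bob Smith&#8221; would also be matched against &#8220;rob smith&#8221; and &#8220;robert smith&#8221;.</p>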
<p>Below, you will find a discussion of the implementation process we used and the problems we encountered.  It might be of use to you if you attempt this type of thing.</p>
<p>Before getting to that, though, I would like to discuss what I believe to be the <strong>remaining issue with a third generation solution</strong> in order to set up my follow-up post on this topic, which will describe additional ideas for solving the &#8220;people finder&#8221; problem within an enterprise.</p>
<p>The primary issue with the current solution we have (or any similar solution based strictly on information from a corporate directory) is that <strong>the profile of a worker consists only of administrative information</strong>.  That is, you can find someone based on their name, title, department, address, email, etc., etc., etc., but you can not do anything useful to find someone based on much more useful attributes &#8211; what they actually do, what their skills or competencies are or what their interests might be.  More on this topic in my next post!</p>
<h4>The implementation of our third generation solution (read on for the gory details)</h4>
<p>Read on from here for some insights on the challenges we faced in our implementation of this solution.  It gets pretty detailed from here on out, so you&#8217;ve been warned!</p>
<p><span id="more-43"></span></p>
<p><strong>First up &#8211; How to ensure that we get correct <a title="The 3 Principles of Enterprise Search (part 1): Coverage" href="http://blog.leeromero.org/2008/01/08/the-3-principles-of-enterprise-search-part-1-coverage/">coverage</a> of the content set?</strong> As I&#8217;ve written about before, our search solution is based on <a title="QuickFinder" href="http://www.novell.com/products/openenterpriseserver/quickfinder.html" target="_blank">Novell&#8217;s QuickFinder</a> &#8211; which is a good, though not particularly sophisticated, search engine.  It offers two types of indexes &#8211; a crawled index and a file system index.  Given that our targets for this were web pages dynamically generated by <a title="eGuide" href="http://www.novell.com/products/eguide/" target="_blank">eGuide</a>, the only feasible option was a crawled index.  Simple enough, right?</p>
<p>My first attempt to solve this was to simply point the indexer at the eGuide application and let it go.  However, because eGuide has been built assuming your only (or at least primary) experience is going to be using it to do searches, there&#8217;s nothing of any real use that a crawler will find to index (it&#8217;s all hidden away behind HTML forms).  <strong>All a crawler will find is the &#8220;home page&#8221; of the application</strong> and possibly a few additional informational pages that are linked to directly from the home page.</p>
<p>Commonly, I have found that it&#8217;s possible to give a crawler an HTTP GET URL that has the same effect as submitting a search form &#8211; many applications will treat a GET that specifies parameters in a query string the same as a POST that passes form-based input variables in the body of the request.  I tried this approach with eGuide but then ran into two additional issues:</p>
<ul>
<li>The results come back in limited blocks (100 at most), and</li>
<li>Within the results pages and profile pages, there are links to many, many useless pages (sorting the results on different columns, &#8220;printable&#8221; versions of an employee&#8217;s profile, etc.)</li>
</ul>
<p>Both of these issues could be resolved with some changes in the eGuide application itself, but that was not feasible, due to resource constraints.</p>
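<p>The GET-for-POST approach mentioned above can be sketched roughly as follows, in Python.  This is only an illustration: the endpoint and the field names are hypothetical and are not the real eGuide parameters.</p>

```python
from urllib.parse import urlencode


def search_url(base_url, form_fields):
    """Build a GET URL carrying the same fields a search form would POST."""
    return base_url + "?" + urlencode(form_fields)


# Hypothetical endpoint and field names, for illustration only.
seed = search_url(
    "http://intranet.example.com/directory/search",
    {"lastname": "romero", "max": 100},
)
print(seed)
```

<p>A URL built this way could be handed to a crawler as a starting point, assuming the application accepts query-string parameters in place of posted form fields.</p>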
<p>The <strong>solution we came up with</strong> was a simple web application which was directly integrated with the directory and which would do one thing:  generate a single HTML file that contained a link to each and every employee&#8217;s profile &#8211; effectively a simple index page with a bunch of links.  Easy enough.  Then we defined the index to start at that page and go one level deep from it, and the problem was resolved &#8211; <strong>we get exactly the set of profiles and nothing more</strong>.</p>
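<p>A minimal sketch of such an index-page generator, in Python, might look like the following.  It assumes the list of profile URLs has already been pulled from the directory; the function name and the page layout are illustrative and are not the application we actually built.</p>

```python
from html import escape


def build_index_page(profile_urls):
    """Return one HTML page that links to every profile URL exactly once."""
    links = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(url, quote=True))
        for url in sorted(set(profile_urls))  # de-duplicate, stable order
    )
    return (
        "<html><head><title>Employee profile index</title></head>\n"
        "<body><ul>\n" + links + "\n</ul></body></html>\n"
    )
```

<p>Pointing the crawler at the generated page with a depth limit of one means it visits each linked profile and nothing else.</p>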
<p><strong>Now onto the second principle of any search solution &#8211; <a title="The 3 Principles of Enterprise Search (part 2): Identity" href="http://blog.leeromero.org/2008/01/08/the-3-principles-of-enterprise-search-identity/">identity</a>.</strong> So we managed to get everything indexed that we wanted to and could try searches to find people (and they worked)!  However&#8230; We then found that eGuide (or, more accurately, the templates we were using for eGuide) suffered from one of the many <a title="People know where to find that, though!" href="http://blog.leeromero.org/2008/10/13/people-know-where-to-find-that-though/">problems</a> you encounter with web applications: <strong>Every single page had the same text in the &lt;title&gt; tag &#8211; &#8220;Novell eGuide&#8221;</strong>.  So you can perform searches and get the correct items showing in the results page, but, because QuickFinder uses the &lt;title&gt; as a primary identifier in search results, you end up seeing 10 items all titled, &#8220;Novell eGuide&#8221;!  Not so very useful.</p>
<p><strong>Now onto the third principle of enterprise search &#8211; <a title="The 3 Principles of Enterprise Search (part 3): Relevance" href="http://blog.leeromero.org/2008/01/10/the-3-principles-of-enterprise-search-part-3-relevance/">relevance</a></strong> (you&#8217;ll see why I don&#8217;t jump right to the solution for the identity issue in a moment).  So we now have all employees as viable targets in the enterprise search and (assuming we fix the issue with identity mentioned above), we then run into the issue that, by itself, <strong>the profile of an employee may actually not be that relevant even when someone searches on that person&#8217;s name</strong>.  Why?  Because it is very possible (likely) that that person&#8217;s name is on a number of other pages or embedded in the metadata of documents that are also part of the enterprise search.  So their profile may show up as a result but is unlikely to rank high enough in the results to be noticed.</p>
<p>The <strong>solution to both of these issues</strong> turned out to be some very simple changes to the eGuide template.  First, stick the employee&#8217;s name into the &lt;title&gt; tag &#8211; now, when someone&#8217;s profile shows up in the enterprise search, it shows as &#8220;Novell eGuide: &lt;person&#8217;s name&gt;&#8221;. Very nice.  This has the additional benefit (with our search engine) of also boosting relevancy of profiles for searches on employee names, because words in titles significantly boost the relevance of the content for searches on those words.</p>
<p>In addition, we made two more enhancements &#8211; we included a &#8220;keywords&#8221; &lt;meta&gt; tag in the template that includes the values of an employee&#8217;s name, title, department, etc. as keywords.  Again, this boosts relevancy for searches on those keywords.  We also added a &#8220;description&#8221; &lt;meta&gt; tag in the template which is something general like &#8220;eGuide Profile for &lt;person&#8217;s name&gt;&#8221; &#8211; with our engine, the &#8220;description&#8221; &lt;meta&gt; tag is used as part of the &#8220;snippet&#8221; for a result in the results page.</p>
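<p>To make the template changes concrete, here is a small Python sketch of the kind of page head they produce.  It is not eGuide template syntax (the real change was made in the eGuide templates themselves), and the function and its parameters are hypothetical.</p>

```python
from html import escape


def profile_head(name, title, department):
    """Render a profile page head with a unique title and meta tags."""
    keywords = ", ".join([name, title, department])
    return (
        "<head>\n"
        "<title>Novell eGuide: {name}</title>\n"
        '<meta name="keywords" content="{keywords}" />\n'
        '<meta name="description" content="eGuide Profile for {name}" />\n'
        "</head>"
    ).format(
        name=escape(name, quote=True),
        keywords=escape(keywords, quote=True),
    )
```

<p>The unique title addresses identity, while the title and the keywords together address relevance for searches on the values they contain.</p>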
<p>With these changes, we finally had excellent coverage of employee profiles, excellent identity of the target items and good relevance of the items.  <strong>A success all around.</strong></p>
<p>The <strong>last issue</strong> we had to deal with revolves around our own infrastructure and corporate policies and security.  Novell has a fairly sophisticated authentication and provisioning infrastructure.  So it&#8217;s possible that a worker can have general access to our intranet and also to search but not have access to eGuide (a contractor or similar type of worker).  Control for this is provided by <a title="Access Manager" href="http://www.novell.com/products/accessmanager/">Access Manager</a> but that works based on ACL rules defined on paths on the intranet.  So if a worker does not have access to eGuide, they cannot access the specific path on the intranet through which that application is available.</p>
<p>Search, on the other hand, cuts across all paths.  Also, our search engine does not integrate (at least we have not integrated it) with Access Manager to provide either early binding or late binding on security of search results.  So the question is, how do we make these valuable search results appear in our enterprise search without facing the possibility of allowing access to information that someone shouldn&#8217;t have?</p>
<p>The solution we came up with was a compromise:  From a policy perspective, the business was OK with people (well, their profiles) showing up as targets in search results.  But, we needed to ensure that only their name showed &#8211; no other details should be visible.  As mentioned above, QuickFinder uses the &#8220;description&#8221; &lt;meta&gt; tag as part of the snippet shown for results, but it will also pull text from the page to generate a snippet &#8211; possibly showing more directly in search results than we should.  We achieved the compromise by ensuring that people will only show in our &#8220;Best Bets&#8221; section of a results page &#8211; and items in &#8220;Best Bets&#8221; only show the title and an icon indicating the item type.  No more details are visible.</p>
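<p>The effect of that compromise can be sketched in Python as below.  This is only a model of the behavior, not how QuickFinder is configured: the result fields and the &#8220;person&#8221; type value are assumptions, and the sketch covers only people, even though other content can also be a Best Bet.</p>

```python
def split_results(results):
    """Route people to Best Bets (title and icon only); keep the rest as-is."""
    best_bets, regular = [], []
    for result in results:
        if result["type"] == "person":
            # Drop the snippet and any other detail: only name and icon survive.
            best_bets.append({"title": result["title"], "icon": "person"})
        else:
            regular.append(result)
    return best_bets, regular
```

<p>Because the snippet never reaches the Best Bets entry, nothing beyond the name can leak into the results page.</p>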
<p>Below is a partial screen showing how a search on my own last name displays in the results (I&#8217;ve blurred out the names of other people that show up but left in a piece of content that is also a &#8220;best bet&#8221; on my name &#8211; in this case, the home page for a community of practice of which I am a community leader). You&#8217;ll note that it shows the name (and only the name) of the person and also displays a specific icon next to the results so that they &#8220;stand out&#8221; a bit more in the results list as being &#8220;people&#8221; (as opposed to a Word document or an HTML page, etc.)</p>
<div id="attachment_57" class="wp-caption aligncenter" style="width: 463px"><img class="size-full wp-image-57" title="romero-search1" src="http://blog.leeromero.org/wp-content/uploads/2008/10/romero-search1.png" alt="Partial Search Results Screen when searching on romero" width="453" height="205" /><p class="wp-caption-text">Partial Search Results Screen when searching on romero</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.leeromero.org/2008/10/15/people-search-and-enteprise-search-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What is Enterprise Search?</title>
		<link>http://blog.leeromero.org/2008/10/09/what-is-enterprise-search/</link>
		<comments>http://blog.leeromero.org/2008/10/09/what-is-enterprise-search/#comments</comments>
		<pubDate>Thu, 09 Oct 2008 02:36:38 +0000</pubDate>
		<dc:creator>Lee Romero</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[enterprise search]]></category>

		<guid isPermaLink="false">http://blog.leeromero.org/?p=23</guid>
		<description><![CDATA[Having written previously about my own principles of enterprise search and then some ideas on how to select a search engine, I thought it might be time to back up a bit and write about what I think of as &#8220;enterprise search&#8221;.  Perhaps a bit basic or unnecessary but it gives some context to [...]]]></description>
			<content:encoded><![CDATA[<p>Having written previously about my own principles of enterprise search and then some ideas on how to select a search engine, I thought it might be time to back up a bit and write about what I think of as &#8220;enterprise search&#8221;.  Perhaps a bit basic or unnecessary but it gives some context to future posts.</p>
<h4>The <span style="color: #ff6600;">Enterprise</span> in Enterprise Search</h4>
<p>For me, the factors of a search solution that make it an enterprise solution include the following:</p>
<p><strong><em>The user interface to access the solution is available to all employees of the company.</em></strong></p>
<p>This has the following implications:</p>
<ul>
<li>Given today&#8217;s technologies, this probably means that it&#8217;s a <em>web-based interface</em> to access the search.
<ul>
<li>More generally, the interface needs to be <em>easily made available across the enterprise</em>.  In any somewhat-large organization, that means something either available online or easily installed or accessed from a user&#8217;s workspace.</li>
</ul>
</li>
<li>I would also suggest that the search interface should be <em>easily accessible</em> from an employee&#8217;s standard workspace or a common starting point for employees.
<ul>
<li>One easy way to achieve this is to make access to an enterprise search solution part of the general intranet experience &#8211; especially on an intranet that shares a standard look-and-feel (and so, hopefully, a standard template).  This is the ubiquitous &#8220;search box&#8221;.</li>
<li>Alternately, if users commonly use a specific application (say a CRM application or a collaboration tool), integrating the enterprise search into that is a better solution.</li>
<li>Lastly, it might be necessary to make access to the search solution &#8220;many-headed&#8221;.  Meaning, it might be best to make it available through a number of means, including through a standard intranet search, a specialized client-based application and embedded in other, user-specific tools.</li>
</ul>
</li>
<li>Given the likely broad range of users who will use it, the search interface should be <em>subject to very thorough usability design and testing</em>.</li>
<li>Adopting some of the <a title="Mental Models For Search Are Getting Firmer (Jakob Nielsen's Alertbox)" href="http://www.useit.com/alertbox/20050509.html">standard conventions of a search experience</a> is a good idea.</li>
</ul>
<p><em><strong>The content available through the solution covers all (relevant) content available to employees</strong></em></p>
<p>This has the following implications:</p>
<ul>
<li>If your enterprise has a significant volume of web content, your enterprise search <em>should index all of those web pages</em> &#8211; either via a web crawling approach or via indexing the file system containing the files (if it&#8217;s all static).</li>
<li>If your enterprise has a significant volume of content (data) in <em>enterprise applications</em> (CRM solution, HR system, etc.), you should have a strategy to determine which (if any) of the content from those systems would be included, how it will be included and how it will be presented in search results (potentially combined with content from many other systems in the same results page).</li>
<li>If your enterprise has <em>custom web applications</em> (and what organization does not?), you should expect to provide a set of standards for design and development of web applications to ensure good findability from them and also expect to have to monitor compliance with those.</li>
<li>If your enterprise has significant content in <em>collaboration tools</em> (and who doesn&#8217;t &#8211; at least email!), you should have a strategy for including or not including that content.  This could be very broad-ranging &#8211; email, <a title="Microsoft Sharepoint" href="http://www.microsoft.com/Sharepoint/default.mspx" target="_blank">SharePoint</a> (and similar applications from companies like <a title="Interwoven WorkSite" href="http://www.interwoven.com/components/pagenext.jsp?topic=PRODUCT::WORKSITE" target="_blank">Interwoven</a>, <a title="OpenText LiveLink Communities of Practice" href="http://www.opentext.com/2/sol-products/sol-pro-docmgmt-collaboration/pro-ll-communities-practice.htm" target="_blank">Open Text</a>, <a title="Vignette Collaboration" href="http://www.vignette.com/us/Products/Collaboration" target="_blank">Vignette</a>, <a title="Novell Teaming" href="http://www.novell.com/products/teaming/" target="_blank">Novell</a>, etc.), shared file systems, IM logs, and so on.  At the very least, you need to consider the cost and value of including these types of content.</li>
<li>If you have <em>content repositories </em>available to employees (a document management system (or systems!) or a records management system), again, you should consider the cost and value of including content from these in your enterprise search.</li>
<li>While it is very useful to have a separate search for <a title="Employee Directory Search: Resolving Conflicting Usability Guidelines (Alertbox)" href="http://www.useit.com/alertbox/20030224.html">finding employees in a corporate directory</a>, I believe that an enterprise search solution should <em>include employees as a distinct &#8220;content type&#8221;</em> and include them in the standard search results page as well when relevant (e.g., searching on employee names, etc.)</li>
<li>Another major question regarding the content of your enterprise search is <em>security</em>.  If you include all of that content in your search, how will you manage the security of the items?  The two major options are early binding (building ACLs into the search) or late binding (checking security at search time).  If you are not familiar with these, I would recommend you do a bit of internet searching on the topics as it&#8217;s very important to your solution.  I&#8217;ve found some <a title="Search Done Right » Blog Archive » What’s Wrong with Google’s Enterprise Search Security? (Part 1)" href="http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/" target="_blank">interesting</a> <a title="KM Space: Best Practices for Securing Enterprise Search" href="http://kmspace.blogspot.com/2008/06/best-practices-for-securing-enterprise.html" target="_blank">articles</a> <a title="Part 2: Mapping Security Requirements to Enterprise Search" href="http://www.ideaeng.com/pub/entsrch/v3n5/article01.html" target="_blank">on</a> this <a title="Official Google Enterprise Blog: Document-level security with Enterprise Search" href="http://googleenterprise.blogspot.com/2006/12/document-level-security-with.html" target="_blank">topic</a>.
<ul>
<li>In my mind, it&#8217;s also feasible to &#8220;punt&#8221; on security in a sense and work to ensure that your enterprise search solution includes everything that is generally accessible to your employee population but does not include anything with specific access control on it.</li>
<li>Getting a user &#8220;close to&#8221; the content (ensuring some level of &#8220;<a href="http://www.useit.com/alertbox/20040802.html" target="_self">information scent</a>&#8221; shows up) but leaving it to the user to make the final step (through any application-specific access control) seems to work well.</li>
</ul>
</li>
</ul>
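<p>For readers new to the two security options mentioned in the list above, here is a toy Python sketch of the difference.  The in-memory index, the field names and the <code>can_access</code> callback are all assumptions made for illustration; a real engine would do this inside its own query and connector layers.</p>

```python
def early_binding_search(index, query, user_groups):
    """Early binding: ACLs were stored at index time, so filter while matching."""
    return [
        doc["url"]
        for doc in index
        if query in doc["text"] and user_groups & doc["acl"]
    ]


def late_binding_search(index, query, can_access):
    """Late binding: match first, then check each hit against the source system."""
    hits = [doc["url"] for doc in index if query in doc["text"]]
    return [url for url in hits if can_access(url)]
```

<p>Early binding keeps queries fast but can serve stale permissions until the next re-index, whereas late binding is always current but pays for an access check on every hit.</p>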
<h4>The <span style="color: #ff6600;">Search</span> in Enterprise Search</h4>
<p>The other half of your enterprise search solution will be the search engine itself.  There are plenty (many!) options available with a variety of strengths and weaknesses.  I think if you plan to implement a truly enterprise search, the above list of content-based considerations should get you thinking of all of the places where you may have content &#8220;hiding&#8221; in your organization.</p>
<p>From that list, you should have a good sense of the volume of content and the complexity of sources your search will need to deal with.</p>
<p>Combining that with a careful <a href="http://blog.leeromero.org/2008/10/01/categories-of-search-requirements/">requirements definition</a> process and <a href="http://blog.leeromero.org/2008/09/30/evaluating-and-selecting-a-search-engine/">evaluation of alternatives</a> should lead to a successful selection of a tool.</p>
<p>Once you have a tool, you &#8220;just&#8221; need to apply the proper amount of elbow grease to get it to index all of the content you wish and present it in a sensible way to your users!  No big deal, right?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.leeromero.org/2008/10/09/what-is-enterprise-search/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
