Lee Romero

On Content, Collaboration and Findability
October 10th, 2011

Language change over time in your search log

This is a second post in a series I have planned about the language found throughout your search log – all the way into the “long tail” and how it might or might not be feasible to understand it all.

My previous post, “80-20: The lie in your search log?”, highlighted how the slope of the “short head” of your search terms may not be as steep as anecdotes would say.  That is, there can be a lot less commonality within a particular time range among even the most common terms in your search log than you might expect.

After writing that post, I began to wonder about the overall re-use of terms over periods of time.

In other words:

Even though the commonality of re-used terms within a month is relatively low, how much commonality do we see in our users’ language (i.e., search terms) from month to month?

To answer this, I needed to take the entire set of terms for one month, compare it with the entire set from the next month to determine the overlap, then compare the second month’s set of terms to a third month’s, and so on.  Logically not a hard problem, but quite a challenge in practice due to the volume of data I was manipulating (large only relative to the tools I have for manipulating it).

So I pulled together every single term used over a period of about 18 months and broke them into the set used for each of those months and performed the comparison.
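For anyone who wants to try the same comparison, the mechanics are simple once each month’s log has been reduced to a mapping of term to search count. Here is a minimal sketch of that kind of comparison – the data structures and names are my own placeholders, not the actual tooling I used:

```python
from collections import Counter

def monthly_overlap(prev_counts: Counter, curr_counts: Counter):
    """Compare one month's search terms against the previous month's.

    Returns two percentages:
      - the share of this month's distinct terms that also appeared last month
      - the share of this month's total searches that used a term from last month
    """
    repeated_terms = set(curr_counts) & set(prev_counts)

    distinct_overlap = 100.0 * len(repeated_terms) / len(curr_counts)
    searches_with_repeats = sum(curr_counts[t] for t in repeated_terms)
    search_overlap = 100.0 * searches_with_repeats / sum(curr_counts.values())

    return distinct_overlap, search_overlap


# Hypothetical usage: `months` is a list of Counter objects, one per month,
# mapping each (normalized) search term to the number of searches for it.
# for prev, curr in zip(months, months[1:]):
#     print(monthly_overlap(prev, curr))
```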

Before getting into the results, a few details to share for context about the search solution I’m writing about here:

  • The average number of searches performed each month was almost 123,000.
  • The average number of distinct terms during this period was just under 53,000.
  • This results in an average of about 2.3 searches for each distinct term.

My expectation was that comparing the entire set of terms from one month to the next would show a relatively high percentage of overlap.  What I found was not what I expected.

If you look at the unique terms and their overlap, the average overlap between months was a shockingly low 13.2%.  In other words, over 86% of the terms in any given month were not used at all in the previous month.

Month to Month Re-Use of Search Terms

If you look at the total searches performed and the percent of searches performed with terms from the prior month, this goes up to an average of 36.2% – reflecting that the terms that are re-used in a subsequent month are among the most common terms overall.

Month to Month Re-Use of Search Terms

As you can see, the amount of commonality from month-to-month among the terms used is very low.

What can you draw from this observation?

In a brief discussion about this with noted search analytics expert Lou Rosenfeld, his reaction was that this represented a significant amount of change in the information needs of the users of the system – significant enough to be surprising.

Another conclusion I draw from this is that it provides another reason why it is very hard to meaningfully improve search across the language of your users.  Based on my previous post on the flatness of the curve of term use within a month, we know that we need to look at a pretty significant percentage of distinct terms each month to account for a decent percentage of all searches – 12% of distinct terms to account for only 50% of searches.  In our search solution, that 12% doesn’t seem that large until you realize it still represents about 6,000 distinct terms.

Coupling that with the observation from the analysis here means that even if you review those terms for a given month, you will likely need to review a significant percentage of brand new terms the next month, and so on.  Not an easy task.

Having established just how challenging this can be, my next few posts will provide some ideas for grappling with the challenges.

In the meantime, if you have any insight on similar statistics from your solution (or statistics about the shape of the search log curve I previously wrote about), please feel free to share here, on the SearchCoP on Yahoo! groups or on the Enterprise Search Engine Professionals group on LinkedIn – I would very much like to compare numbers to see if we can identify meaningful generalizations across different solutions.

September 23rd, 2011

The Findability Gap by Lou Rosenfeld

Lou Rosenfeld has just published a great presentation I would highly recommend for anyone working in the search space:  The Findability Gap.

It provides a great picture of the overall landscape of the problem (it’s not just search, after all!).

I especially liked slide 4 – a very telling illustration of the challenge we face in intelligently making information available to our users.

Re: Slide 24 – As I’ve written about before, I would say that the 80/20 rule is more than just “not quite accurate”.  But that’s mincing words.

Overall, a highly recommended read.

June 14th, 2011

KMers.org Chat on the Importance of Search in your KM Solution

Last week, I moderated a discussion for the weekly KMers.org Twitter chat about “The Importance of Search in your KM Solution”.

My intent was to try to get an understanding of how important search is relative to other components of a KM solution (connecting people, collecting and managing content, etc.).

It was a good discussion with about a dozen or so people taking part (that I could tell).

You can read through the transcript of the session here.   Let me know what you think on the topic!

During the discussion, a great question came up about measuring the success of your search solution (thanks to Ed Dale) which I thought deserved its own discussion, so I have submitted a suggestion for a new topic for an upcoming KMers.org chat.

Please visit the suggestion here and vote for it!

November 16th, 2010

Taxonomy Boot Camp 2010

Yesterday, I delivered my presentation at Taxonomy Boot Camp 2010 on “Enterprise Taxonomy: Six Components of a vision”.  You can find the presentation on my site here and also on the Taxonomy Boot Camp site here (the latter requires a login you will need to get from the conference).

Some of the most interesting topics for me this week have been about semantic (web) technologies and also some details on the implementation of taxonomy in SharePoint 2010.  Good stuff.

In addition, I’ve had the opportunity to meet and re-meet many people who work in the taxonomy space and also in search, so it’s been a very revitalizing experience.

I also (finally) picked up a copy of the Accidental Taxonomist by Heather Hedden.  I am really looking forward to reading it.

November 13th, 2010

80-20: The lie in your search log?

Recently, I have been trying to better understand the language in use by our users in the search solution we use, and in order to do that, I have been trying to determine what tools and techniques one might use to do that. This is the first post in a planned series about this effort.

I have many goals in pursuing this.  The primary goal has been to be able to identify trends from the whole set of language in use by users (and not just the short head).  This goal supports the underlying business desire of identifying content gaps or (more generally) where the variety of content available in certain categories does not match the variety expected by users (i.e., how do we know when we need to target the creation and publication of specific content?)

Many approaches to this do focus on the short head – typically the top N terms, where N might be 50 or 100 or even 500 (some number that’s manageable).  I am interested in identifying ways to understand the language through the whole long tail as well.

As I have dug into this, I realized an important aspect of this problem is to understand how much commonality there is to the language in use by users and also how much the language in use by users changes over time – and this question leads directly to the topic at hand here.

Chart 1: Search Term Usage

There is an anecdote I have heard many times about the short head of your search log that “80 percent of your searches are accounted for by the top 20% most commonly-used terms”.  I now question this and wonder what others have seen.

I have worked closely with several different search solutions in my career and the three I have worked most closely with (and have most detailed insight on) do not come even close to the above assertion.  Chart 1 shows the usage curve for one of these.  The X axis is the percent of distinct terms (ordered by use) and the Y axis shows the percent of all searches accounted for by all terms up to X.

From this chart, you can see that it takes approximately 55% of distinct terms to account for 80% of all searches – that is a lot of terms!
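If you want to plot the same kind of curve from your own log, the calculation is straightforward. A minimal sketch, assuming you already have a term-to-count mapping for a month (the names here are placeholders):

```python
def usage_curve(term_counts):
    """Return (x, y) points for a chart like Chart 1.

    x = cumulative percent of distinct terms (ordered most- to least-used)
    y = cumulative percent of all searches accounted for by those terms
    """
    counts = sorted(term_counts.values(), reverse=True)
    total_searches = sum(counts)
    total_terms = len(counts)

    x, y, running = [], [], 0
    for i, count in enumerate(counts, start=1):
        running += count
        x.append(100.0 * i / total_terms)
        y.append(100.0 * running / total_searches)
    return x, y


# Reading off the curve: the first x value where y >= 80 tells you what
# percent of distinct terms it takes to account for 80% of all searches.
```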

This curve shows the usage for one month – I wondered how similar this would be for other months and found (for this particular search solution) that the curves for every month were essentially identical!

Wondering if this was an anomaly, I looked at a second search solution I have close access to, to see if it might show signs of the “80/20” rule.  Chart 2 adds the curve for this second solution (it’s the blue curve – the higher of the two).

Chart 2

In this case, you will find that the curve is “higher” – it reaches 80% of searches at about 37% of distinct terms.  However, it is still pretty far from the “80/20” rule!

After looking at this data in more detail, I have realized why I have always been troubled at the idea of paying close attention to only the so-called “short head” – doing so leaves out an incredible amount of data!

In trying to understand the details of why, even though neither is close to adhering to the “80/20″ rule, the usage curves are so different, I realize that there are some important distinctions between the two search solutions:

  1. The first solution is from a knowledge repository – a place where users primarily go in order to do research; the second is for a firm intranet – much more focused on news and HR type of information.
  2. The first solution provides “search as you type” functionality (showing a drop-down of actual search results as the user types), while the second provides auto-complete (showing a drop-down of possible terms to use).  The auto-complete may be encouraging users to adopt more commonality.

I’m not sure how (or really if) these factor into the shape of these curves.

In understanding this a bit better, I hypothesize two things:  1) the shape of this curve is stable over time for any given search solution, and 2) the shape of this curve tells you something important about how you can manage your search solution.  I am planning to dig more to answer hypothesis #1.

Questions for you:

  • Have you looked at term usage in your search solution?
  • Can you share your own usage charts like the above for your search solution and describe some important aspects of your solution?  Insight on more solutions might help answer my hypothesis #2.
  • Any ideas on what the shape of the curve might tell you?

I will be writing more on these search term usage curves in my next post as I dig more into the time-stability of these curves.

November 12th, 2010

Speaking at Taxonomy Boot Camp 2010

Next week, I will be speaking at Taxonomy Boot Camp 2010 – on the topic of Enterprise Taxonomy: A Vision.  Much of what I will have to share is from a post I wrote here some time ago.

Hopefully I have enough new material to make it a good session for everyone!

February 22nd, 2010

Best Bet Governance

My first post back after too-long a period of time off.  I wanted to jump back in and share some concrete thoughts on best bet governance.

I’ve previously written about best bets and how I thought, while not perfect, they were an important part of a search solution.  In that post, I also described the process we had adopted for managing best bets, which was a relatively indirect means supported by the search engine we used for the search solution.

Since moving employers, I now have responsibility for a local search solution as well as input on an enterprise search solution where neither of the search engines supports a similar model.  Instead, both support the (more typical?) model where you identify particular search terms that you feel need to have a best bet and you then need to identify a specific target (perhaps multiple targets) for those search terms.

This model offers some advantages such as specificity in the results and the ability to actively determine what search terms have a best bet that will show.

This model also offers some disadvantages, the primary one (in my mind) being that they must be managed – you must have a means to identify which terms should have best bets and which targets those terms should show as a best bet.  This implies some kind of manual management, which, in resource-constrained environments, can be a challenge.  As noted in my previous article, others have provided insight about how they have implemented and how they manage best bets.

Now having responsibility for a search solution requiring manual management of best bets, we’ve faced the same questions of governance and management and I thought I would share the governance model we’ve adopted.  I did review many of the previous writings on this to help shape these, so thanks to those who have written before on the topic!

Our governance model is largely based on trying to provide a framework for consistency and usability of our best bets.  We need some way to ensure we do not spend inordinate time on managing requests while also ensuring that we can identify new, valuable search terms and targets for best bets.

Without further ado, here is an overview of the governance we are using:

  • We will accept best bet requests from all users, though most requests come from site publishers on our portal.  Most of our best bets have web sites as targets, though about 30% have individual pieces of published content (documents) as targets.  As managers of the search solution, my team will also identify best bets when appropriate.
  • When we receive a request for a new best bet, we review the request against the following criteria (a rough sketch of how some of these checks might be automated appears after this list):
    • No more than five targets can be identified for any one search term, though we prefer to keep it to one or two targets.
      • Any request for a best bet that would result in more than 2 targets for the search term forces a review of usage of the targets (usage is measured by our web analytics solution for both sites and published content).
      • The overall usage of the targets will identify if one or more targets should be dropped.
    • For a given target, no more than 20 individual search terms can be identified.  Typically, we try to keep this to fewer than 5 when possible.
    • If a target is identified as a best bet target that has not had a best bet search term associated with it previously, we confirm that it is either a highly used piece of content or that it is a significant new piece that is highly known or publicized (or may soon be by way of some type of marketing).
    • We also review the search terms identified for the best bet.  We will not use search terms with little to no usage during the previous 3 months.
    • We will not set up a best bet search term that matches the title of the target.  The relevancy algorithm for our search engine heavily weights titles, so this is not necessary.
    • We do prefer that the best bet search terms have a logical connection to the title or summary of the target.  This ensures that a user will understand the connection between their search terms and a resulting best bet.  This is not a hard requirement, but a preference.  We do allow for spelling variants, synonyms, pluralized forms, etc.
    • We prefer terms that use words from our global taxonomy.
  • Our governance (management process, really) for managing best bets includes:
    • Our search analyst reviews the usage of each best bet term.
      • If usage over an extended time is too low to warrant the best bet term, it is removed.
    • We also plan to use path analysis (pending some enhancements needed as this is written) to determine if, for specific terms, the best bet selections are used preferentially.  If that is found to not be the case, our intent is that the best bet target is removed.
    • We have integrated the best bet management into both our site life cycle process and our content life cycle process
      • With the first, when we are retiring a site or changing the URL of a site we know to remove or update the best bet target
      • With the second, as content is retired, the best bets are removed
      • In each of these cases, we also evaluate the terms to see if there could be other good targets to use.

The one interesting experience we’ve had so far with this governance model is that we get a lot of push back from site publishers who want to provide a lengthy laundry list of terms for their site, even when 75% of that list is never used (at least not within the twelve-month window we will sometimes check).  They seem convinced that there is value in setting up best bets for terms even when you can show that there is none.  We are currently making changes in the way we manage best bets and also in how we can use these desirable terms to enhance the organic results directly.  More on that later.

There you have our current governance model.  Not too fancy or complicated and still not ideal, but it’s working for us and we recognize that it’s a work in progress.

Now that I have the “monkey off my back” in terms of getting a new post published, I plan to re-start regular writing.  Check back soon for more on search, content management and taxonomy!

February 22nd, 2010

Yes, I am Alive!

It’s now been slightly over a year since my last post.  Yikes, time flies!

Where have I been?  Well, mostly, I’ve been getting myself comfortable in a new position.  Tomorrow marks my first anniversary as an employee of Deloitte.  I took on the role of Portal Program Lead for the Global Consulting Knowledge Management (GCKM) group.  This position carries a lot of the same responsibilities I had with Novell – I manage a small team of great people taking care of a portal and its associated search solution, its taxonomy and also interacting with and supporting a lot of people within the GCKM group and in the practice at large.

One big difference is that it’s not the “enterprise” portal and search solution I used to manage – instead, it’s the knowledge management portal specifically for Deloitte’s consulting organization.  Another big difference is that it’s a portal targeted at an audience that is many times larger than Novell’s employee population.  So a relatively small scope but a much larger audience.  Definitely a much different organization than the one I had been used to!

It’s been an interesting year – learning a lot and getting used to a new organization, new technical solutions, new business problems, and more.

This is probably as much rationalization as anything but I’ve been feeling like I wanted to wait until I had been at this new position long enough to begin writing again.  As in the past, I will mostly write about my own perceptions and learnings and thinking based on my experiences.  Given that I did not feel comfortable writing about what I had learned at Novell after I left, while at the same time, I did not feel like I had enough grounding at Deloitte to write about that experience, I have been in a holding pattern.

With tomorrow being my one year anniversary, I figure it’s time to jump back in. As always, opinions expressed here are mine and do not imply any belief on the part of my employer.

My first post back will be on Best Bet Governance.  Look for it shortly!

February 10th, 2009

Embedding Knowledge Sharing in Performance Management

In my last post, I wrote about a particular process for capturing “knowledge nuggets” from a community’s on-going discussions and toward the end of that write up, I described some ideas for the motivation for members to be involved in this knowledge capture process and how it might translate to an enterprise. All of the ideas I wrote about were pretty general and as I considered it, it occurred to me that another topic is – what are the kinds of specific objectives an employee could be given that would (hopefully) increase knowledge sharing in an enterprise? What can a manager (or, more generally, a company) do to give employees an incentive to share knowledge?

Instead of approaching this from the perspective of what motivates participants, I am going to write about some concrete ideas that can be used to measure how much knowledge sharing is going on in your organization. Ultimately, a company needs to build into its culture and values an expectation of knowledge sharing and management in order to have a long-lasting impact. I would think of the more tactical and concrete ideas here as a way to bootstrap an organization into the mindset of knowledge sharing.

A few caveats: First – Given that these are concrete and measurable, they can be “gamed” like anything else that can be measured. I’ve always thought measures like this need to be part of an overall discussion between a manager and an employee about what the employee is doing to share knowledge and not (necessarily) used as absolute truth.

Second – A knowledge sharing culture is much more than numbers – it’s a set of expectations that employees hold of themselves and others; it’s a set of norms that people follow. That being said, I do believe that it is possible to use some aspects of concrete numbers to understand impacts of knowledge management initiatives and to understand how much the expectations and norms are “taking hold” in the culture of your organization. Said another way – measurement is not the goal but if you can not measure something, how do you know its value?

Third – I, again, need to reference the excellent guide, “How to use KPIs in Knowledge Management” by Patrick Lambe. He provides a very exhaustive list of things to measure, but his guide is primarily written as ways to measure the KM program. Here I am trying to personalize it down to an individual employee and setting that employee’s objectives related to knowledge sharing.

In the rest of this post, I’ll make the assumption that your organization has a performance management program and that that program includes the definition for employees of objectives they need to complete during a specific time period. The ideas below are applicable in that context.

  • Community membership – Assuming your community program has a way to track community membership, being a member of relevant communities can be a simple objective to accomplish.
  • Community activity – Assuming you have tools to track activity by members of communities, this can give you a way to set objectives related to being active within a community (which I think is much more valuable than simply being a member). It’s hard to set specific objectives for this type of thing, but the objective could simply be – “Be an active member of relevant communities”. Some examples:
    • If your communities use mailing lists, you can measure posts to community mailing lists.
    • If your communities use a collaboration tool, such as a wiki or blog or perhaps shared spaces, measure contributions to those tools.
    • If your communities manage community-based projects, measure involvement in those projects – tasks, deliverables, etc.
    • Assuming your communities hold events (in-person meetings, webcasts, etc.), measure participation in those events.
  • Contribution in a corporate knowledge base – An obvious suggestion. Assuming your organization has a knowledge base (perhaps multiples?), you can set expectations for your employee’s contributions to these.
    • Measure contributions to a document management system. More specifically, measure usage of contributions as well.
    • If your organization provides product support of any sort, measure contributions to your product support knowledge base
    • If you have a corporate wiki, measure contributions to the corporate wiki
    • If you have a corporate blog, measure posts and comments on the corporate blog
    • Measure publications to the corporate intranet
    • In your services organization (if you have one), measure contributions of deliverables to your clients. Especially ones of high re-use value.
    • Measure relevance or currency of previously contributed content – Does an employee keep their contributions up to date?
  • A much different aspect of a knowledge sharing culture is to also capture when employees look for knowledge contributed by others – that is, the focus cannot simply be on how much output an employee generates but also on how effective an employee is in re-using the knowledge of others.
    • This one is harder for me to get my head around because, as hard as it can be to assign any credible value to the measurements listed above, it’s harder to measure the value someone gets out of received knowledge.
    • Some ideas…
    • Include a specific objective related to receiving formalized training – while a KM program might focus on less formal ways to share knowledge, there’s nothing wrong with this simple idea.
    • If your knowledge management tools support it, measure usage by each employee of knowledge assets – do they download relevant documents? Read relevant wiki articles or blog posts?
    • Measure individual usage of search tools – at least get an indication of when an employee first looks for assets instead of re-inventing the wheel.

Not all of these will apply to all employees and some employees may not have any specific, measurable knowledge sharing objectives (though that seems hard to imagine regardless of the job). An organization should look at what they want to accomplish, what their tool set will support (or what they’re willing to enhance to get their tool set to support what they want) and then be specific with each employee. This is meant only as a set of ideas or suggestions to consider in making knowledge sharing an explicit, concrete and measurable activity for your employees.

Rolling Up Objectives

Given some concrete objectives to measure employees with, it seems relatively simple to roll those objectives up to management to measure (and set expectations for up front) knowledge sharing by a team of employees, not just individual employees. On the other hand, a forward-thinking organization will define group-level objectives which can be cascaded down to individual employees.

Given either of these approaches, a manager (or director, VP, etc.) may then have both an organizational level objective and their own individual objectives related to knowledge sharing.

Knowledge Sharing Index

Lastly – while I’ve never explored this, several years ago a vice president at my company asked for a single index of knowledge sharing. I would make an analogy to something like a stock index – a mathematical combination of measurements of different aspects of knowledge sharing within the company. A single number that somehow denotes how much knowledge sharing is going on.

I don’t seriously think this could be meaningful but it’s an interesting idea to explore. Here are some definitions I’ll use to do so:

  • You would need to identify your set of knowledge sharing activities to measure – Call these A1, … , An. Note that these measurements do not need to really measure “activity”. Some might measure, say, the number of members in your communities at a particular time or the number of users of a particular knowledge base during a time period.
  • Define how you measure knowledge sharing for A1, … , An – for a given time t, the measurement of activity Ai is Mt,i
  • You then need to define a starting point for measurement – perhaps a specific date (or week or month or whatever is appropriate) whose level of activity represents the baseline for measurement. Call these B1, …, Bn – basically, Bi is M0,i
  • Assuming you have multiple types of activity to measure, you need to assign a weight to each type of activity that is measured – how much impact does change in each type of activity have on the overall measurement? Call these W1, …, Wn.

Given the above, you could imagine the “knowledge sharing index” at any moment in time being computed as follows (forgive the informal notation – I don’t know how to make this look like a “real” formula!):

Knowledge index at time t = Sum (i=1…N) of Wi * ( Mt,i / Bi )
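For anyone who prefers it written out formally, the same calculation in more conventional notation (with n types of activity) would be:

```latex
\mathrm{Index}(t) = \sum_{i=1}^{n} W_i \cdot \frac{M_{t,i}}{B_i}
```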

A specific example:

  1. Let’s say you have three sources of “knowledge sharing” – a corporate wiki, a mailing list server and a corporate knowledge base
  2. For the wiki, you’ll measure total edits every week, for the list server, you’ll measure total posts to all mailing lists on it and for the knowledge base, you’ll measure contributions and downloads (as two measures).
  3. In terms of weights, you want to give the mailing lists the least weight, the wiki an intermediate weight and the combined knowledge base the most weight. Let’s say the weights are 15 for the mailing lists, 25 for the wiki, 25 for the downloads from the knowledge base and 35 for contributions to the knowledge base. (So the weights total to 100!)
  4. Your baseline for future measurement is 200 edits in the wiki, 150 posts to the list server, 25 contributions to the knowledge base and downloads of 2,000 from the knowledge base
  5. At some week after the start, you take a measurement and find 180 wiki edits, 160 posts to the list server, 22 knowledge base contributions and 2200 downloads from the knowledge base.
  6. The knowledge sharing index for that week would be 96.8 (the short calculation after this list walks through the arithmetic). This is “down” relative to the baseline of 100 even though two of the four measures are up – which simply reflects the relative weight of the factors that are down, especially contributions to the knowledge base.
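As mentioned in item 6, here is the arithmetic from the example spelled out as a small calculation – the numbers are exactly the ones above; only the variable names are mine:

```python
weights   = {"wiki": 25, "mailing_lists": 15, "kb_contributions": 35, "kb_downloads": 25}
baseline  = {"wiki": 200, "mailing_lists": 150, "kb_contributions": 25, "kb_downloads": 2000}
this_week = {"wiki": 180, "mailing_lists": 160, "kb_contributions": 22, "kb_downloads": 2200}

# Index(t) = sum over all activities of W_i * (M_t,i / B_i)
index = sum(weights[a] * this_week[a] / baseline[a] for a in weights)
print(round(index, 1))  # 96.8 - below the baseline of 100, driven mostly by the drop in contributions
```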

If I were to actually try something like this, I would pick the values of Wi so that the baseline measurement (when t= 0) comes to a nice round value – 100 or something. You can then imagine reporting something like, “Well, knowledge sharing for this month is at 110!” Or, “Knowledge sharing for this month has fallen from 108 to 92″. If nothing else, I find it amusing to think so concretely in terms of “how much” knowledge sharing is going on in an organization.

There are some obvious complexities in this idea that I don’t have good answers for:

  1. How to manage a new means to measure activity becoming available? For example, your company implements a new collaboration solution. Do you add it in as a new factor with its weight and just have to know that at some point there’s a step function of change in the measure that doesn’t mean anything except for this new addition? Do you try to retroactively adjust weights of sources already included to keep the metrics “smooth”?
  2. How to handle retiring a source of activity? For example, you retire that aging (but maybe still used extensively) mailing list server. Same question as above, though perhaps simpler – you could just retroactively remove measurements from the now-retired source to keep a smooth picture.
  3. How to handle (or do you care to handle?) a growing or shrinking population of knowledge workers? Do you care if your metric goes up because you acquired a new company (for example) or do you need to normalize it to be independent of the number of workers involved?

In any event – I think this is an interesting, if academic, discussion and would be interested in others’ thoughts on either individual performance management or the idea of a knowledge sharing index.

February 5th, 2009

Retiring a Community and Capturing its Knowledge

Recently, there was a thread of discussion on the com-Prac list about the “death of a community” and a follow-up discussion about what or how CoPs should capture discussion-produced knowledge.

I found these to be very interesting and thought-provoking discussions. In this post, I will write about two aspects of these discussions – the retiring of a community and also a case study in how a community centered around a mailing list meets the challenge of knowledge capture.

Before getting into the details – I wanted to (re-)state that I recognize that a community is (much) more than a mailing list – community members interact in many ways, some online, some in “real space”. That being said, I also know that for many communities the tool of choice for group communication is a mailing list, so in this post, I will write about issues related to the use of mailing lists, though the ideas can be transferred to other means of electronic exchange. As John D. Smith notes in the second thread:

“All of the discussion about summarization so far assumes that a community almost exclusively lives on one platform. As Nancy alluded to, I think the reality is quite a bit more messy. Note the private emails between Eric and Miguel that were mentioned in this thread. We ourselves interact in LOTS of different locations.”

In other words, even if you could solve the knowledge capture challenge for one mode of discussion (mailing lists) you are still likely missing out on a lot of the learning and knowledge sharing going on in the community. Keep that in mind!

Retiring a Community, or at least a community’s mailing list

As I’ve written about before, within the context of my current employer’s community program, mailing lists and their related archives are an important part of our community of practice initiative (and, by extension, our KM program). We have not developed a formal means to retire (or “execute”, in the terms used in the first thread mentioned above) a community, but we do have a formal process for retiring mailing lists. While the following is about mailing lists, I think the concepts can scale up to any community – though it might require aggregating similar insights about other channels used by the community.

Within our infrastructure, many of the existing mailing lists are associated with one (or more) communities and we provide a simple means for anyone to request a new mailing list. There is a very light review process (primarily focused on ensuring that the requested list is different enough from existing lists and doesn’t have such a small topic space that it will likely be very under-utilized), which means that over time we can end up with a lot of mailing lists. Without some regular house-cleaning, this situation can have a very negative impact on a user’s discovery process – hundreds and hundreds of mailing lists means a lot of confusion.

One way we grapple with this is to use the communities as a categorization of mailing lists. Instead of leaving a user with hundreds of mailing lists to wade through, we encourage them to look for a community in which they’re interested and, through that community, find associated mailing lists. This normally reduces the number of mailing lists to consider down to a small handful.

However, we still have needed a house-cleaning process, so several years ago, this is what we set up:

  • All mailing lists are reviewed on a periodic basis – usually around once every six to twelve months.
  • When reviewed, the following criteria are used to identify candidates for retirement (a rough sketch of these checks appears after this list):
    • Age of the list (it must be a certain age in order to give new lists time to “get off their feet”)
    • New subscriptions to the list (if someone newly joins what is otherwise an un-utilized list, that represents at least *potential* utilization in the future – so no need to shut it off)
    • Posting activity on the list (if a list is old enough and has not had anyone newly join and has not had any activity in a specific span of time, it becomes a candidate). Note that even a single post removes the list from candidacy (we do not attempt to quantify the value of a post or anything like that).
  • Once a list of candidate mailing lists is identified, the moderators for that list are contacted and asked if the list is needed
    • If a list has no identified moderators or (more commonly) the moderators of record are no longer with the company, the entire membership of the list is contacted (via an email sent directly to the members, not via the mailing list itself, as that introduces the “one” post that then keeps the list “alive” in the next review).
  • Regardless of who the question is asked of, the contact with the list is positioned as a proposal to retire the list and people only need to reply if they do not align with that proposal; a target date for reply is also provided (no reply by that date is taken as alignment with retiring the list).
    • Replies saying, “Go ahead and retire” do nothing except confirm the proposal.
    • However, even one reply requesting retention of the list takes the list off the list of retirement candidates – that is, everyone has the same weight to veto the retirement.
    • As for the archives of the list, we also state that the archives will be retained even if the list is retired unless a moderator states that the archives are not needed. (The archives are included in our enterprise search, so they remain as a potential knowledge source even if the list does not have continued value in supporting on-going discussions.)
  • Assuming a list is not removed from the candidate list (i.e., it can be retired), the remaining process is simply to remove it from the list server – I won’t bore you with the details of that here.
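As referenced above, the candidate-selection step boils down to a few date comparisons. A minimal sketch, with hypothetical field names on the list records (the actual review windows we use vary):

```python
from datetime import datetime, timedelta

MIN_LIST_AGE = timedelta(days=365)       # give new lists time to "get off their feet"
INACTIVITY_WINDOW = timedelta(days=365)  # no posts and no new subscriptions in this span

def retirement_candidates(mailing_lists, now=None):
    """Return the lists that are old enough and show no recent posts or new subscriptions.

    Each item in mailing_lists is assumed to be a dict with the keys:
      created, last_post, last_subscription  (datetime or None)
    """
    now = now or datetime.now()
    candidates = []
    for ml in mailing_lists:
        old_enough = now - ml["created"] >= MIN_LIST_AGE
        no_new_members = (ml["last_subscription"] is None
                          or now - ml["last_subscription"] >= INACTIVITY_WINDOW)
        # even a single post within the window removes the list from candidacy
        no_posts = (ml["last_post"] is None
                    or now - ml["last_post"] >= INACTIVITY_WINDOW)
        if old_enough and no_new_members and no_posts:
            candidates.append(ml)
    return candidates
```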

In our environment, doing this once a year typically reduces the count of lists by about 10% – though the count of lists has remained remarkably stable over time, which suggests we see that same kind of growth again over the following year.  On the other hand, if we did not proactively review and retire lists like this, we would be seeing an ever-growing list of mailing lists, making it harder for everyone to find the lists that are engendering valuable discussions.

Knowledge Capture

Or… how to lift knowledge out of the on-going discussion of a community into a more reusable form.

If a community uses a tool like a mailing list to engender discussion and knowledge sharing – how does a community capture “nuggets” of knowledge from the discussion into a more easily digestible form? Does the community even need to (perhaps not, given a sophisticated enough means to find information in the archives)?

I have no magic solution to this problem but I did find another comment to be very illustrative of one aspect of the original discussion – who “owns” the archives of a community’s discussion and what is the value of those archives? Even in their raw form, why do those archives have value? As Nancy White notes:

“I suspect that only a small percentage of the members (over time) would actually use the archives. But because they hold the words of members, there may be both individual and collective sense of ownership that have little to do with ‘utility.’”

The rest of this post will be a brief description of a knowledge capture process I’m very familiar with – though I’m not sure if it will transfer well into other domains. For this description, I’m going completely outside of the enterprise and to a community of which I’m a member that revolves around a table-top fantasy war game named Warhammer.

A bit of background: Warhammer is a rather complex game, with a rulebook that weighs in at several hundred pages and about a dozen additional books that provide details on the various types of armies players can use. All told, probably something like 1,000 pages describing the rules and background of the game. Given the complexity of the game, it is very common that during any given game, the players will run into situations not covered well by the rules – these are usually areas involving interactions of special rules for the armies playing. In the many online forums / mailing lists that exist, one of the most frequent types of discussions revolves around these situations and how to interpret the rules. Many of the same questions come up repeatedly – obvious fodder for an FAQ.

(As an aside, given that Warhammer is published and sold by a company – Games Workshop – one could argue that they should publish all of the relevant FAQs. They do publish FAQs and errata, but they do so at a sporadic pace at best and do not address many of the frequently asked questions.)

One particular Warhammer-related community of which I’m a member – the Direwolf (DW) community – has established a pretty well defined means to gather these FAQs and publish them back to the Warhammer community at large. A brief overview of the process:

  • A subset of the community is elected by the community each year to act as the FAQ council. This group normally includes one person responsible for questions related to the main rule book, one for each specific army and one person who’s responsible for maintaining the FAQ documents themselves (so all totaled, about 15 people).  [As another aside, I happen to be a member of this FAQ council currently, which is how I'm familiar with the process it uses.]
  • Each member of the group is responsible for monitoring discussions within the community’s mailing list related to their specific area of focus and bringing those questions to the FAQ council for consideration when they are believed to be “frequently asked” enough to warrant inclusion.
    • In addition, the council actively solicits questions specific to individual armies when a new book comes out for an army – this solicitation includes both members of the DW community and also a few other highly populated Warhammer-related communities.
  • Once a question (or set of questions) is identified for the FAQ council, the group discusses (in a mailing list available just to FAQ council members) potential answers and comes to a consensus (or at least a majority) on the answer.
    • Most commonly, the group will agree on an interpretation but occasionally, explicit polling is done to ensure at least a majority of the group agrees with an interpretation.
  • The FAQ documentation is then updated to include the relevant questions and answers and is then published on the internet and made available to anyone who plays the game.

Netting it out: A community-selected subset of the community monitors the community for questions in their area of expertise, vets an answer with the rest of the FAQ council, and then the FAQ documentation is updated as appropriate.

This is pretty straightforward, but the value of this effort is reflected in the fact that the game publisher now very commonly uses input from the Direwolf FAQ council in considering their own responses to FAQs and also in the fact that many players from around the world use the Direwolf FAQ to ensure a consistent interpretation of those “fuzzy” areas of the game. A true value add for the Warhammer community at large.

That being said, this process does take quite a bit of energy and commitment, especially on the part of the “keeper” of the documentation, to keep things up to date. In this case, I believe that the value-add for members of the council is knowing that they are contributing to the Warhammer community at large and also knowing that they are helping themselves in their own engagement of playing the game.

How does this translate into a community of practice within an enterprise?

  • It’s possible that an exact parallel of the above could work in many communities.
  • Even if the position isn’t “elected”, some type of rotating responsibility among community members to monitor and gather FAQs (or other knowledge artifacts) could be very valuable for both the community and the member(s) who perform the job.
    • Within an enterprise that seems like an approach that will have longer legs than having a community manager (someone who helps facilitate the community but who might otherwise not have a strong vested interest in the domain of the community) responsible for this.
  • Ensuring that community members do perceive value in their involvement in the process is going to be a key component – What’s in it for them? The answer could be any number of things
    • Professional development opportunities (learning a lot more about areas in which they don’t normally work)
    • Visibility to other members of the community / career growth opportunities
    • Helping themselves be more successful in their own job (they are ensuring there is a source of gathered knowledge to be used)