Lee Romero

On Content, Collaboration and Findability

Archive for November, 2008

Additional Community Metrics

Tuesday, November 25th, 2008

My last several posts have been focused on various aspects of community metrics – primarily those derived from the use of a particular tool (mailing lists) used within our communities. While quite fruitful from an analysis perspective, these are not the only metrics we’ve looked at or reported on. In this post, I’ll provide some insights on other metrics we’ve used in case they might be of interest.

Before going on, though, I also wanted to highlight what I’ve found to be an extremely thorough and useful guide covering KPIs for knowledge management from a far more general perspective than just communities – How to Use KPIs in Knowledge Management by Patrick Lambe. I would highly recommend that anyone interested in measuring and evaluating a knowledge management program (or a community of practice initiative specifically) read this document for an excellent overview of a variety of areas. Go ahead… I’ll wait.

OK – Now that you’ve read a very thorough list, I will also direct you to the blog of Miguel Cornejo Castro, who has published on community metrics. I know I’ve seen his paper on this before, but in digging just now I could not come up with a link to it. Hopefully, someone can provide a pointer.

UPDATE:  Miguel was kind enough to provide the link to the paper I was recalling in my mention above: The Macuarium Set of CoP Measurements.  Thanks, Miguel!

If you can provide pointers to additional papers or writings on metrics, please comment here or on the com-prac list.

With that aside, here are some of the additional metrics we’ve used in the past. (When we were reporting regularly on the entire program, we generally did so quarterly, which gives you an idea of the span each report covered.)

  • Usage of intranet-based web sites – specifically, site visits and hits on a community’s site as tracked by our web analytics solution;
  • Intellectual assets produced – specifically, tracking those produced (or significantly updated) and published via one of our repositories;
  • Number of “anecdotes” captured for community members – that is, the one-off “pats on the back” that community members receive – this attempted to capture some of the softer aspects of community value;
  • Number of knowledge share events held – many communities commonly host virtual events (using one of several different webcasting tools) and we tracked those as well as any in-person events;
  • Attendance at community knowledge share events and playback of recordings of webcasts – an attempt to capture how impactful the events were on members;
  • White papers produced – a specific drill into the intellectual assets;
  • For most of these, we also provided insights on quarter-to-quarter change within communities and for the community of practice program overall to give community sponsors / leaders insight on which direction things were moving;
  • We also looked at our corporate wiki for some insights on a couple levels:
    • Using our community member lists, we knew who was a member of a community, so we could analyze content authoring within the wiki by that same group; this provided insight on how much community members contributed to this knowledge base;
    • Within our corporate wiki, authors have the ability to assign articles to categories; one set of such categories was the communities, so we reported on authoring activity and usage of wiki articles that were assigned a category corresponding to one of the communities of practice; this provided insight on the utility of, and interest in, knowledge associated with the communities.
  • And, finally, we also reported another “softer” piece of data, which was to allow the communities themselves to highlight specific events, results, or issues for the communities.
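
Quarter-to-quarter change, as mentioned in the list above, is simple arithmetic on consecutive quarterly values. Here is a minimal Python sketch; the function name `quarter_over_quarter` and the sample visit counts are hypothetical, and treating a change from zero as undefined is an assumption.

```python
def quarter_over_quarter(values):
    """Fractional change between consecutive quarterly values.

    Returns None for a quarter whose predecessor was zero, since the
    relative change is undefined there.
    """
    changes = []
    for prev, curr in zip(values, values[1:]):
        changes.append(None if prev == 0 else (curr - prev) / prev)
    return changes


# Hypothetical site visits for one community over four quarters.
site_visits = [1200, 1500, 1350, 1350]
print(quarter_over_quarter(site_visits))  # [0.25, -0.1, 0.0]
```

The same function applies to any of the quarterly counts above (assets produced, events held, attendance), for a single community or for the program totals.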

This is my last planned post on community metrics for now. I will likely return to the topic in the future. I hope the posts have been interesting and also have provided food for thought for your own community programs or efforts.

Visualizing Knowledge Flow in a Community

Friday, November 21st, 2008

In my last post, I described some ideas about how to get a sense of knowledge flow within a community using some basic metrics data you can collect. I thought it might be useful to provide a more active visualization of the data from a sample community. As always, data has been obfuscated a bit here, but the underlying numbers are mostly accurate – I believe it provides a more compelling “story” of sorts to see data that at least approximates reality.

I knew that Google had released its own visualization API, which offers many ways to visualize data, including a “Motion Chart” – which I’d seen in action before and found to be a fascinating way to present data. So I set about trying to find a way to use that type of visualization with the metrics I’ve written about here.

The following is the outcome of a first cut at this (requires Flash):

This visualization shows each of the lists associated with a particular community as a circle (if you hover over a circle, you’ll see a pop-up showing that list’s name – you can click on it to have that persist and play with the “Trails” option as well to see the path persist).

The default options should have “Cumulative Usage” on the Y axis, Members on the X axis, “Active Members” as the color and “Usage” as the size.

An interpretation of what you’re seeing: once you push play, lists will move up the Y axis as their total “knowledge flow” grows over time. They’ll move right and left as their membership grows / shrinks. The size of a circle reflects the “flow” at that time – so a large circle also means the list will make a correspondingly large move up the Y axis.
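
Because the Y axis is a running total, the data behind a chart like this is each list’s per-period values plus a cumulative sum. A minimal Python sketch, assuming per-quarter figures are already computed; the helper name, the sample numbers and the quarter-label format are hypothetical, and the exact column types the motion chart accepts should be checked against the Google Visualization documentation. The row shape used here (entity name, then time, then the plotted values) is the layout a motion chart expects.

```python
from itertools import accumulate

# Hypothetical per-quarter figures for one list:
# (quarter label, members, active members, usage for that quarter).
quarters = [
    ("2005Q3", 40, 10, 300.0),
    ("2005Q4", 55, 18, 900.0),
    ("2006Q1", 60, 12, 450.0),
]


def motion_chart_rows(list_name, quarters):
    """Rows of (entity, time, cumulative usage, members, active, usage).

    Cumulative usage is the running total of per-quarter usage, so a list
    can only move up the Y axis (or stay put) as time advances.
    """
    cumulative = accumulate(q[3] for q in quarters)
    return [
        (list_name, label, total, members, active, usage)
        for (label, members, active, usage), total in zip(quarters, cumulative)
    ]


for row in motion_chart_rows("List 9", quarters):
    print(row)
```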

It’s interesting to see how a list’s impact changes over time – if you watch the list titled “List 9” (which appears about Sept 05 in the playback), you’ll see it has an initial surge and then its impact just sort of pulsates over the next few years. Its final position is higher than that of “List 7” (which is present from the start), but you can see that List 7 does show some impact later in the playback.

You can also modify which values show in which part of this visualization – if you try some other options and can produce something more insightful, please let me know!

I may spend some time looking at the other visualization tools available in the Google Visualization API to see if they might provide value in visualizing other types of metrics we’ve gathered over time. If I find something interesting, I’ll post back here.

Measuring Knowledge Flow within a Community of Practice

Thursday, November 20th, 2008

In my series on metrics about communities of practice, I’ve covered a pretty broad range of topics, including measuring, understanding and acting on:

In this post, I’ll slightly change gears and present some thoughts on a more research-like use of this data. First, an introduction to what drove this thinking.

“Why do we need to provide navigation to communities? There’s nothing going on in them anyway!”

A few years back, as we were considering some changes in the navigational architecture on our intranet, I heard the above statement and it made me scratch my head. What did this person mean – there is nothing going on in communities? There sure seemed to be a lot of activity that I could see!

A quick bit of background: Though I have not discussed much about our community program outside of the mailing lists, every community had other resources that they utilized – one of the most common being a site on our intranet. On top of that, at the time of the discussion mentioned above, communities actually had a top spot in the global navigation on our intranet – which provided the typical menu-style navigation to top resources employees needed. One of the top-level menus was labeled “communities” and, as sub-menu items, it included a subset of the most strategic / active communities. This was a very nice and direct way to guide employees to these sites (and through them to the other resources available to community members, like the mailing lists I’ve discussed).

Back to the discussion at hand – As we were revisiting the navigational architecture, one of the inputs was usage of the various destinations that made up the global navigation. We have a good web analytics solution in place on our intranet (the same one we use on our public site), so we had some good insight on usage, and I could not argue the point – the intranet sites for the communities simply did not get much traffic.

As I considered this, a thought occurred to me – what we were missing is that we had two distinct ways of viewing “usage” or “activity” (web site usage and mailing list membership / activity) and we were unable to merge them. An immediate question occurred to me – what if, instead of a mailing list tool, we used an online forum tool of some sort (say, phpBB or something similar)? Wouldn’t that merge together these two factors? The act of posting to a forum or reading forums immediately becomes different web-based activities that we could measure, right?

Given the history of mailing list usage within the company, I was not ready to seriously propose that kind of change, but I did set out to try to answer the question – Can we somehow compare mailing list activity to web site usage to be able to merge together this data?

The rest of this post will discuss how I went about this and present some of the details behind what I found.

The Basic Components

The starting point for my thinking was a rough analogy between web sites and mailing lists: a single post to a mailing list can be thought of as equivalent to a web page. The argument I would make is that (depending, of course, on the software used) for a visitor to read a single post using an online forum tool, they would have to visit the page displaying that post. So our first component is:

Pc = the number of posts during a given time period for a community

In reality, many tools will combine together a thread into a single page (or, at least, fewer than one page per comment). If you make an assumption that within a community, there’s likely an average number of posts per thread, we could define a constant representing that ratio. So, define:

Rc = the ratio of posts per thread within a community for a given time period

Note that while I did not discuss it in the context of the review of activity metrics, the activity data we are gathering makes it possible to identify threads, and so we can compute Rc from the total number of threads:

Tc = total threads within a community for a given time period

Rc = Pc / Tc
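
Computing Rc requires grouping posts into threads. The post does not say how threads were identified, so the sketch below assumes one simple heuristic: normalize each subject line by stripping leading `Re:`/`Fwd:` prefixes and treat posts with the same normalized subject as one thread. The function names are hypothetical, and real mail threading (for example, using reply headers) would be more robust.

```python
import re

# Assumption: threads are identified by normalized subject line. Leading
# "Re:" / "Fw:" / "Fwd:" prefixes (possibly repeated, any case) are stripped.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject line into a (hypothetical) thread identifier."""
    return _PREFIX.sub("", subject).strip().lower()


def posts_per_thread(subjects):
    """Return (Pc, Tc, Rc) for the posts of one community in one period."""
    threads = {thread_key(s) for s in subjects}
    p_c = len(subjects)
    t_c = len(threads)
    r_c = p_c / t_c if t_c else 0.0  # avoid dividing by zero for an idle period
    return p_c, t_c, r_c


subjects = [
    "Build server down?",
    "RE: Build server down?",
    "Re: re: build server down?",
    "Style guide draft",
    "Fwd: Style guide draft",
]
print(posts_per_thread(subjects))  # (5, 2, 2.5)
```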

Now, how do we estimate how many page views members would generate if they visited the forum instead of having posts show up in their mailbox? The first (rough, and quite poor) guess would be that every member would read every post. This is not realistic, and getting an accurate answer would likely require some analysis directly with community members. That being said, I think that, within a constant factor, the number of readers can be approximated by the number of active members within the community (any active member can be assumed to have read at least some of the posts – their own). A couple more definitions, then:

Mc = the number of members of a community at a given time

Ac = the number of active members within a community for a given time period

In addition to assuming that active members represent a high percentage of readers, I wanted to reflect the readership (which is likely lower) among non-active members (AKA “lurkers”). We know the number of lurkers for a given time period is:

Lc = the number of lurkers within a community over a given time period = (Mc – Ac)

So we can define a factor representing the readership of these lurkers

PRc = the percent of lurkers who would read posts during a given time period (PR means “passive reader”)

Can we approximate PRc for a community from data we are already capturing? At the (fuzzy) level of this argument, I would guess that the ratio of active to total members is echoed within the lurker population, so it can serve as an estimate of the fraction of lurkers who will read any given post in detail:

PRc ~= Ac / Mc

The Formula

So, with the basic components defined above, the formula that I have worked out for computing a proxy for web site traffic from mailing lists becomes:

Uc = the “usage” of a community as reflected through its mailing list

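$$
\begin{aligned}
U_c &= P_c \, (A_c + \mathit{PR}_c \, L_c) / R_c \\
    &= P_c \, \bigl(A_c + \tfrac{A_c}{M_c} L_c\bigr) / R_c \\
    &= P_c \, \bigl(A_c + \tfrac{A_c}{M_c} (M_c - A_c)\bigr) / R_c \\
    &= \frac{2 P_c A_c - P_c A_c^2 / M_c}{P_c / T_c} \\
    &= 2 A_c T_c - \frac{A_c^2 T_c}{M_c}
\end{aligned}
$$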

So with that, we have a formula which can help us relate mailing list activity to web site usage (up to some perhaps over-reaching simplifications, I’ll admit!). All of these factors are measurable from the data we are collecting and so I’ll provide a couple of sample charts in the next section.
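
To check the algebra, here is a small Python sketch that computes Uc both step by step from its components and from the simplified final form. The function names and the sample quarter (120 posts in 40 threads, 200 members, 30 active) are made up for illustration.

```python
def usage_stepwise(p_c, t_c, m_c, a_c):
    """Uc from its components: Pc * (Ac + PRc * Lc) / Rc."""
    r_c = p_c / t_c   # posts per thread
    l_c = m_c - a_c   # lurkers
    pr_c = a_c / m_c  # estimated fraction of lurkers who read
    return p_c * (a_c + pr_c * l_c) / r_c


def usage_simplified(t_c, m_c, a_c):
    """Uc after simplification: 2*Ac*Tc - Ac**2 * Tc / Mc."""
    return 2 * a_c * t_c - a_c ** 2 * t_c / m_c


# Hypothetical quarter: 120 posts in 40 threads, 200 members, 30 of them active.
print(usage_stepwise(120, 40, 200, 30))  # ≈ 2220
print(usage_simplified(40, 200, 30))     # ≈ 2220
```

Note that Pc cancels out of the simplified form, so under these assumptions Uc depends only on the number of threads, members and active members.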

Some Samples

Here are a few samples of measuring this “usage” over a series of quarters in various communities.

As you will see in the samples, this metric shows a wide variance in values between communities, but relative stability of values within a community.

Small Community Usage Metric

The first sample shows data for a small community. As before, I have obfuscated the data a bit, but you can see a big jump early in the lifecycle and then an extended period of low-level usage. The spike represents the formal “launch” of the community, when a first communication went out to potential members and many people joined. The drop-off to low-level usage shown here represents, I believe, a challenge for the community to address in order to become more vital (of course, it could also be that other ways of observing “usage” of the community would show that it actually is very vital).

The second sample shows data for a large, stable community – you’ll note that the computed value for “usage” is significantly higher here than in the above sample (in the range of around 30,000-40,000, as opposed to the 500-1,000 range around which the small community stabilized).

Large Community

How does this relate to the title of this post?

Well, after putting the above together, I realized that if you ignore the Rc factor (which converts the measurement of these “member-posts” into a figure purportedly comparable to web page views), you get a number that represents how much of an impact the flow of content through a mailing list has on its members – indirectly, a measure of how much information or knowledge could be passing through a community’s members.

The end result calculation would look something like:

Kc = the knowledge flow within a community for a given period

$$
K_c = 2 P_c A_c - \frac{P_c A_c^2}{M_c}
$$

This concept depends on making the (giant) leap that the “knowledge content” of a post is equivalent across all posts, which is obviously not true. For the intellectual argument, though, one could introduce a factor that could be measured for each post and replace Pc (which has the effect of treating the knowledge content of a post as “1”) with the sum of that evaluation of each post across a community (where each post is scored a 0-1 on a scale representing that post’s “knowledge content”).

I have not done that analysis, however (it would be a very subjective and manually intensive task!). Within an approximation that’s probably no less accurate than all of the assumptions above (said with appropriate tongue-in-cheek), one could argue that you could instead multiply Kc by a constant factor (representing the average knowledge content of a community’s posts) and have the same effect.

Further, if you use this calculation primarily to compare a community with itself over time, you will likely find that the constant factor does not change over time, so you can simply remove it from the calculation (again, with the qualifier that you can then only compare a community to itself!) and you are left with the above definition of Kc.
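
A minimal sketch of Kc in Python, together with the hypothetical per-post weighting described above (replacing Pc with a sum of scores between 0 and 1). The scoring itself is manual and is not modeled; the function names and numbers are illustrative. The last lines show why a constant factor drops out when a community is compared only with itself: scaling both quarters by the same constant leaves their ratio unchanged.

```python
def knowledge_flow(p_c, m_c, a_c):
    """Kc = 2*Pc*Ac - Pc*Ac**2/Mc (Uc without the division by Rc)."""
    return 2 * p_c * a_c - p_c * a_c ** 2 / m_c


def weighted_knowledge_flow(post_scores, m_c, a_c):
    """Kc with Pc replaced by the sum of per-post knowledge-content scores.

    Hypothetical: each score is a manual judgment in [0, 1].
    """
    if any(not 0.0 <= s <= 1.0 for s in post_scores):
        raise ValueError("scores must lie between 0 and 1")
    return knowledge_flow(sum(post_scores), m_c, a_c)


# Every post scored 0.5 gives exactly half the unweighted value.
print(knowledge_flow(120, 200, 30), weighted_knowledge_flow([0.5] * 120, 200, 30))

# A constant factor cancels when a community is compared with itself.
q1, q2 = knowledge_flow(120, 200, 30), knowledge_flow(150, 210, 35)
print(q2 / q1, (0.6 * q2) / (0.6 * q1))  # equal up to floating-point rounding
```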

Validating this Analysis

So far, I’ve provided a fairly complicated description of this compound metric and a couple of sample charts that show this metric for a couple of sample communities. Some obvious questions you might be asking:

  • What’s the value in this metric? Is it actionable?
  • How valid is this metric in the sense of really reflecting “usage” (much less any sense of “knowledge flow”)?

To be honest, so far I have not been very successful in answering these questions. In terms of being actionable, this data might lend itself to the types of actions you take based on web analytics; however, there is not an obvious (to me) analog to the conversion that is a fundamental component of web analytics. It seems more like an after-the-fact measure of what happened than a forward-looking tool that can help a community manager or community leader focus the community.

In terms of validity, I’m not sure how to go about measuring whether this metric is “valid”. Some ideas that come to mind, at least as points of comparison, include:

  • Comparing this metric to the actual usage of a community’s web site (via our web analytics tool); do they correlate in some way?
  • Comparing this compound metric to the simpler metric of posts to the community’s mailing lists – how do these compare and why does (or does not) this compound metric provide any better insight?
  • Taking a different approach to this formula – I think understanding how this metric changes as you hold some parts constant and change others would help understand what it “means”.
    • For example, if membership and posts remain the same, but the number of different posters changes, what happens?
    • If posts and active members remain the same but total membership changes, what happens?

I’d be very happy to hear from someone who might have some thoughts on how to validate this metric or (perhaps even better) poke holes in what its failings are.

Summing Up

Whew! If you’re still with me, you are a brave or stubborn soul! A few thoughts on all of this to summarize:

  • I do believe that this type of analysis could be useful to understand the flow through a community over time; I think it needs significantly more research to get to a better formula, though the outline above could be a starting point;
  • I have not been able to really validate the ideas expressed here in any way except intuitively, so take with an appropriate grain of salt;
  • I think this type of analysis could also be applied in a variety of other contexts – use of a community Wiki, use of a community blog, attendance at “physical space” meetings, attendance at virtual knowledge share events, use of community workspaces, etc.; I have not tried this yet, though;
  • With that last comment in mind, I believe that a key idea here is that this type of compound metric provides an avenue to combine the measurement of knowledge sharing across all of a community’s avenues – raising the possibility of providing something like a “Dow Jones Index” for a community’s knowledge sharing – perhaps collapsing down to a single, measurable quantity that you can track over time.
    • And, yes, I do recognize that such a metric is, at best, on shaky ground and likely not really supportable. I raise this idea because I was once asked to generate a single “knowledge sharing index” that would cover the corporation and this type of analysis could lead in that direction. (For the record, when faced with that question, we resisted spending time

Community of Practice Metrics and Membership, Part 5 – Performance Management

Friday, November 14th, 2008

My recent posts have been quite long and detailed with examples in terms of how we have been able to understand and analyze community membership and activity for our community of practice initiative. This post is less focused on numbers and more focused on a particular use of this data in a more strategic manner.

Performance Management

Within my employer, we have a (probably pretty typical) performance management program intended to address both career development (a long-term view – “what do you want to be when you grow up?”) and performance (the shorter-term view – “what have you done for me lately?”).

We also have an employee management portal (embedded in the larger intranet) where employees can manage details about their job, work, etc., including recording their development goals (and efforts) and performance (objectives and the work to achieve them). Managers have a view of this that allows them to see their employees’ data.

Communities and Performance Management

As we worked to drive the communities initiative and adoption of communities of practice as a part of the corporate culture, one of the questions that commonly came up was, “How do these communities contribute to my performance? How can I communicate that to my manager?” That could be asked from the perspective of career development (how can my involvement in communities help me grow?) and also for performance (if I am involved in a community, how does it help me achieve my objectives that are used to measure my performance?)

These questions are all fairly easy to answer in the abstract, but in practice we found that managers had a challenge in talking with their employees about their involvement in communities, and that part of that challenge was that managers did not necessarily “see” their employees’ community involvement (if they were not part of the same community).

Given that we now had definitions of what a community member is and what an active community member is, it seemed like we could provide some insight to managers from this data and embed that in the employee management portal.
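
As a rough illustration of turning those definitions into a per-employee view, here is a Python sketch that lists, for one employee, each community they belong to and whether they were active in the chosen period. The data shapes and the `involvement` function are hypothetical; this is not how the portal enhancement was actually built.

```python
def involvement(employee_id, members_by_community, active_by_community):
    """(community, was_active) pairs for each community the employee belongs to.

    members_by_community / active_by_community: dict of community -> set of IDs
    (hypothetical shapes); activity covers whatever period the caller chose.
    """
    return sorted(
        (community, employee_id in active_by_community.get(community, set()))
        for community, members in members_by_community.items()
        if employee_id in members
    )


members = {"Java": {"e1", "e2"}, "Testing": {"e2"}, "Project Mgmt": {"e3"}}
active_this_quarter = {"Java": {"e2"}}
print(involvement("e2", members, active_this_quarter))
# [('Java', True), ('Testing', False)]
```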

As we were working through this, we found that there was going to be a new component added to the employee management portal labeled “My involvement”, which was intended to capture and display information about how the employee has been involved in the company at large – things like formal recognition they’ve received or recognition they’ve given to others (as part of our employee recognition program) or other ways in which they’ve been “involved”.

This seemed like a perfectly natural place in which to expose insights to employees and their managers about an employee’s involvement in communities of practice!

So we had a place and the data – it became a simple matter of getting an enhancement into the queue for the employee management portal to expose the data there. It took a few months, but we managed to do that and now employees can view their own involvement and managers can view their employees’ involvement in our communities. The screenshot below shows the part of the employee management portal where an employee or manager can see this view (as with other images, I’ve obscured some of the details a bit here):

Community Involvement in Employee Management Portal

The Value?

So, what has been the value of this exposure? How has it been used?

While this helps to make some of the conversations between manager and employee about community involvement a bit more concrete, we do recognize that this is still a very partial picture of that involvement. There are many ways in which an employee can be involved in, add value to and learn from a community that go beyond this simplistic data. (I’ll write more about this “partial picture” issue in a future post.)

That being said, providing this insight to managers has proved very valuable in engendering discussions between a manager and an employee about the employee’s community involvement – what they have learned (how it has affected their career development) and also how it might have contributed to their performance. This discussion, by itself, has helped employees demonstrate their growth and value in ways that otherwise could have been a challenge.

For managers, this gives them insight into value their employees provide that otherwise would have been difficult to “see”.

For the community of practice program, this type of visibility has had the ancillary effect of encouraging more people to join communities, as I suspect (though cannot quantify) that some managers ask employees about the communities of which they are members and (more importantly in this regard) the ones of which they are not members but could be, either by work focus or interest.

Overall, simply including this insight builds an organizational expectation of involvement.

Community of Practice Metrics and Membership, Part 4

Thursday, November 13th, 2008

So, I’ve now provided some insight into the basic information we can gain about members of our communities of practice program and also the kinds of demographics we can look at.

In this post, I’ll go into how we have used some of the activity data we collect from mailing lists. In my mind, this is more valuable than the basic membership data because it represents someone looking for or sharing knowledge with their fellow community members and, from my experience, gets closer (but still is not quite “there”) to the idea of an engaged / core team member.

Basic Activity Data

As I mentioned in my post about how we collect the membership data, when we implemented the mechanism that enables us to get list members into a queryable data source, we also implemented a means to populate a queryable data source with data about individual posts to the mailing lists. The data model for this data source is pretty simple, but provides us with a lot of ability to analyze the data:

  • The basic community membership data is in three main tables: community, mailing_list and mailing_list_member. These are populated and maintained by the membership collection process.
  • In addition, we have a table for mailing_list_event that contains a record for each event that happens to a mailing list. The prime event that we’re interested in here is a “post”.
  • Each mailing_list_event includes the following data: the event type (“POST” in this case), the date/time of the event, the list the event is associated with, identification of the actor in the event, and, in the case of a “POST” event, the subject line of the post.
  • The “identification of the actor” for posts is slightly complicated: the actor is not required to be a member of the list (so we cannot simply tie posts to list members), and a post might, conceivably, come from an email address that is not recognized in our corporate directory. The latter is rare, but it means we need to identify posts by email address and, when possible (more than 99% of the time), we also capture the unique employee identifier of the employee who posted it.
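
To make the data model concrete, here is a sketch using Python’s built-in `sqlite3` module. The four table names come from the description above, but every column name is an assumption for illustration. The query counts posts and distinct posters per community, keying posters on email address because not every post can be matched to an employee identifier.

```python
import sqlite3

# Table names follow the post; all column names are assumptions.
SCHEMA = """
CREATE TABLE community (
    community_id INTEGER PRIMARY KEY,
    name         TEXT
);
CREATE TABLE mailing_list (
    list_id      INTEGER PRIMARY KEY,
    community_id INTEGER REFERENCES community(community_id),
    name         TEXT
);
CREATE TABLE mailing_list_member (
    list_id     INTEGER REFERENCES mailing_list(list_id),
    employee_id TEXT
);
CREATE TABLE mailing_list_event (
    event_id          INTEGER PRIMARY KEY,
    list_id           INTEGER REFERENCES mailing_list(list_id),
    event_type        TEXT,  -- e.g. 'POST'
    event_time        TEXT,  -- ISO-8601 timestamp
    actor_email       TEXT,  -- always recorded
    actor_employee_id TEXT,  -- NULL when the sender is not in the directory
    subject           TEXT   -- recorded for POST events
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO community VALUES (1, 'Testing')")
conn.execute("INSERT INTO mailing_list VALUES (10, 1, 'test-talk')")
conn.executemany(
    "INSERT INTO mailing_list_event (list_id, event_type, event_time,"
    " actor_email, actor_employee_id, subject) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10, "POST", "2008-07-02T09:15:00", "a@example.com", "E1", "Flaky tests"),
        (10, "POST", "2008-07-03T11:00:00", "b@example.com", None, "Re: Flaky tests"),
    ],
)

# Posts and distinct posters per community; posters are keyed by email
# because not every post maps to an employee identifier.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS posts, COUNT(DISTINCT e.actor_email) AS posters
    FROM mailing_list_event AS e
    JOIN mailing_list AS l ON l.list_id = e.list_id
    JOIN community AS c ON c.community_id = l.community_id
    WHERE e.event_type = 'POST'
    GROUP BY c.name
    """
).fetchall()
print(rows)  # [('Testing', 2, 2)]
```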

I provide the above details to give some idea of the types of analyses this data supports; if you can think of analyses not included below, feel free to comment!

Active Members

Similar to the general question that drove our initial definition of a community member, we were asked by our CoP program sponsor and CoP leaders to come up with a definition of active member and to be able to provide some insights on active members. Given our definition of community member, our initial (admittedly, very, very myopic) definition of active community member is:

An active community member for a given time period is one that has posted at least one message to a mailing list associated with the community.

As I said, we know this is myopic, but it has some advantages: 1) it’s a clear, objective definition that requires no subjective assessment; and, 2) it’s easy to gather the data to support analysis of this with the tools we have.

Note that with this definition, the concept of an active member is time-sensitive. Just because someone was considered “active” a month ago does not mean they are considered active this month. Because of this, we also had to make some decisions about the timeframe in which we considered someone active, and we generally looked at such things quarter by quarter.

Note also that there are no “shades of grey” with regard to being active – a member either is or is not active. We did not introduce a concept of some minimum number of posts that were required to be considered active. This would be something to consider, but the challenge is that the population and traffic in different communities varies so widely that it would be a challenge to come up with a defensible definition of “how many posts is enough” for any given community.
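
A minimal sketch of this binary, quarter-by-quarter definition of an active member, in Python. The input shape (community, poster email, post date) and the function names are hypothetical. For brevity it counts every poster; to match the definition strictly, you would also intersect each set with that community’s member list, since posters need not be members.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d):
    """Calendar quarter label, e.g. date(2008, 11, 13) -> '2008Q4'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def active_members(posts):
    """Map (community, quarter) -> set of distinct posters.

    Binary definition: one post in the quarter is enough to count as active;
    there is no minimum post count. posts: iterable of
    (community, poster_email, post_date) tuples (hypothetical shape).
    """
    active = defaultdict(set)
    for community, poster, when in posts:
        active[(community, quarter_of(when))].add(poster)
    return dict(active)


posts = [
    ("Testing", "a@example.com", date(2008, 7, 2)),
    ("Testing", "a@example.com", date(2008, 8, 9)),
    ("Testing", "b@example.com", date(2008, 10, 1)),
    ("Java", "a@example.com", date(2008, 9, 30)),
]
counts = {key: len(people) for key, people in active_members(posts).items()}
print(counts)
```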

With that definition, we could then provide all of the same types of insights I’ve previously described for general membership, including:

  • Count of active members of the CoP program
  • Count of active members within each individual community
  • Demographics of active members (by geography, function, etc., and, again, we can look at the CoP program overall or on a community-by-community basis)
Active Members by Quarter

The accompanying chart here provides an example of the visualization we can apply to this data – in this case, it shows a few sample communities over several quarters. One of the insights to be gained from this (and the reason it shows so many quarters) is an understanding of how communities change over time. In the chart, you will notice that Community 4 starts off relatively high and shows a steady decline. This represents a case where the community is likely declining in usefulness and we can consider retiring it (or, at least, discontinuing any kind of formal support for it). By contrast, Community 3, while it shows many fewer active members overall, shows a period of growth (it launched just at the start of the period covered here) and then stabilizes. In absolute terms, it may not be large or active enough to warrant formal support, but one can infer that it remains about as useful as it has been in the past.

Activity Over Time

Community Posts over Time

Another way to view this data is to consider total posts instead of just distinct active members. In most communities, these numbers will likely trend with each other pretty closely but sometimes a community may have particular members that post a much higher percentage of total posts than others. The accompanying chart shows an example of a visualization of this type of data. In this case, Community 5 corresponds to Community 4 in the above chart and you can, again, see another representation of the decline in that community. As with active members, it’s another way to help focus (limited) resources on communities based on some objective measures.

You can trend this data over arbitrary time periods and also (if you wish, though we have not) break out the activity by the same types of demographics discussed previously.

Most Active Members

Another interesting perspective is to be able to find the most active members within a community. This is easy to do given our approach to what “activity” is. This insight can be used to help identify potential core team members for the community or even potential community leaders. It can also be used to provide some type of community-based recognition for contributors.

At this point, I have to add the disclaimer that not all posts are created equal – using the simple, raw data can be misleading. A member who posts often but contributes (relatively) little of value should not be mistaken for a key contributor, who might post only a few times in any given time period but whose posts are valuable, insightful, thoughtful contributions to the community’s knowledge base. So it’s important that this raw data be taken as only part of the overall picture.
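
Finding the most active members is a straightforward count; this Python sketch uses `collections.Counter.most_common`. The poster identifiers are made up, and, per the disclaimer above, the counts say nothing about post quality.

```python
from collections import Counter


def most_active(posters, n=3):
    """Return the n most frequent posters as (poster, post_count) pairs.

    Raw counts only: a high count does not imply valuable contributions.
    """
    return Counter(posters).most_common(n)


# One entry per post in the period, identified by (hypothetical) poster ID.
posters = ["a", "b", "a", "c", "a", "b", "d"]
print(most_active(posters, 2))  # [('a', 3), ('b', 2)]
```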

Identification of Lurkers and Percentage of Lurkers

Another obvious use of this data is to understand the population of “lurkers” as a percentage of overall membership. This is another measure for which I have struggled to identify specific actionable steps, other than understanding and contrasting how individual communities compare against the baseline of the overall CoP program and against themselves over time.

As an interesting aside, given our definition of “active”, the percentage of lurkers we see across our community program (computed as total-community-members-across-all-communities minus total-active-members-across-all-communities, divided by total-community-members-across-all-communities) remains very stable at about 50% over any 6-month period we use for measuring.
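
As a minimal sketch of that computation (Python; the data structures are assumptions), the program-wide figure sums memberships and active memberships across communities, so a person in two communities counts twice. Posters who are not list members (possible on lists that do not require membership to post) are excluded from the active count.

```python
def lurker_percentage(memberships, active):
    """memberships: {community: set of member ids}
    active: {community: set of ids who posted in the period}
    Returns the program-wide % of memberships that are not active."""
    total = sum(len(members) for members in memberships.values())
    if total == 0:
        return 0.0
    # Only count posters who are actually members of that community's lists.
    total_active = sum(len(active.get(c, set()) & members)
                       for c, members in memberships.items())
    return 100.0 * (total - total_active) / total

print(lurker_percentage({"A": {1, 2, 3, 4}, "B": {3, 4}},
                        {"A": {1, 2, 9}, "B": {3}}))  # 50.0
```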

Connectivity within and between communities

Some of the more interesting analyses I’ve performed to understand the connectivity of community members look at the overlap of active members between communities (what I think of as connectivity between communities) and at connectivity within communities, to try to understand the flow of information.

In terms of connectivity across communities, the approach I took was similar to the discussion about looking at the Demographics by Community in my previous post – though restricting it to only active members for each community.

Within a community, I have looked a bit deeper and used the threading of the messages to connect people based on actually taking part in the same conversations. I also have used some network visualization tools to try to visualize the resulting data set and ended up with the following as an example. This diagram shows the active members of the community as nodes and two nodes are linked if they communicated with each other on at least one thread. The size of a node represents how active that member is (count of posts), and the color of the node represents the member’s organization. This particular image does not show it, but I also tried displaying the weight of the link between members to reflect how often the two members communicate.
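
The graph construction described above can be sketched as follows. This is an illustration under assumptions, not the tooling I used: posts are assumed to carry a thread identifier already derived from message threading, node size is the author’s post count, and each edge weight is the number of threads two authors both posted in. Organization (for node color) would be joined in from the directory and is omitted here; the two returned collections could be exported to any network visualization tool.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_network(posts):
    """posts: iterable of (thread_id, author).
    Returns (node_sizes, edge_weights):
      node_sizes   - posts per author (used for node size)
      edge_weights - {(author_a, author_b): number of shared threads}
    """
    node_sizes = Counter()
    thread_authors = defaultdict(set)
    for thread_id, author in posts:
        node_sizes[author] += 1
        thread_authors[thread_id].add(author)
    edge_weights = Counter()
    for authors in thread_authors.values():
        # Link every pair of people who took part in the same thread.
        for pair in combinations(sorted(authors), 2):
            edge_weights[pair] += 1
    return node_sizes, edge_weights

posts = [("t1", "ann"), ("t1", "bob"), ("t2", "ann"),
         ("t2", "bob"), ("t2", "cy"), ("t3", "cy")]
sizes, edges = build_network(posts)
print(dict(edges))  # {('ann', 'bob'): 2, ('ann', 'cy'): 1, ('bob', 'cy'): 1}
```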

The net result was that it presented another way to understand who the primary connectors / contributors are within the context of the community (or, more accurately, its mailing lists), which can be useful for identifying potential core team members or community leaders. At the end of the day, though, it did not quantitatively provide much more than simply identifying the most active members, as they tend to be the same people who connect others. I have not yet done any deep network analysis on this type of data, but that could certainly be done and would likely provide much more insight into the structure of the network this data presents.

One interesting use of this type of visualization, though, is simply as a tool for communicating with people about the community program. Showing someone this type of visualization is very impactful: you can talk about how the communities provide very fertile ground for the exchange of information, innovation and connectivity, and show in a single diagram that this connectivity crosses organizational boundaries.

A Community's Network

Community of Practice Metrics and Membership, Part 3

Wednesday, November 12th, 2008

I previously shared our general strategy with regard to answering the question about who is a member of a community of practice and, given our answer, how we actually implemented a solution to support our understanding of community membership. In this post, I’ll be providing some more insights and ideas around how we’ve been able to use this data and how it has helped shape our community strategies. Hopefully at least the latter might be of use to you.

Demographics – By Organization and Geography

By defining community members as we have and, further, by taking care to connect those members to their larger identity in our corporate directory, one of the areas we found we could delve more deeply into was the demographics of community members: what organizations made up various communities and what geographies were represented.

By marrying membership to the corporate directory identities, we could link members to their cost centers and, through those, to the larger organizational units and geographies containing the community members. The following shows a sample of the type of data we could review (I’ve generalized the labels to obscure some of the details; I’m simply trying to illustrate the ideas):

CoP Demographics by Geography

This first chart shows the progression through quarters of the percentage of the overall community population from each of the major geographies in which we operate. Similar to the ability to view the percentage of the company overall that was a member of a community, this analysis leads us to being able to understand who makes up the communities and, more specifically, allows us to target certain geographies if they are believed to be under-represented.

One additional drill-in we could support with this view was the ability to see this same demographic data for each individual CoP, so specific communities could find where they might need to target some education about the communities.

Because we could get total headcount for each geography, we could also normalize this data, computing the percentage of each geography’s employees who were members of any community or of each individual community. This allows better comparison across geographies, not just across time within one geography, and helps to focus even more on geographies that might need some attention.
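
To illustrate why the normalization matters, here is a small Python sketch (hypothetical names and sample data) that computes both each geography’s share of the community population and its penetration, that is, the percentage of its own headcount that belongs to the community population:

```python
def geography_breakdown(members, geography_of, headcount):
    """members: set of employee ids (whole community population or one community).
    geography_of: {employee id: geography}; headcount: {geography: total employees}.
    Returns {geography: (share of members %, penetration of headcount %)}."""
    counts = {g: 0 for g in headcount}
    for emp in members:
        g = geography_of.get(emp)
        if g in counts:
            counts[g] += 1
    placed = sum(counts.values())
    return {
        g: (round(100.0 * counts[g] / placed, 1) if placed else 0.0,
            round(100.0 * counts[g] / headcount[g], 1) if headcount[g] else 0.0)
        for g in headcount
    }

headcount = {"EMEA": 4, "Americas": 10}
geography_of = {1: "EMEA", 2: "EMEA", 3: "Americas"}
print(geography_breakdown({1, 2, 3}, geography_of, headcount))
# {'EMEA': (66.7, 50.0), 'Americas': (33.3, 10.0)}
```

In this made-up example, the share alone says Americas is a third of the population; penetration shows only 10% of its employees are members at all.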

The following table shows a breakdown by function over time. In this case, I’ve kept it tabular to show that it provides a multi-level breakdown, though, again, I have generalized the labels.

CoP Demographics by Function

This data does not support anything distinctly different in terms of actions than the demographics by geography does; it’s simply another way to slice the data to understand community members, both overall and by community.

Demographics by Community

Another approach we could use with this data was to ask the question – “How alike are the communities?” In other words, what is the demographic slicing of community when the slice is by community? This analysis starts to get into the “academic” mode as I was not sure at the time (and still am not) if there is anything actionable that can be done with this insight, but it is interesting to understand the overlap between communities. The following is a table that shows the slicing of 11 communities along the dimension of each of those communities.

CoP Demographics by Community

Reading across a row of the table, the value shown is the percent of the community in that row that overlaps with the community in that column. The diagonal going from upper-left to lower-right is obviously 100% as each community exactly overlaps itself. The report also color-codes the table so that particularly high or low values show a bit more obviously.
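
A table like this is straightforward to compute from membership sets. In the Python sketch below (names and sample data are hypothetical), each cell is the percentage of the row community’s members who also belong to the column community, which is why the table is not symmetric. Passing in only active members gives the connectivity-between-communities view mentioned earlier.

```python
def overlap_matrix(communities):
    """communities: {name: set of member ids}.
    result[row][col] = % of the row community's members also in the col community."""
    names = sorted(communities)
    return {
        row: {col: round(100.0 * len(communities[row] & communities[col])
                         / len(communities[row]), 1)
              for col in names}
        for row in names
        if communities[row]  # skip empty communities to avoid dividing by zero
    }

m = overlap_matrix({"A": {1, 2, 3, 4}, "B": {3, 4}})
print(m["A"]["B"], m["B"]["A"], m["A"]["A"])  # 50.0 100.0 100.0
```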

In hindsight, I realized that this is really doing something like a network diagram of the community program, where the nodes are the communities and the weight of each link is the percent overlap.  I have not used that visualization with this data but it’d be an interesting way to understand your community program.

Spread Across Communities

A last example of a use for this data was effectively an inverse of the demographics by community: trying to understand how widely spread out the community members themselves were. In other words, how many people are members of exactly one community, how many are members of exactly two communities, and so on. As with the demographics by community, I do not believe we ever found anything actionable from this insight, but it proved an interesting way to understand how widely people’s interests ranged across communities. The following chart shows the data across two quarters for this view of community members. For each value along the X axis, it shows the number of people who were members of exactly that many communities in each of the two quarters.

Given the relatively large jump between quarters of members who were a member of exactly one community, my guess (without diving into the details) would be that this likely represents a targeted campaign in a new community to gather members among a group who previously had not been involved in a community. While we never performed it, it would be interesting to correlate how / if people would tend to “spread out” among communities over time – perhaps joining one and, finding value in their participation, they join additional ones (in other words, people may migrate a bit to the right in this diagram over time).

Spread of Members Across Communities
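
The counts behind this chart come from inverting the membership data: count communities per person, then count people per community count. A minimal Python sketch, with hypothetical names and sample data (running it on each quarter’s membership snapshot gives the two series):

```python
from collections import Counter

def spread_of_members(communities):
    """communities: {name: set of member ids}.
    Returns {k: number of people who are members of exactly k communities}."""
    per_person = Counter()
    for members in communities.values():
        per_person.update(members)  # a set contributes at most 1 per person
    return dict(sorted(Counter(per_person.values()).items()))

print(spread_of_members({"A": {1, 2, 3}, "B": {2, 3, 4}}))  # {1: 2, 2: 2}
```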

Community of Practice Metrics and Membership, Part 2

Tuesday, November 11th, 2008

In my last post, I described an effort to define “community membership” and how we came upon the definition:

The membership of a community would be defined by the membership in the related mailing list(s).

In this post, I will provide some of the work we did to effect this definition and also some of the basic analysis and insights this supported.

Collecting Membership

First up – how did we take this definition and make it useful? The mailing list server mentioned in my previous post was built using MailMan, the GNU Mailing List Server. This is a very flexible tool, though it is primarily focused on providing mailing list functionality in an internet setting – one where your identity is not managed in a corporate directory and one where mailing lists may have nothing in common beyond presence on a single server (no common membership or even commonality in, say, the email addresses of members on the list and no larger “identity” tied to those memberships).

So the first question was how to collect the list of members into an easily manipulated format and, in doing so, how to then connect the memberships with the larger identity of an employee?

MailMan does not provide a directly queryable data source that can easily be combined with data from other systems (basically, some type of SQL database whose data can be joined with others), so we took a two-step approach to collecting membership lists:

  1. We added a minor customization to MailMan that provided the ability to get a list of members of a list in an XML format. Normally, this list is presented in HTML only. Each member was identified through an email address and (optionally) a full name.
  2. We built a simple sync mechanism that queried this XML interface and populated a SQL database with the list members for each mailing list.
    1. In this sync process, we used our corporate directory to match email addresses and, through that connection, were able to relate community members to their larger identity within the corporation.
    2. In addition, the sync process recorded the subscribing / unsubscribing “events” so that it was possible to understand not just who is a member at any point but how membership has changed.
  3. In addition to pulling mailing list membership into a SQL database, we also built a sync mechanism that populated a table with a record for every post to every mailing list, for easy querying (I’ll explain more about why we did this, and how the data was useful, in a future post). For this data, we connected the posts to the membership records (when possible, since not all mailing lists are configured to require membership to post to them) and also stored the date / time of the post and the subject line.
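
The membership sync can be sketched roughly as follows. This is an illustration under assumptions rather than our actual code: the XML element and attribute names, the table names, and the `directory` lookup are all hypothetical, and SQLite stands in for whatever SQL database you use. It adds and removes member rows to match the current export and records a subscribe or unsubscribe event for every change; members whose email does not match the directory are kept with a null employee id so they can be reviewed.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical shape of the customized member-list export (name is optional, unused here).
SAMPLE = """<members list="product-x">
  <member email="Ann@Example.com" name="Ann"/>
  <member email="bob@example.com"/>
</members>"""

def sync_list(db, xml_text, directory, now=None):
    """Make list_member match the export and log membership events.
    directory maps lower-cased email -> employee id."""
    now = now or datetime.now().isoformat(timespec="seconds")
    root = ET.fromstring(xml_text)
    name = root.get("list")
    current = {m.get("email").lower() for m in root.iter("member")}
    known = {row[0] for row in db.execute(
        "SELECT email FROM list_member WHERE list_name = ?", (name,))}
    for email in sorted(current - known):      # new subscribers
        db.execute("INSERT INTO list_member VALUES (?, ?, ?)",
                   (name, email, directory.get(email)))
        db.execute("INSERT INTO membership_event VALUES (?, ?, 'subscribe', ?)",
                   (name, email, now))
    for email in sorted(known - current):      # people who left the list
        db.execute("DELETE FROM list_member WHERE list_name = ? AND email = ?",
                   (name, email))
        db.execute("INSERT INTO membership_event VALUES (?, ?, 'unsubscribe', ?)",
                   (name, email, now))
    db.commit()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE list_member (list_name TEXT, email TEXT, employee_id TEXT)")
db.execute("CREATE TABLE membership_event (list_name TEXT, email TEXT, kind TEXT, at TEXT)")
sync_list(db, SAMPLE, {"ann@example.com": "E100"}, now="2008-11-11T00:00:00")
print(db.execute("SELECT * FROM list_member ORDER BY email").fetchall())
```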

Using the Membership Data

With this solution in place, it becomes a simple matter to answer basic questions about community membership. That being said, I’ll try to provide some actionable steps for each type of query that we could take based on the insights gained. Without keeping that focus, a lot of the possible analysis becomes academically interesting (perhaps) but does not have any meaningful business value.

Some examples:

  • You can easily query on membership for any given mailing list
    • This is useful for community managers to understand what topics are of primary interest within a community. It is often the case that a community will have more than one related mailing list, but which is of “most interest” (based on membership)?
  • Given a list of mailing lists associated with a particular community of practice, the community membership is easily queried as the set of distinct members across all related mailing lists
    • For community managers, this was useful to understand the effects of their efforts to increase membership.
  • You can track growth in communities by reviewing the number of people who subscribe / unsubscribe from mailing lists over a desired time period.
    • For community managers and community sponsors, this was useful insight to understand the history of the community.
    • You can see a sample chart that shows two communities and their growth over a series of quarters. The first community existed prior to the start of the CoP initiative (so it started out quite large); it shows good growth and then a decline (the big jump was likely due to some refactoring of community / mailing list alignments, though I don’t have the details). The second community was launched during this period and shows good, steady (what I would call “organic”) growth during the period covered here.
Community Size Chart

  • You can measure things like the percentage of the corporation that are members of any community or the percentage of members of specific groups within the company that are members of any community.
    • Looking at it from the perspective of percentage of the entire enterprise – this was useful insight because it provided the sponsors of the community of practice program with insight about how pervasive communities are throughout the enterprise.
    • This also provides useful insight to then contrast “penetration” with specific communities – it provides a baseline for comparison across time and within various slices of the organization.
    • As a baseline, we found that at the outset of the formal CoP program, about 28% of the corporation was a member of at least one community of practice. As we progressed, we could measure that penetration over time; today, the percentage is almost 38%. The chart presented below shows a series of quarters with this data displayed.
    • And, finally, given that baseline, it presents the possibility of understanding penetration into specific groups and asking questions like, “X% of the company is a member of community A, but only Y% of group B is a member of community A; is this desirable?” In other words, specific groups could be targeted for recruitment if appropriate.
CoP Penetration Chart

  • By turning the data around and looking at it from the perspective of the individual employees, we could answer a question like:
    • How many communities is an average employee a member of?
    • How many communities is an average community member a member of?
    • We were never able to specifically identify actionable steps to take from this insight but it gave us some idea of how widely interest between our communities ranged.
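
Most of these questions reduce to set operations over the membership data. The Python sketch below assumes subscription events are available as (list, member, kind, date) tuples and uses a hypothetical community-to-lists mapping. It rebuilds list membership as of a quarter-end by replaying events, takes a community’s members as the distinct union across its lists, and computes penetration and average memberships. Intersecting with a current employee set also screens out subscribers who have left the company.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping of a community to its related mailing lists.
COMMUNITY_LISTS = {"resource-management": ["product-a", "product-b"]}

def lists_as_of(events, cutoff):
    """Replay (list_name, member, kind, when) events up to `cutoff`.
    Returns {list_name: set of members subscribed at that point}."""
    lists = defaultdict(set)
    for list_name, member, kind, when in sorted(events, key=lambda e: e[3]):
        if when > cutoff:
            break
        if kind == "subscribe":
            lists[list_name].add(member)
        else:
            lists[list_name].discard(member)
    return lists

def community_members(community, lists):
    """A community's members are the distinct members of all its related lists."""
    members = set()
    for list_name in COMMUNITY_LISTS[community]:
        members |= lists.get(list_name, set())
    return members

def penetration(employees, communities):
    """% of `employees` (whole company or one group) in at least one community."""
    if not employees:
        return 0.0
    members = set().union(*communities.values()) & employees
    return 100.0 * len(members) / len(employees)

def average_memberships(employees, communities):
    """(communities per employee, communities per community member)."""
    if not employees:
        return 0.0, 0.0
    total = sum(len(m & employees) for m in communities.values())
    members = set().union(*communities.values()) & employees
    return total / len(employees), (total / len(members) if members else 0.0)

events = [
    ("product-a", "ann", "subscribe", date(2008, 1, 5)),
    ("product-b", "ann", "subscribe", date(2008, 2, 1)),
    ("product-b", "bob", "subscribe", date(2008, 5, 1)),
    ("product-a", "ann", "unsubscribe", date(2008, 6, 1)),
]
for quarter_end in (date(2008, 3, 31), date(2008, 6, 30)):
    size = len(community_members("resource-management", lists_as_of(events, quarter_end)))
    print(quarter_end, size)
```

Passing one group’s employee set to `penetration` instead of the whole company gives the “Y% of group B” side of the comparison.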

That’s an overview of the basic types of things we have been able to do with this data. Here are the topics I plan to cover in subsequent posts in this area:

  • Understanding the demographics of your communities
  • Using the “activity data” related to posts
  • Using this data in our performance management program
  • Measuring (part of) knowledge flow within a community

Check back for more on the above soon!

Community of Practice Metrics and Membership

Monday, November 10th, 2008

Most of my recent posts have been focused on enterprise search, so I thought I would take a break from that topic and write about another area of focus in the last several years – communities of practice (CoP) and the business problem of metrics associated with CoPs.

My current employer has had a CoP program for several years now (initiated under the direction of Ray Sims when he was Director of Knowledge Management here), though in recent years, investment in the formal program has dropped to almost nothing.  I believe the strategy is that we were successful in initiating the program, so the CoPs should be able to sustain themselves now.  To be honest, I remain skeptical of that, but it’s tangential to what I wanted to share here.

A Question of Membership

What I do want to provide here is some insight on some of the metrics we have provided to support this program, focusing on one area in particular that might be of interest: membership in the communities.  To support the business’s need to understand the communities, a common set of questions we were asked was: “How big are these communities?”, “Are the communities growing?”, “Who are the members of community X?”, and so on.  These questions were not necessarily all that pertinent to the success of the communities, but those who were sponsoring and / or leading the communities wanted to know this type of information.

When we first launched the program, one of the activities we undertook was to understand the infrastructural needs of our communities and what technologies we had in place to support them.  The company had a number of communities that had grown up over the years, primarily around the creation of products and delivery of services related to those products, so we had a start on this effort already – several communities were already in place and simply needed to be acknowledged.

Due to the history of the corporation, we have a rather distributed employee population, especially in the services area, so much of the interaction among community members needs to be electronic.  Part of the infrastructure review was to understand what was already being used, but also to understand what the communities needed to succeed.

Among the many tools we found people using within the company, a common thread was the use of an internal mailing list server for communication.  As we worked through the current-state assessment of technology and also had to address the question of membership, we struck upon an answer that we found very useful:

The membership of a community would be defined by the membership in the related mailing list(s).

We generalized to multiple lists instead of the (narrower) approach of mapping a single mailing list to each community because another track of the CoP initiative was to align our communities with the solutions in which we positioned our products.  So, for example, we might have a resource management community which encompasses a number of products and each of those products might have one (or more than one) mailing list associated with it.  We would include anyone who was a member of any of those mailing lists as a member of the resource management community.

This provided a means for communities to have a broad spectrum of interest groups and allowed for people to involve themselves in any one in which they might be interested.

Because this is a self-subscription mailing list server, it gives people the ability to involve themselves in whatever community(ies) interest them.  There is no need for a gatekeeper / manager, or for anyone else to be involved, for someone to join.

Challenges with Membership through Mailing Lists

Of course, this approach was not without its issues.  These included:

  • It can be (and was) argued that this approach leads to many people being considered “members” of a community of which they are not even aware.  This is due to the historical nature of the mailing lists (many pre-date the CoP initiative).
  • Simply because someone subscribes to a mailing list does not really imply that they want to take part in the community at large – they simply want to find a way to discuss “product X”.
  • On the technical end of things – it often happens that someone may subscribe to a list, but their subscription is not removed when they leave the company.  This leaves the potential for including members who are not “real”.

So, with the definition for community membership in place (albeit not perfect), what kinds of insights were we able to draw from this?  Look for my next post where I’ll start to delve into that area a bit more.

The future is search enabled applications, not enterprise search

Wednesday, November 5th, 2008

In an exchange in comments on Stephen Arnold’s blog, Stephen states the line that is the title of this post:

“the future is search enabled applications, not enterprise search”

I’m somewhat familiar with Stephen (I’ve seen him speak at a couple of conferences and also have followed his writing on his blog for some time), but I had actually not seen this declaration in the past (though Stephen says he’s accused of saying it too much).

In any event – I find this an interesting claim and I think I would agree with the sentiment but I also think that it depends on how you look at it.  As I wrote previously in trying to lay out what I thought enterprise search is, I think that the key aspects of an enterprise search are that it’s available to all members of the enterprise and that it covers all relevant content.

Down in the details, I do not believe it matters whether access to the enterprise search is embedded in numerous locations or provided through one location.  In fact, as I wrote previously, embedding access at multiple points is probably ideal: let workers access it within the environment in which they work, regardless of what tool(s) they normally use to do their job.

On the other hand, if the expectation is that search embedded in individual applications, each searching only its own content, will be sufficient, I do not think that is the case now or will be in the future.  The information needs for any organization are diverse enough that no one application can realistically handle all of them – email, document management, CRM, support knowledge bases, intranets, policies, etc.


People Search – A Fourth Generation Proof of Concept – Part 2: The Design

Monday, November 3rd, 2008

In my last post, I described the goals I have tried to achieve with my proof of concept people search function. Here I will describe the design and implementation of this proof of concept.

Designing the Solution

Given the goals above, here’s the general outline of the design for this solution:

  • It would be built as a web application that generates a “profile page” for each worker – it is the set of all such profile pages that comprise the targets for a search engine to index.
  • Combined with a search engine (probably any search engine capable of indexing web pages would be sufficient – I used QuickFinder), it becomes trivial to integrate the search of these profiles into your enterprise search to provide a fourth generation solution to people search.
  • The core tenet of the data used is that I wanted to identify a set of activities for workers. The aggregation of keywords related to those activities is used to generate a profile for a worker.
  • An activity could potentially be anything that represents an event, action, writing, task, assignment, etc., that is associated with the worker.
  • Some examples of activities might include: edit of a wiki article, assignment of a task in an online workspace, posting of a message in a discussion forum, membership in a project team, publishing a document in a corporate repository, posting an email to a mailing list, and so on.

Initially the web application directly queried the various systems used as sources when generating a profile for a worker. That is not scalable and also limits the amount of processing you can do, so I designed a simple SQL database to contain the data for this (implemented in MySQL). This database is essentially a data mart of worker data. The primary tables are:

  • worker (one row for each worker); this table contains the basic administrative data for a worker (it’s effectively a mirror of the organization’s corporate directory)
  • activity_source (each row describes a single source of activity which a worker might produce)
  • activity (one row for each individual “activity” associated with a worker); an activity must have a “description” – typically the title of an item or the subject of an email, etc.
  • From these tables, a few additional tables are generated by processing the data from the activity table
    • activity_keyword (contains a row for each keyword associated with an activity); a keyword is either any (individual) word from the description of the activity or a piece of metadata associated with the item (for systems which support such);
    • worker_top_keyword (aggregates the individual keywords associated with a worker [by association from activity_keyword through activity to the worker table]) so it’s easy to identify the top keywords for a worker without doing aggregation queries; each keyword in this table is weighted (see the description below of weights); I think of the set of keywords in this table for a worker to be that worker’s “attributes”
    • worker_connection (aggregates “linkage” between workers based on similarity of their keyword profiles); more on this later.
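
To make the data mart concrete, here is a sketch of the tables in SQLite (via Python’s built-in `sqlite3` module so it runs standalone; the real implementation used MySQL). Only the table names come from the description above; every column name is a hypothetical choice for illustration. The derived worker_top_keyword table is rebuilt with one aggregate query that adds each keyword occurrence’s source weight, consistent with the weighting rules described below.

```python
import sqlite3

SCHEMA = """
CREATE TABLE worker (worker_id TEXT PRIMARY KEY, full_name TEXT, email TEXT, department TEXT);
CREATE TABLE activity_source (
    source_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    weight    INTEGER NOT NULL DEFAULT 1);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    worker_id   TEXT NOT NULL REFERENCES worker(worker_id),
    source_id   INTEGER NOT NULL REFERENCES activity_source(source_id),
    description TEXT NOT NULL,
    url         TEXT,
    occurred_on TEXT);
CREATE TABLE activity_keyword (
    activity_id INTEGER NOT NULL REFERENCES activity(activity_id),
    keyword     TEXT NOT NULL);
CREATE TABLE worker_top_keyword (
    worker_id TEXT NOT NULL, keyword TEXT NOT NULL, weight INTEGER NOT NULL,
    PRIMARY KEY (worker_id, keyword));
CREATE TABLE worker_connection (
    worker_id TEXT NOT NULL, other_id TEXT NOT NULL, similarity REAL NOT NULL,
    PRIMARY KEY (worker_id, other_id));
"""

# Each keyword occurrence contributes its activity source's weight.
REBUILD_TOP_KEYWORDS = """
DELETE FROM worker_top_keyword;
INSERT INTO worker_top_keyword (worker_id, keyword, weight)
SELECT a.worker_id, k.keyword, SUM(s.weight)
FROM activity_keyword k
JOIN activity a ON a.activity_id = k.activity_id
JOIN activity_source s ON s.source_id = a.source_id
GROUP BY a.worker_id, k.keyword;
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO worker VALUES ('w1', 'Lee Romero', 'lee@example.com', 'KM')")
db.executemany("INSERT INTO activity_source VALUES (?, ?, ?)",
               [(1, "corporate directory", 20), (2, "wiki", 1)])
db.executemany("INSERT INTO activity (activity_id, worker_id, source_id, description) "
               "VALUES (?, ?, ?, ?)",
               [(1, "w1", 1, "Lee Romero"), (2, "w1", 2, "Search tuning")])
db.executemany("INSERT INTO activity_keyword VALUES (?, ?)",
               [(1, "lee"), (1, "romero"), (2, "search"), (2, "tuning")])
db.executescript(REBUILD_TOP_KEYWORDS)
print(db.execute("SELECT keyword, weight FROM worker_top_keyword "
                 "ORDER BY weight DESC, keyword").fetchall())
```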

With the implementation of this database, I also implemented a synchronization tool that updates the data in the tables from the source systems for the various types of activities.

By automatically pulling data from these source systems (which workers use in their regular day-to-day work), you remove the need for the workers to maintain data.

  • By simply doing their job and “leaving traces” of that work, they generate the data necessary for this profile. This achieves goal #2.
  • By restricting the set of data sources used to ones which anyone could examine for a worker’s activities (for example, I can view the history of a Wiki article and see who has edited it), I achieve goal #3.

Now, how should the profile page for a worker be presented?

Initially, I put together a design that did two things: 1) provided a typical employee-directory-style layout of the worker’s administrative details and 2) provided a list of all of the activities for a worker, grouped by activity source. In other words, you would see a list of all of the Wiki articles edited by the worker, a list of mailing list memberships, a list of community memberships, project team memberships, task assignments, etc. Each activity source’s list would be separately displayed (in a simple bulleted list). (I have always assumed that, before this went into production, I would ask for some design help from our electronic marketing group to give it a more professional look, but I thought the bulleted list worked perfectly well functionally.)

This proved simple and effective and also enabled the profile page to provide direct links to those activities that are addressable via a link (for example, the profile page could link directly to a Wiki article I’ve edited from my profile page, it could link to each discussion post, etc.)

However, this approach suffered from at least two problems: 1) it lacked an immediately obvious visual presentation of a worker’s attributes, and 2) it exposed every detailed activity of a worker to anyone who viewed the profile (I found when I demoed this to people, some had the immediate reaction of, “Wow – anyone can see all of these details? I’m not sure I like that!” – a reaction that surprised me given that any of the details are generally visible to anyone who wants to look, but go figure).

After looking for alternatives, I found that the keywords for a worker (when combined with their weights) provided good input for a tag cloud – which is what I ended up using as the default presentation of a worker’s keywords (visible to everyone). This helps to highlight what someone is “about”, presents a generally attractive visualization of the data, and, if the default view of a worker displays this tag cloud (and the worker’s administrative data) and does not show all of the details, it alleviates the concern mentioned above.

I have found the implementation of the tag cloud to be the trigger that pulls people into this tool – it helps satisfy my goal #5 because, for most people who have looked at this, it provides immediate validation when they see words they expect to see in their own tag cloud.

Here’s a shot of what part of my profile page looks like (partially obscured):

Lee Romero Profile

Additional Design Considerations

I wanted to keep the initial proof of concept simple in order to try to test different ways of using the data from the activity sources. With that in mind, here are some details on how I’ve done this so far:

  • When parsing the text associated with an activity into “keywords”, I took the simplest approach I could: the text of an activity is split into separate words wherever any non-alphanumeric character is found. So a string like “content-management infrastructure” would result in 3 keywords: content, management and infrastructure.
  • I also removed any words that are stop words in our search engine.
  • Each keyword for a worker is assigned a weight. Simplistically, the weight of a keyword is the number of times that keyword shows up in that worker’s stream of activities.
  • However, the tool that maintains the keywords allows an administrator to assign a weight to each activity source – so some sources can be given an artificial boost just by assigning a weight for that activity source higher than 1. The only source whose weight I’ve really toyed with so far is the corporate directory itself – I have given that a weight of 20 instead of 1.
  • The weights for keywords are used in two ways:
    • The top 50 keywords (by weight) for a worker are used in the tag cloud for that worker. The weight is then used to size the words in the tag cloud.
    • When the “keywords” <meta> tag is being computed for a worker’s profile, the keywords are sorted by weight and the keywords are included until the length of the keywords content attribute is greater than 250 characters. This means that the top keywords are the ones which will give the worker higher relevance for searches on those words.
  • Because all workers will have, at absolute minimum, the same details in this profile as they would in the corporate directory, and because the keywords from that activity source are given extra weight, those keywords will almost certainly be in the “keywords” <meta> tag for their profile – this helps satisfy my goal #6 by ensuring good relevance when people search on worker’s administrative data (first name, last name, etc.)
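
The keyword rules above can be sketched in a few lines of Python. The stop list here is a hypothetical stand-in for the search engine’s, the comma-and-space separator in the meta content is an assumption, the linear font-size scaling for the tag cloud is one common choice rather than necessarily the one I used, and “non-alphanumeric” is interpreted as ASCII only. The usage lines give the corporate directory source a weight of 20, as described above.

```python
import re
from collections import Counter

# Hypothetical stop list; in practice, reuse the search engine's stop words.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}

def keywords(text):
    """Split on any run of non-alphanumeric (ASCII) characters; drop stop words."""
    return [w for w in re.split(r"[^0-9a-z]+", text.lower())
            if w and w not in STOP_WORDS]

def keyword_weights(activities, source_weights):
    """activities: iterable of (source name, description).
    Each occurrence adds its source's weight (default 1)."""
    weights = Counter()
    for source, description in activities:
        for word in keywords(description):
            weights[word] += source_weights.get(source, 1)
    return weights

def tag_cloud(weights, limit=50, min_px=10, max_px=32):
    """Top keywords with a font size scaled linearly by weight."""
    top = weights.most_common(limit)
    if not top:
        return []
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1
    return [(word, round(min_px + (w - lo) * (max_px - min_px) / span))
            for word, w in top]

def meta_keywords(weights, threshold=250):
    """Add keywords, heaviest first, until the content exceeds `threshold` characters."""
    chosen = []
    for word, _ in weights.most_common():
        chosen.append(word)
        if len(", ".join(chosen)) > threshold:
            break
    return ", ".join(chosen)

w = keyword_weights(
    [("corporate directory", "Lee Romero"),
     ("wiki", "Search tuning"),
     ("wiki", "Search relevance")],
    {"corporate directory": 20},
)
print(keywords("content-management infrastructure"))  # ['content', 'management', 'infrastructure']
print(meta_keywords(w))  # lee, romero, search, tuning, relevance
```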

Some additional functions I have layered on top of the basic profile / search mechanism that I believe will make this a valuable solution:

  • The keywords in the tag cloud are links to pages that provide details about that keyword. When a user clicks on a keyword in a tag cloud, they are presented with a tag cloud of keywords related to their starting keyword (related by way of people who have the keywords in common). In other words, it provides a set of keywords that have a lot in common with their starting keyword. The “keyword profile” page also provides a list of workers who use the selected keyword (the list is sorted by keyword weight).
  • When you view a worker, you are also presented with a list of workers who are “similar to” the worker you are looking at – the similarity measure is the percent of overlap of the current worker’s profile (weighted keywords) maps to the other workers. This provides a way to explore a neighborhood of similar people.
  • In addition to the list of similar workers, a link is provided for each worker which, when clicked, displays a page explaining why the two workers are similar.
  • Almost all of the data sources have a date threshold applied to the data pulled from the source – most of them take data from the last year. This ensures that the data used to build a profile is effectively self-maintaining.
  • Each worker has control over whether others can see all of the details (the individual activities) in their profile. By default, only the tag cloud and administrative data are visible. A worker can opt in to allow others to see their entire profile.
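The similarity and keyword-exploration features above can be sketched as follows. This assumes each profile is a dict mapping keyword to weight, and it reads “percent of overlap” as the share of the current worker’s total weight that is matched, keyword by keyword, by the other worker (taking the smaller of the two weights). That formula and all function names are hypothetical; the real proof of concept may compute similarity differently.

```python
from collections import Counter


def overlap_percent(a, b):
    """Percent of worker a's total keyword weight shared with worker b.

    Shared weight per keyword is min(a[k], b[k]); this is one assumed
    interpretation of "percent of overlap", not a confirmed formula.
    """
    total = sum(a.values())
    if total == 0:
        return 0.0
    shared = sum(min(w, b[k]) for k, w in a.items() if k in b)
    return 100.0 * shared / total


def similar_workers(name, profiles, top_n=10):
    """Other workers ranked by how much of `name`'s profile they overlap."""
    me = profiles[name]
    scores = [(other, overlap_percent(me, p))
              for other, p in profiles.items() if other != name]
    scores = [s for s in scores if s[1] > 0]
    return sorted(scores, key=lambda s: (-s[1], s[0]))[:top_n]


def why_similar(a, b):
    """Shared keywords, ordered by their contribution to the overlap."""
    shared = {k: min(a[k], b[k]) for k in a.keys() & b.keys()}
    return sorted(shared, key=lambda k: (-shared[k], k))


def related_keywords(keyword, profiles):
    """For each other keyword, count the workers who also have `keyword`."""
    counts = Counter()
    for p in profiles.values():
        if keyword in p:
            counts.update(k for k in p if k != keyword)
    return counts.most_common()


def workers_using(keyword, profiles):
    """Workers who have `keyword`, sorted by their weight for it."""
    hits = [(n, p[keyword]) for n, p in profiles.items() if keyword in p]
    return [n for n, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]
```

Because the denominator is the current worker’s total weight, this measure is asymmetric: worker A can overlap worker B by 50% while B overlaps A by less if B has more keywords of their own, which fits the idea of viewing similarity from the perspective of the worker being looked at.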

Issues / Future Directions

The proof of concept has been very interesting to work through and has presented me with some (subjective) proof of the value of this approach, as simple as it is. That being said, there are some issues and additional areas I hope are explored in the future:

  • This is a proof of concept built as basically a skunkworks project – I am hoping it will officially get some sponsorship and be launched into production.
  • I would like to see it integrated with additional data sources. Currently, it uses 12 data sources, but some high-value sources that are not included are our CRM system and our HR system. With the sources currently in use, the workers whose profiles look sufficiently detailed tend to be those who actively use those sources. Integrating new sources is relatively easy – all that’s needed is a single SQL query against the source system that provides a list of activities for workers (where the source system can define whatever it wants to represent activities). It is this ease of adding sources that achieves my goal #4.
  • I believe there is still a lot of work to do around tweaking the weights of activity sources to balance out the effects of various sources.
  • I would like to see some exploration of workers directly tagging other workers (to add keywords) or possibly allowing workers to give a thumbs up / thumbs down to individual keywords in a profile for a worker. This would add a powerful way for people to influence their own and others’ profiles.
  • This approach also needs more testing by others to validate its effectiveness. I have had a few dozen people look at it and provide feedback, but a more quantitative evaluation would be valuable.
  • I think this profile for a worker could be presented in a FOAF format as well – I’m not sure if that provides additional value, but it is a path to explore.
  • The algorithm for parsing out keywords from the activities could be improved beyond the very simplistic parsing applied now.
  • And, finally, I think that the measurement of similarity between workers could be significantly improved, and the links between workers embedded in this data could be used as the basis for research into finding “invisible communities” within the company. This would be a kind of organizational network analysis through data mining, which