In this post, I’ll go into how we have used some of the activity data we collect from mailing lists. In my mind, this is more valuable than the basic membership data because it represents someone looking for or sharing knowledge with their fellow community members and, from my experience, gets closer (but still is not quite “there”) to the idea of an engaged / core team member.
As I mentioned in my post about how we collect the membership data, when we implemented the mechanism that enables us to get list members into a queryable data source, we also implemented a means to populate a queryable data source with data about individual posts to the mailing lists. The data model for this data source is pretty simple, but provides us with a lot of ability to analyze the data:
I share the above details to give a sense of the kinds of insights we can gain. If you can think of types of analyses not included below, feel free to comment!
Similar to the general question that drove our initial definition of a community member, we were asked by our CoP program sponsor and CoP leaders to come up with a definition of active member and to be able to provide some insights on active members. Given our definition of community member, our initial (admittedly, very, very myopic) definition of active community member is:
An active community member for a given time period is one that has posted at least one message to a mailing list associated with the community.
As I said, we know this is myopic, but it has some advantages: 1) it’s a clear, objective definition that requires no subjective assessment; and, 2) it’s easy to gather the data to support analysis of this with the tools we have.
Note that with this definition, it’s important to recognize that the concept of active member is time sensitive. That someone was considered “active” a month ago does not mean they are considered active this month. Because of this, we also had to decide on the timeframe within which we considered someone active, and we generally looked at such things quarter by quarter.
Note also that there are no “shades of grey” with regard to being active – a member either is or is not active. We did not introduce a concept of some minimum number of posts that were required to be considered active. This would be something to consider, but the challenge is that the population and traffic in different communities varies so widely that it would be a challenge to come up with a defensible definition of “how many posts is enough” for any given community.
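As a minimal sketch of the definition above: the record layout here (community, member, posting date) and the sample values are hypothetical, not the actual data model we use, and the quarter is assumed to be a calendar quarter.

```python
from collections import defaultdict
from datetime import date

# Hypothetical post records: (community, member, date posted).
posts = [
    ("Community A", "alice", date(2010, 1, 15)),
    ("Community A", "alice", date(2010, 2, 3)),
    ("Community A", "bob", date(2010, 5, 20)),
    ("Community B", "carol", date(2010, 4, 2)),
]

def quarter_of(d):
    """Return a (year, quarter) pair for a date, assuming calendar quarters."""
    return (d.year, (d.month - 1) // 3 + 1)

def active_members_by_quarter(posts):
    """Map (community, (year, quarter)) to the set of members with at least one post."""
    active = defaultdict(set)
    for community, member, posted_on in posts:
        active[(community, quarter_of(posted_on))].add(member)
    return dict(active)

active = active_members_by_quarter(posts)
for (community, (year, q)), members in sorted(active.items()):
    print(f"{community} {year} Q{q}: {len(members)} active member(s)")
```

Because membership in each set is all or nothing, a member with one post and a member with fifty count the same, which matches the "no shades of grey" choice described above.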
With that definition, we could then provide all of the same types of insights I’ve previously described for general membership, including:
The accompanying chart here provides an example of the visualization we can apply to this data. In this case, it shows a few sample communities over a stretch of several quarters. Some of the insights to be gained from this (and why it shows so many quarters) include understanding how communities change over time. In the chart you will notice that Community 4 starts off relatively high and shows a steady decline. This likely represents a community whose usefulness is declining, and we can consider retiring it (or, at least, discontinuing any kind of formal support for it). By contrast, Community 3, while it shows many fewer active members overall, shows a period of growth (it launched just at the start of the period covered here) and then stabilizes. In absolute terms, it may not be large or active enough to warrant formal support, but the stable numbers suggest it remains as useful as it has been in the past.
Another way to view this data is to consider total posts instead of just distinct active members. In most communities, these numbers will likely trend with each other pretty closely but sometimes a community may have particular members that post a much higher percentage of total posts than others. The accompanying chart shows an example of a visualization of this type of data. In this case, Community 5 corresponds to Community 4 in the above chart and you can, again, see another representation of the decline in that community. As with active members, it’s another way to help focus (limited) resources on communities based on some objective measures.
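To illustrate how total posts and distinct active members can diverge, here is a small sketch. The (community, member) pairs are hypothetical sample data for a single period, and the "top poster share" measure is my own illustrative addition rather than something we report.

```python
from collections import Counter

# Hypothetical (community, member) pairs, one per post in a single period.
posts = [
    ("Community A", "alice"), ("Community A", "alice"),
    ("Community A", "alice"), ("Community A", "bob"),
    ("Community B", "carol"), ("Community B", "dave"),
]

def activity_summary(posts):
    """Summarize total posts, distinct posters, and the top poster's share per community."""
    per_community = {}
    for community, member in posts:
        per_community.setdefault(community, Counter())[member] += 1
    summary = {}
    for community, counts in per_community.items():
        total = sum(counts.values())
        summary[community] = {
            "total_posts": total,
            "active_members": len(counts),
            # Fraction of all posts written by the single busiest member.
            "top_poster_share": max(counts.values()) / total,
        }
    return summary

summary = activity_summary(posts)
for community, stats in summary.items():
    print(community, stats)
```

In this sample both communities have two active members, but in Community A one member writes most of the posts, which is the kind of difference the post count exposes and the active-member count hides.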
You can trend this data over arbitrary time periods and also (if you wish, though we have not) break out the activity by the same types of demographics discussed previously.
Another interesting perspective is to be able to find the most active members within a community. This is easy to do given our approach to what “activity” is. This insight can be used to help identify potential core team members for the community or even potential community leaders. It can also be used to provide some type of community-based recognition for contributors.
At this point, I have to add the disclaimer that not all posts are created equal, and using the simple, raw data can be misleading. A member who posts often but contributes (relatively) useless content is less likely to be recognized as a key contributor than one who might only post a few times in any given time period but whose posts are valuable, insightful, thoughtful contributions to the community’s knowledge base. So it’s important that this raw data be taken as only part of the overall picture.
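The raw ranking of most active members is simple to produce. A minimal sketch, assuming only a hypothetical list with one poster name per post, and subject to the caveat that it says nothing about the quality of those posts:

```python
from collections import Counter

def most_active(posters, n=3):
    """Return the n members with the most posts as (member, count) pairs."""
    return Counter(posters).most_common(n)

# Hypothetical input: one entry per post, naming the member who wrote it.
posters = ["alice", "bob", "alice", "carol", "alice", "bob"]
print(most_active(posters, n=2))  # [('alice', 3), ('bob', 2)]
```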
Another obvious use of this data is to be able to understand the population of “lurkers” as a percentage of overall membership. This is another one of those measures for which I have been challenged to determine specific actionable steps, other than to understand and contrast how individual communities compare against the baseline of the overall CoP program and against themselves over time.
As an interesting aside, given our definition of “active”, the percentage of lurkers we see across our community program (which is 100% minus total-active-members-across-all-communities divided by total-community-members-across-all-communities) remains very stable at just about 50% over any 6 month period we use for measuring.
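A small sketch of that program-wide calculation, taking lurkers to be members who are not active. The dictionaries and names are hypothetical, and, as in the description above, totals are summed across communities, so a person who belongs to two communities is counted once in each.

```python
def lurker_percentage(members_by_community, active_by_community):
    """Percent of community memberships, summed across communities, with no posts."""
    total_members = sum(len(m) for m in members_by_community.values())
    total_active = sum(
        # Only count active posters who are actually members of that community.
        len(active_by_community.get(community, set()) & members)
        for community, members in members_by_community.items()
    )
    if total_members == 0:
        return 0.0
    return 100.0 * (total_members - total_active) / total_members

# Hypothetical sample data for one measurement period.
members = {"A": {"ann", "bo", "cy", "di"}, "B": {"cy", "ed"}}
active = {"A": {"ann", "bo"}, "B": {"cy"}}
print(lurker_percentage(members, active))
```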
Some of the more interesting analysis I’ve performed to understand the connectivity of community members looks at the overlap of active members between communities (what I think of as connectivity between communities) and also at connectivity within communities, to try to understand the flow of information.
In terms of connectivity across communities, the approach I took was similar to the discussion about looking at the Demographics by Community in my previous post – though restricting it to only active members for each community.
Within a community, I have looked a bit deeper and used the threading of the messages to connect people based on actually taking part in the same conversations. I also have used some network visualization tools to try to visualize the resulting data set and ended up with the following as an example. This diagram shows the active members of the community as nodes and two nodes are linked if they communicated with each other on at least one thread. The size of a node represents how active that member is (count of posts), and the color of the node represents the member’s organization. This particular image does not show it, but I also tried displaying the weight of the link between members to reflect how often the two members communicate.
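The data behind such a diagram can be sketched as follows. This is only an illustration: the (thread, member) pairs are hypothetical, two members are treated as having communicated if they both posted in the same thread, and the link weight is assumed to be the number of threads they share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (thread_id, member) pairs, one per post.
posts = [
    ("t1", "alice"), ("t1", "bob"), ("t1", "alice"),
    ("t2", "alice"), ("t2", "carol"),
    ("t3", "bob"), ("t3", "alice"),
]

def build_network(posts):
    """Return (node_size, edge_weight) for a thread co-participation network."""
    # Node size: total number of posts by each member.
    node_size = Counter(member for _, member in posts)
    # Collect the distinct participants of each thread.
    participants = {}
    for thread_id, member in posts:
        participants.setdefault(thread_id, set()).add(member)
    # Link every pair of participants once per shared thread.
    edge_weight = Counter()
    for members in participants.values():
        for a, b in combinations(sorted(members), 2):
            edge_weight[(a, b)] += 1
    return node_size, edge_weight

node_size, edge_weight = build_network(posts)
print(dict(node_size))
print(dict(edge_weight))
```

The two resulting mappings can be handed to whatever network visualization tool you prefer; coloring nodes by organization would need a separate lookup from member to organization.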
The net of this was that it presented another way to understand who the primary connectors / contributors are within the context of the community (or, more accurately, its mailing lists), which can be useful to identify potential core team members or community leaders. At the end of the day, it did not quantitatively provide a whole lot more than simply identifying the most active members, as they tend to be the same people connecting others together regardless. I have not at this point done any deep network analysis on this type of data, but that could obviously be done and would likely provide much more insight about the structure of the network this data presents.
One interesting use of this type of visualization, though, is just as a communication tool with people about the community program. It is very impactful to be able to show someone this type of visualization and talk about how the communities provide a very fertile ground for exchange of information, innovation and connectivity. Being able to show in such a diagram that this connectivity straddles organizational boundaries is a valuable tool.
I previously shared our general strategy with regard to answering the question about who is a member of a community of practice and, given our answer, how we actually implemented a solution to support our understanding of community membership. In this post, I’ll be providing some more insights and ideas around how we’ve been able to use this data and how it has helped shape our community strategies. Hopefully at least the latter might be of use to you.
By defining community members as we have and, further, by taking care to connect those members to their larger identity in our corporate directory, one of the areas we found we could delve more deeply into was the demographics of community members: what organizations made up various communities and what geographies were represented.
By marrying membership to the corporate directory identities, we could link to cost centers of the members and, through those, to the larger organizational units and geographies containing the community members. The following shows a sample of the type of data we could review (I’ve generalized the labels to obscure some of the details, as I’m simply trying to illustrate the ideas here):
This first chart shows the progression through quarters of the percentage of the overall community population from each of the major geographies in which we operate. Similar to the ability to view the percentage of the company overall that was a member of a community, this analysis leads us to being able to understand who makes up the communities and, more specifically, allows us to target certain geographies if they are believed to be under-represented.
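The breakdown behind a chart like this can be sketched for a single quarter as below. The directory lookup here is a hypothetical stand-in (a plain mapping from email address to geography), not our actual corporate directory or its cost center structure.

```python
from collections import Counter

# Hypothetical directory extract: email address -> geography.
directory = {
    "a@example.com": "Americas",
    "b@example.com": "Americas",
    "c@example.com": "EMEA",
}

def geography_share(member_emails, directory):
    """Percent of the given members that fall into each geography."""
    counts = Counter(
        # Members not found in the directory are grouped as "Unknown".
        directory.get(email.lower(), "Unknown") for email in member_emails
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {geo: 100.0 * n / total for geo, n in counts.items()}

members = ["a@example.com", "B@example.com", "c@example.com", "zed@example.com"]
print(geography_share(members, directory))
```

Running the same function per quarter, or per individual community, gives the series to plot.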
One additional drill-in we could support with this view was the ability to see this same demographic data for each individual CoP, so specific communities could find where they might need to target some education about the communities.
Because we could get total counts for geography headcount, we could normalize this data, computing the percentage of each geography’s headcount that belongs to the overall community population or to each individual community. This allows better comparison across geographies, not just across time within one geography, and helps to focus even more on geographies that might need some attention.
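A minimal sketch of this normalization, reading it as the share of each geography's headcount that belongs to a community. The geography names and numbers are made up for illustration.

```python
def penetration_by_geography(community_members, headcount):
    """Percent of each geography's headcount that belongs to the community population."""
    return {
        geo: 100.0 * community_members.get(geo, 0) / total
        for geo, total in headcount.items()
        if total > 0  # skip geographies with no headcount to avoid dividing by zero
    }

# Hypothetical counts of employees and of community members per geography.
headcount = {"Americas": 2000, "EMEA": 1000, "APAC": 500}
community_members = {"Americas": 400, "EMEA": 300, "APAC": 50}
print(penetration_by_geography(community_members, headcount))
```

In this sample Americas contributes the most members in absolute terms, yet EMEA has the higher participation relative to its size, which is the comparison the raw percentages cannot show.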
The following shows a table with data showing a breakdown by functions over time. In this case, I’ve kept it tabular to show that this provides a multi-level breakdown, though, again, I have generalized labels.
This data does not support anything distinctly different in terms of actions than the demographics by geography does. It’s simply another way to slice the data to understand community members, both overall and by community.
Another approach we could use with this data was to ask the question – “How alike are the communities?” In other words, what is the demographic slicing of community when the slice is by community? This analysis starts to get into the “academic” mode as I was not sure at the time (and still am not) if there is anything actionable that can be done with this insight, but it is interesting to understand the overlap between communities. The following is a table that shows the slicing of 11 communities along the dimension of each of those communities.
Reading across a row of the table, the value shown is the percent of the community in that row that overlaps with the community in that column. The diagonal going from upper-left to lower-right is obviously 100% as each community exactly overlaps itself. The report also color-codes the table so that particularly high or low values show a bit more obviously.
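The table described above can be computed along these lines. This is a sketch over hypothetical membership sets; the same function would work if restricted to active members only.

```python
def overlap_matrix(membership):
    """For each row community, percent of its members who are also in each column community."""
    matrix = {}
    for row, row_members in membership.items():
        matrix[row] = {}
        for col, col_members in membership.items():
            shared = len(row_members & col_members)
            # An empty community overlaps with nothing.
            matrix[row][col] = 100.0 * shared / len(row_members) if row_members else 0.0
    return matrix

# Hypothetical membership: community name -> set of member identifiers.
membership = {"A": {"ann", "bo", "cy", "di"}, "B": {"cy", "di"}}
matrix = overlap_matrix(membership)
for row, cols in matrix.items():
    print(row, cols)
```

Note that the table is not symmetric: in this sample half of A is also in B, while all of B is also in A.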
In hindsight, I realized that this is really doing something like a network diagram of the community program, where the nodes are the communities and the weight of each link is the percent overlap. I have not used that visualization with this data but it’d be an interesting way to understand your community program.
A last example of a use for this data was effectively an inverse of the demographics by community – trying to understand how widely spread out the community members were themselves. In other words – how many people are members of exactly one community, how many are members of exactly 2 communities, etc. As with the demographics by community, I do not believe we ever found anything actionable from this insight, but it proved an interesting way to understand how widely people’s interests ranged across communities. The following table shows the data across two quarters for this view of community members. For each value along the X axis, it shows the # of people who were a member of exactly that many communities in each of the two quarters.
Given the relatively large jump between quarters of members who were a member of exactly one community, my guess (without diving into the details) would be that this likely represents a targeted campaign in a new community to gather members among a group who previously had not been involved in a community. While we never performed it, it would be interesting to correlate how / if people would tend to “spread out” among communities over time – perhaps joining one and, finding value in their participation, they join additional ones (in other words, people may migrate a bit to the right in this diagram over time).
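Counting how many people belong to exactly one, two, or more communities can be sketched as below for a single quarter, again over hypothetical membership sets.

```python
from collections import Counter

def membership_spread(membership):
    """Map N to the number of people who belong to exactly N communities."""
    per_person = Counter()
    for members in membership.values():
        # Each community adds one to the tally of every person in it.
        per_person.update(members)
    return Counter(per_person.values())

# Hypothetical membership: community name -> set of member identifiers.
membership = {"A": {"ann", "bo", "cy"}, "B": {"bo", "cy"}, "C": {"cy"}}
spread = membership_spread(membership)
print(sorted(spread.items()))  # [(1, 1), (2, 1), (3, 1)]
```

Comparing the result for two quarters gives the side-by-side view described above; tracking individual people between quarters would be needed to see whether they actually "spread out" over time.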
In my last post, I described an effort to define “community membership” and how we came upon the definition:
The membership of a community would be defined by the membership in the related mailing list(s).
In this post, I will provide some of the work we did to effect this definition and also some of the basic analysis and insights this supported.
First up – how did we take this definition and make it useful? The mailing list server mentioned in my previous post was built using MailMan, the GNU Mailing List Manager. This is a very flexible tool, though it is primarily focused on providing mailing list functionality in an internet setting – one where your identity is not managed in a corporate directory and one where mailing lists may have nothing in common beyond presence on a single server (no common membership or even commonality in, say, the email addresses of members on the list and no larger “identity” tied to those memberships).
So the first question was how to collect the list of members into an easily manipulated format and, in doing so, how to then connect the memberships with the larger identity of an employee?
MailMan does not provide a directly queryable data source that can easily be combined with data from other systems (basically, some type of SQL database whose data can be joined with others), so we took a two step approach to collecting membership lists:
With this solution in place, it becomes a simple matter to answer basic questions about community membership. That being said, I’ll try to provide some actionable steps for each type of query that we could take based on the insights gained. Without keeping that focus, a lot of the possible analysis becomes academically interesting (perhaps) but does not have any meaningful business value.
That’s an overview of the basic types of things we have been able to do with this data. Here are the topics I plan to cover in subsequent posts in this area:
Check back for more on the above soon!
Most of my recent posts have been focused on enterprise search, so I thought I would take a break from that topic and write about another area of focus in the last several years – communities of practice (CoP) and the business problem of metrics associated with CoPs.
My current employer has had a CoP program for several years now (initiated under the direction of Ray Sims when he was Director of Knowledge Management here), though in recent years, investment in the formal side of the program has dropped to almost nothing. I believe the strategy is that we were successful in initiating the program, so the CoPs should be able to self-maintain now. To be honest, I remain skeptical of that, but it’s tangential to what I wanted to share here.
What I do want to provide here is some insight on some of the metrics we have provided to support this program, with a focus on one area in particular that might be of interest: membership in the communities. To support the business’s need to understand the communities, we were commonly asked questions such as “How big are these communities?”, “Are the communities growing?”, “Who are the members of community X?”, and so on. These questions are not necessarily that pertinent to the success of the communities, but those who were sponsoring and / or leading the communities wanted to know this type of information.
When we first launched the program, one of the activities we undertook was to understand the infrastructural needs of our communities and what technologies we had in place to support them. The company had a number of communities that had grown up over the years, primarily around the creation of products and delivery of services related to those products, so we had a start on this effort already – several communities were already in place and simply needed to be acknowledged.
Due to the history of the corporation, we have a rather distributed employee population, especially in the services area, so much of the interaction among community members needs to be electronic. Part of the infrastructure review was to understand what was already being used, but also to understand what the communities needed to succeed.
Among the many various tools we found people using within the company, a common thread was the use of an internal mailing list server for communication. As we worked through the current state assessment of technology and also had to address the question of membership, we struck upon an answer that we found very useful:
The membership of a community would be defined by the membership in the related mailing list(s).
We generalized to multiple lists instead of defining the (narrower) approach of a single mailing list to a community because another track of the CoP initiative was to align our communities with the solutions in which we positioned our products. So, for example, we might have a resource management community which encompasses a number of products and each of those products might have one (or more than one) mailing list associated with it. We would include anyone who was a member of any of those mailing lists as a member of the resource management community.
This provided a means for communities to have a broad spectrum of interest groups and allowed for people to involve themselves in any one in which they might be interested.
Because this is a self-subscription mailing list server, it provides people with the ability to involve themselves in whatever community(ies) they have an interest in. There is no need for a gatekeeper / manager, or for anyone else to be involved, in order to join.
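The multi-list definition above amounts to taking the union of the subscribers of every list mapped to a community. A minimal sketch, where the list names, addresses, and mapping are all hypothetical:

```python
def community_members(community_lists, list_subscribers):
    """Members of a community are the union of subscribers across its mailing lists."""
    return {
        community: set().union(
            *(list_subscribers.get(name, set()) for name in lists)
        )
        for community, lists in community_lists.items()
    }

# Hypothetical mapping of a community to its product mailing lists.
community_lists = {"resource management": ["product-x", "product-y"]}
# Hypothetical subscriber addresses per mailing list.
list_subscribers = {
    "product-x": {"ann@example.com", "bo@example.com"},
    "product-y": {"bo@example.com", "cy@example.com"},
}
members = community_members(community_lists, list_subscribers)
print(sorted(members["resource management"]))
```

Someone subscribed to more than one of the lists is counted once as a member of the community, since the result is a set.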
Of course, this approach was not without its issues. These included:
So, with the definition for community membership in place (albeit not perfect), what kinds of insights were we able to draw from this? Look for my next post where I’ll start to delve into that area a bit more.