In this post, I’ll go into how we have used some of the activity data we collect from mailing lists. In my mind, this is more valuable than the basic membership data because it represents someone looking for or sharing knowledge with their fellow community members and, from my experience, gets closer (but still is not quite “there”) to the idea of an engaged / core team member.
Basic Activity Data
As I mentioned in my post about how we collect the membership data, when we implemented the mechanism that enables us to get list members into a queryable data source, we also implemented a means to populate a queryable data source with data about individual posts to the mailing lists. The data model for this data source is pretty simple, but provides us with a lot of ability to analyze the data:
- The basic community membership data is in three main tables: community, mailing_list and mailing_list_member. These are populated and maintained by the membership collection process.
- In addition, we have a table for mailing_list_event that contains a record for each event that happens to a mailing list. The prime event that we’re interested in here is a “post”.
- Each mailing_list_event includes the following data: the event type (“POST” in this case), the date/time of the event, the list the event is associated with, identification of the actor in the event, and, in the case of a “POST” event, the subject line of the post.
- The “identification of the actor” for posts is slightly complicated in that the actor need not be a member of the list (so we cannot tie posts directly to member records) and the post might, conceivably, come from an email address that is not recognized in our corporate directory. The latter is rare, but it means we need to identify posts by email address; when possible (which is more than 99% of the time), we also capture the unique employee identifier of the employee who posted.
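To make the event record concrete, here is a minimal sketch of what one mailing_list_event row might look like in code. The field names and the `actor_key` helper are my own illustrative assumptions, not the actual column names in our tables; the sketch only mirrors the data described in the list above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MailingListEvent:
    # Field names are hypothetical; they mirror the data described above.
    event_type: str              # e.g. "POST"
    occurred_at: datetime        # date/time of the event
    mailing_list: str            # list the event is associated with
    actor_email: str             # always captured
    employee_id: Optional[str]   # None if the address is not in the directory
    subject: Optional[str]       # subject line, populated for "POST" events


def actor_key(event: MailingListEvent) -> str:
    """Identify the poster: prefer the employee id, fall back to the email."""
    return event.employee_id or event.actor_email.lower()
```

Keying on the employee identifier when it exists, and on the email address otherwise, lets posts from unrecognized addresses still be counted.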
I provide the above details to give a sense of the kinds of analysis this data supports. If you can think of analyses not included below, feel free to comment!
Similar to the general question that drove our initial definition of a community member, we were asked by our CoP program sponsor and CoP leaders to come up with a definition of active member and to be able to provide some insights on active members. Given our definition of community member, our initial (admittedly, very, very myopic) definition of active community member is:
An active community member for a given time period is one that has posted at least one message to a mailing list associated with the community.
As I said, we know this is myopic, but it has some advantages: 1) it’s a clear, objective definition that requires no subjective assessment; and, 2) it’s easy to gather the data to support analysis of this with the tools we have.
Note that with this definition, the concept of active member is time sensitive. Just because someone was considered “active” a month ago does not mean they are considered active this month. Because of this, we also had to decide on a timeframe within which we considered someone active, and we generally looked at such things quarter by quarter.
Note also that there are no “shades of grey” with regard to being active – a member either is or is not active. We did not introduce a concept of some minimum number of posts that were required to be considered active. This would be something to consider, but the challenge is that the population and traffic in different communities varies so widely that it would be a challenge to come up with a defensible definition of “how many posts is enough” for any given community.
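As a rough illustration of the definition, the sketch below counts distinct posters per community per calendar quarter. The input shape (tuples of list name, poster, post date) and the `list_to_community` mapping are assumptions for the example, not our actual query.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2009-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def active_members_by_quarter(posts, list_to_community):
    """Count distinct posters per (community, quarter).

    posts: iterable of (list_name, poster, post_date) tuples (assumed shape).
    list_to_community: dict mapping a list name to its community.
    """
    active = defaultdict(set)
    for list_name, poster, post_date in posts:
        community = list_to_community.get(list_name)
        if community is None:
            continue  # list is not associated with any community
        # One post is enough: membership in the set is all-or-nothing.
        active[(community, quarter_of(post_date))].add(poster)
    return {key: len(members) for key, members in active.items()}
```

Because each poster lands in a set, someone who posts fifty times counts exactly the same as someone who posts once, which matches the "no shades of grey" rule.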
With that definition, we could then provide all of the same types of insights I’ve previously described for general membership, including:
- Count of active members of the CoP program
- Count of active members within each individual community
- Demographics of active members (by geography, function, etc., and, again, we can look at the CoP program overall or on a community-by-community basis)
The accompanying chart here provides an example of the visualization we can apply to this data. In this case, it shows a few sample communities over a stretch of several quarters. One of the insights to be gained from this (and the reason it shows so many quarters) is an understanding of how communities change over time. In the chart you will notice that Community 4 starts off relatively high and shows a steady decline. This likely represents a community whose usefulness is waning, and we can consider retiring it (or, at least, discontinuing any kind of formal support for it). By contrast, Community 3, while it shows many fewer active members overall, shows a period of growth (it launched just at the start of the period covered here) and then stabilizes. In absolute terms, it may not be large or active enough to warrant formal support, but the stable count suggests it remains as useful to its members as it has been in the past.
Activity Over Time
Another way to view this data is to consider total posts instead of just distinct active members. In most communities, these numbers will likely trend with each other pretty closely but sometimes a community may have particular members that post a much higher percentage of total posts than others. The accompanying chart shows an example of a visualization of this type of data. In this case, Community 5 corresponds to Community 4 in the above chart and you can, again, see another representation of the decline in that community. As with active members, it’s another way to help focus (limited) resources on communities based on some objective measures.
You can trend this data over arbitrary time periods and also (if you wish, though we have not) break out the activity by the same types of demographics discussed previously.
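A small sketch of this view, again under assumed input shapes: it totals posts per community per quarter and, as one way to spot a community dominated by a few voices, computes the share of posts made by the single most prolific poster. Both function names are hypothetical.

```python
from collections import Counter
from datetime import date


def posts_per_quarter(posts):
    """Total posts per (community, quarter).

    posts: list of (community, poster, post_date) tuples (assumed shape).
    """
    totals = Counter()
    for community, _poster, post_date in posts:
        quarter = f"{post_date.year}-Q{(post_date.month - 1) // 3 + 1}"
        totals[(community, quarter)] += 1
    return totals


def top_poster_share(posts, community):
    """Fraction of a community's posts made by its most prolific poster."""
    by_poster = Counter(p for c, p, _ in posts if c == community)
    total = sum(by_poster.values())
    return max(by_poster.values()) / total if total else 0.0
```

When the top poster's share is high, total posts and distinct active members can tell quite different stories about the same community.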
Most Active Members
Another interesting perspective is to be able to find the most active members within a community. This is easy to do given our approach to what “activity” is. This insight can be used to help identify potential core team members for the community or even potential community leaders. It can also be used to provide some type of community-based recognition for contributors.
At this point, I have to add the disclaimer that not all posts are created equal, and using the simple, raw data can be misleading. A member who posts often but contributes (relatively) useless content is less likely to be recognized as a key contributor than one who posts only a few times in any given time period but whose posts are valuable, insightful, thoughtful contributions to the community’s knowledge base. So it’s important that this raw data be taken as only part of the overall picture.
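With that caveat in mind, the raw ranking itself is simple to produce. This is a minimal sketch assuming posts arrive as (community, poster) pairs; the function name and input shape are illustrative only, and the result measures volume, not quality.

```python
from collections import Counter


def most_active_members(posts, community, top_n=5):
    """Rank posters in one community by raw post count.

    posts: iterable of (community, poster) pairs (assumed shape).
    Returns a list of (poster, post_count), highest count first.
    """
    counts = Counter(poster for c, poster in posts if c == community)
    return counts.most_common(top_n)
```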
Identification of Lurkers and Percentage of Lurkers
Another obvious use of this data is to understand the population of “lurkers” as a percentage of overall membership. This is another one of those measures for which I have been challenged to determine specific, actionable steps, other than to understand and contrast how individual communities compare against the baseline of the overall CoP program and against themselves over time.
As an interesting aside, given our definition of “active”, the percentage of lurkers we see across our community program (that is, the share of members who are not active: one minus total-active-members-across-all-communities divided by total-community-members-across-all-communities) remains very stable at just about 50% over any 6-month period we use for measuring.
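The program-wide calculation can be sketched as follows. The two dictionaries of per-community counts are assumed inputs, and the numbers in the usage note are made up purely to show the arithmetic. The sketch also assumes every active poster is a list member, which, as noted earlier, is not strictly guaranteed.

```python
def lurker_percentage(member_counts, active_counts):
    """Percentage of members, summed across communities, who are not active.

    member_counts: dict of community -> total members (assumed input).
    active_counts: dict of community -> active members for the same period.
    """
    total_members = sum(member_counts.values())
    if total_members == 0:
        raise ValueError("no members to measure")
    total_active = sum(active_counts.get(c, 0) for c in member_counts)
    return 100.0 * (total_members - total_active) / total_members
```

For example, with hypothetical counts of 100 and 50 members and 40 and 35 active members, the function returns 50.0.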
Connectivity within and between communities
Some of the more interesting analysis I’ve performed to try to understand the connectivity of community members looks at the overlap of active members between communities (what I think of as connectivity between communities) and at connectivity within communities, to try to understand the flow of information.
In terms of connectivity across communities, the approach I took was similar to the discussion about looking at the Demographics by Community in my previous post – though restricting it to only active members for each community.
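One way to express this overlap, sketched under the assumption that you already have the set of active members for each community: count the members shared by each pair of communities. The function name and input shape are hypothetical.

```python
from itertools import combinations


def active_overlap(active_by_community):
    """Count active members shared by each pair of communities.

    active_by_community: dict of community -> set of active members
    (assumed input). Pairs with no shared members are omitted.
    """
    overlap = {}
    for a, b in combinations(sorted(active_by_community), 2):
        shared = active_by_community[a] & active_by_community[b]
        if shared:
            overlap[(a, b)] = len(shared)
    return overlap
```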
Within a community, I have looked a bit deeper and used the threading of the messages to connect people based on actually taking part in the same conversations. I also have used some network visualization tools to try to visualize the resulting data set and ended up with the following as an example. This diagram shows the active members of the community as nodes and two nodes are linked if they communicated with each other on at least one thread. The size of a node represents how active that member is (count of posts), and the color of the node represents the member’s organization. This particular image does not show it, but I also tried displaying the weight of the link between members to reflect how often the two members communicate.
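The data set behind such a diagram can be sketched roughly as below. It assumes each post has already been assigned a thread identifier (how threads are detected is not shown, and `thread_id` is a hypothetical field); it yields node sizes as post counts and link weights as the number of threads two members shared.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_thread_graph(posts):
    """Build node sizes and weighted links from thread participation.

    posts: iterable of (thread_id, poster) pairs (assumed shape).
    Returns (post_counts, edge_weights), where edge_weights maps a sorted
    pair of posters to the number of threads both took part in.
    """
    post_counts = Counter()
    participants = defaultdict(set)
    for thread_id, poster in posts:
        post_counts[poster] += 1
        participants[thread_id].add(poster)

    edge_weights = Counter()
    for members in participants.values():
        # Link every pair of people who appeared in the same thread.
        for a, b in combinations(sorted(members), 2):
            edge_weights[(a, b)] += 1
    return post_counts, edge_weights
```

The two results can then be handed to whatever network visualization tool you prefer, with an attribute such as organization added per node for coloring.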
The net of this was that it really presented another way to understand who the primary connectors / contributors are within the context of the community (or, more accurately, its mailing lists), which can be useful to identify potential core team members or community leaders. At the end of the day, it did not quantitatively provide a whole lot more than simply identifying the most active members, as they tend to be the same people connecting others together regardless. I have not at this point done any deep network analysis on this type of data, but that could be done, obviously, and it would likely provide much more insight into the structure of the network this data represents.
One interesting use of this type of visualization, though, is simply as a communication tool when talking with people about the community program. It is very impactful to show someone this type of visualization and talk about how the communities provide very fertile ground for the exchange of information, innovation and connectivity. Being able to show in such a diagram that this connectivity crosses organizational boundaries is a valuable tool.