
How many blogs are there in the Malaysian SoPo blogosphere?

This is an interesting paper about the Malaysian SoPo blogosphere – it uses social network analysis (SNA) techniques to crawl links and map the Malaysian SoPo blogs.

It doesn’t seem to say when the data was collected, but the paper was published in 2010, and the crawl was done after the 2008 elections, so let’s assume 2009. One worry I have is that they used the SoPo Sentral of Malaysia directory as the starting point of their crawl – but that was last updated in December 2008, and is not necessarily complete. On the other hand, it’s probably the best place to start. However, Dr. Mahathir’s blog (which started in May 2008) is not mentioned, which is surprising.

What they did was to take the 385 blogs in the SoPo-Sentral directory and follow the links to a depth of four (i.e. follow the links from one blog, collect all the sites it links to, then follow those, and repeat twice more). From this they got 4,693 sites, and approximately 2,000 blogs. Another crawl using the same technique on blog posts mentioning ‘Bersih’ in the week after the Bersih demonstrations gathered 878 blogs.
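To make the 'depth of four' idea concrete, here is a minimal sketch of a depth-limited, breadth-first crawl in Python. It is not the paper's crawler: the `crawl` function and the toy link graph are hypothetical, and a real crawl would fetch pages and extract links rather than read them from a dictionary.

```python
from collections import deque

def crawl(links, seeds, max_depth):
    """Breadth-first crawl: return {site: depth at which it was first reached}."""
    depth_of = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        if depth_of[site] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(site, []):
            if target not in depth_of:  # only record the first (shortest) route
                depth_of[target] = depth_of[site] + 1
                queue.append(target)
    return depth_of

# Hypothetical toy link graph (site -> sites it links to); not data from the paper.
toy_links = {
    "seed-blog": ["blog-a", "news-site"],
    "blog-a": ["blog-b"],
    "blog-b": ["blog-c"],
    "blog-c": ["blog-d"],
    "blog-d": ["blog-e"],
}

found = crawl(toy_links, ["seed-blog"], max_depth=4)
for site, depth in sorted(found.items(), key=lambda item: item[1]):
    print(depth, site)
```

In this toy example "blog-e" sits five links away from the seed, so it is never collected. Equally, any real blog that nobody within four links of the directory points to would be missed, which is one reason the total is best read as an estimate.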

So this suggests that there were about 2,000 Malaysian SoPo blogs, which is fewer than I imagined. Edit: Actually, it's more likely to mean that there are between 878 and 2,000 SoPo blogs.

Here are some of the results. They also compare SoPo blogs to random Malaysian blogs.
• The almost paradigmatic polarisation of SoPo blogs demonstrated in the USA by Adamic & Glance is not replicated here. Instead there is a distinct clustering of smaller groups with scattered individuals forming ‘bridges’.
Source details below

My hunch here is that this reflects patronage-style politics – people affiliating themselves with individuals – rather than identifying themselves with ideology or political stances (note, however, that another SNA analysis of the Malaysian blogosphere did turn up a polarisation – though it was not focused on SoPo blogs).
• SoPo bloggers are more likely (compared to random Malaysian bloggers) to be older males (this is the same as the US), and also to reveal details about themselves.
• 27.9% specify a political affiliation as follows: 10.2% UMNO, 7.1% PAS, 4.2% PKR, 2.5% DAP, 1.2% PSM. Surprisingly no other BN parties such as MCA, MIC, Gerakan, etc. Overall, it suggests that most SoPo bloggers see themselves as independents.
• SoPo blogs are four times more likely to be in English than BM (“the small amount of Chinese blog content in [their] data” was not dealt with).
• “the average sopo blogger has more in-links and comments than random Malaysian bloggers”
• Blogger.com (i.e. ‘blogspot’) “has more than 152,000 Malaysian profiles — many more than on Wordpress.com or similar services.”

You can access the paper (in a somewhat garbled copy) here on Scribd, and the full details are:
ULICNY, B., M. KOKAR & C. MATHEUS 2010. “Metrics For Monitoring A Social-Political Blogosphere: A Malaysian Case Study”. Internet Computing, IEEE 14, 34-44.

Social Network Analysis of the Social Media Club - Kuala Lumpur

SMCKL is a group that meets occasionally to explore matters relevant to social media and industry. The most recent one was about social media monitoring tools, and featured three presentations by comScore, Brandtology and JamiQ. They were interesting, but I was surprised that nobody was talking about social network analysis - so I thought I'd do a little demonstration here.

There was much tweeting going on before and after the evening, which was also an occasion for people to meet and network. Using NodeXL, I gathered all the tweets with the hashtag #smckl: in all there were 71 tweeters, and 757 'edges' (i.e. links in the form of 'Followed' relationships, 'Mentions', or 'Replies to'). The following examples only take into account the Followed relationship - i.e. I am only showing a link between tweeters when one follows the other.

A question for social media monitoring has to be: how influential is any particular tweeter? Here I'll look at two ways of visualising that.

Followers
A common measure is how many followers a tweeter has.
nodexl social network analysis sna visualisation twitter social media malaysia

In these images, the size of the profile picture is proportionate to the number of followers - the bigger the profile picture, the more followers. Also, the more central the tweeter is, the more ties s/he has with the other tweeters. The person in the middle is the most embedded in the network - with the most ties to other people, directly or indirectly; on the other hand, as you can see, there are some really on the edge - with only a couple of lines attaching them to the denser cluster in the middle. They are outliers, less likely to be influential within this group.

The first picture was very dense, so I have filtered out all tweeters with fewer than 500 followers
nodexl social network analysis sna visualisation twitter social media malaysia

and with fewer than 1,000 followers.
nodexl social network analysis sna visualisation twitter social media malaysia

Again, a pattern emerges of a denser cluster in the middle with a few outliers. What this suggests is that most people at the SMCKL evening already know each other. But not all: I said above that outliers are less likely to be influential within that group - it's important to note here that the person with the most followers (@victorliew) is an 'outlier'. This suggests that he could be an important 'bridge' for this group to connect to another group. The question would be - who is he? And why are so many people following him?

How can 10,000 unique visitors mean an audience of 100?

A distinct advantage of internet advertising is the ability to accurately measure the audience (through page views), and to know precisely how many people took an interest in the ad by clicking on it. 'Click fraud' (simulating different people by repeated clicking) is detected by automated software, and 'unique visitors' (based on the IP addresses) deals with the problem of the same person refreshing a page in order to simulate a different person.

This is how Google has made billions of dollars, so it must be pretty reliable overall.

However, how can 10,000 unique visitors equal an audience of 100? To answer this, we have to consider the network within which the ad is displayed. For this example, let's imagine a random blog advertising network - called 'BlogAdNet': BlogAdNet works by registering thousands of blogs, all of which allocate space for advertisements to be automatically displayed as and when BlogAdNet wants to. They then go to potential clients and say, for example, 'Our network of blogs receives 10,000 unique visitors a day'; but this does not necessarily mean 10,000 different people. Imagine a very dense network of 100 bloggers, all of whom visit each other's blog every day - each blogger reads 99 other blogs every day. 99 x 100 = 9,900. So, the 10,000 unique visitors could in fact be 100 people, plus one other person (imagine BlogAdNet doing regular monitoring) visiting all the blogs.
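The arithmetic can be checked with a short Python sketch. Everything here is the hypothetical BlogAdNet scenario from the paragraph above, with made-up names for the bloggers and the monitor; the point is only the difference between adding up each blog's unique visitors and counting distinct people across the whole network.

```python
# Hypothetical scenario: 100 bloggers who all read each other's blogs daily,
# plus one monitor (e.g. BlogAdNet itself) who visits every blog.
bloggers = [f"blogger{i}" for i in range(100)]
monitor = "monitor"

# visitors[owner] = set of distinct people who visited that owner's blog today
visitors = {
    owner: {reader for reader in bloggers if reader != owner} | {monitor}
    for owner in bloggers
}

# What a network-wide report might add up: unique visitors per blog, summed.
summed_unique_visitors = sum(len(readers) for readers in visitors.values())

# How many different people are actually behind those visits.
distinct_people = set().union(*visitors.values())

print(summed_unique_visitors)  # 10000
print(len(distinct_people))    # 101
```

Each blog honestly reports 100 unique visitors, yet the network-wide audience is only the 100 bloggers and the monitor.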

I've used NodeXL (a useful social network analysis (SNA) tool that integrates with Excel), to think about a few examples that demonstrate how SNA can give more insight into the behavioural aspects of blog readers. Represented in an SNA graph, the dense network of 100 readers would look like this (except that I've scaled it down to ten users to make it easier to see):
social networks analysis sna blogs malaysia

Everyone is connected to everyone else, and nobody is more 'influential' than others.

However, this would be very unusual. Most networks are clustered - using the above ten blogs, I've chosen A, B and C as the 'top bloggers': everyone visits them, and they always visit each other (but don't visit the others). D, E and F always visit ABC, and each other. GHI are a similarly clustered sub-group. And J, who is visited by nobody (aww), always visits ABC (like everyone else), and also D, F, G and I.

Now, the same network, based on the same calculations, looks like this:
social networks analysis sna blogs malaysia

The size of the nodes is based on the 'in-degree' - i.e. the number of incoming visitors. So A, B and C are the biggest, and J the smallest.
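In-degree is simple enough to count by hand, or with a few lines of plain Python. This is a minimal sketch, not the NodeXL calculation itself; the `visits` dictionary is my encoding of the ten-blog network as described above.

```python
# Who visits whom in the ten-blog example (visitor -> blogs visited).
top = ["A", "B", "C"]
visits = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
    "D": top + ["E", "F"], "E": top + ["D", "F"], "F": top + ["D", "E"],
    "G": top + ["H", "I"], "H": top + ["G", "I"], "I": top + ["G", "H"],
    "J": top + ["D", "F", "G", "I"],
}

# In-degree: the number of other bloggers who visit each blog.
in_degree = {blog: 0 for blog in visits}
for visitor, targets in visits.items():
    for blog in targets:
        in_degree[blog] += 1

for blog in sorted(in_degree, key=in_degree.get, reverse=True):
    print(blog, in_degree[blog])
```

A, B and C each come out with 9 incoming visitors, D, F, G and I with 3, E and H with 2, and J with none.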

You can also calculate 'Betweenness'. In a network, it's not only the direct connections that matter - someone 'between' you and another person may be relaying your thoughts, or enhancing your reputation.
social networks analysis sna blogs malaysia

So, the node J is now bigger than the nodes in the two sub-groups DEF and GHI. So, in theory, J could be seeing something on A's blog, and then telling others about it; or starting conversations in their comments section and acting as a 'bridge' between sub-groups DEF and GHI. Or maybe J is just a lurker, who never says anything? The only way to find out would be to go and look at what J does. This points to one of the limitations of SNA - you can detect the presence of a link, but you don't always know what it means in practice.
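For anyone curious about what sits behind a betweenness score, here is a minimal sketch in plain Python using Brandes' algorithm. It assumes the visit links are treated as undirected ties, which is my assumption rather than a documented NodeXL setting: if direction were respected, J, whom nobody visits, could not lie 'between' anyone. The scores are raw counts, not normalised.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing out credit for each path.
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += n_paths[v] / n_paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]
    # Each undirected pair was counted once from each end.
    return {v: s / 2 for v, s in score.items()}

# The ten-blog example (visitor -> blogs visited), turned into undirected ties.
top = ["A", "B", "C"]
visits = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
    "D": top + ["E", "F"], "E": top + ["D", "F"], "F": top + ["D", "E"],
    "G": top + ["H", "I"], "H": top + ["G", "I"], "I": top + ["G", "H"],
    "J": top + ["D", "F", "G", "I"],
}
adj = {blog: set() for blog in visits}
for visitor, targets in visits.items():
    for blog in targets:
        adj[visitor].add(blog)
        adj[blog].add(visitor)

scores = betweenness(adj)
for blog in sorted(scores, key=scores.get, reverse=True):
    print(blog, round(scores[blog], 3))
```

Under that assumption A, B and C still score highest, but J now outranks every member of DEF and GHI, because J is the only node outside the top three that ties the two sub-groups together.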

The Eigenvector Centrality calculation combines the above, looking at the number of connections each blog has, and the degree of the blogs it connects to:
social networks analysis sna blogs malaysia

E and H are now smaller, because they have fewer connections overall. J remains apparently influential, but the lack of incoming links is not reflected here.
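Eigenvector centrality can be approximated by 'power iteration': start everyone with the same score, repeatedly replace each score with the sum of its neighbours' scores, and rescale. The sketch below is plain Python under the same assumption of undirected ties (which is why J's lack of incoming links does not show); the function name and the number of iterations are my own choices, and NodeXL's exact procedure may differ.

```python
def eigenvector_centrality(adj, iterations=200):
    """Approximate eigenvector centrality by power iteration (max score = 1)."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # A node's new score is the sum of its neighbours' current scores.
        new = {v: sum(score[w] for w in adj[v]) for v in adj}
        largest = max(new.values())
        score = {v: s / largest for v, s in new.items()}
    return score

# The ten-blog example (visitor -> blogs visited), turned into undirected ties.
top = ["A", "B", "C"]
visits = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
    "D": top + ["E", "F"], "E": top + ["D", "F"], "F": top + ["D", "E"],
    "G": top + ["H", "I"], "H": top + ["G", "I"], "I": top + ["G", "H"],
    "J": top + ["D", "F", "G", "I"],
}
adj = {blog: set() for blog in visits}
for visitor, targets in visits.items():
    for blog in targets:
        adj[visitor].add(blog)
        adj[blog].add(visitor)

centrality = eigenvector_centrality(adj)
for blog in sorted(centrality, key=centrality.get, reverse=True):
    print(blog, round(centrality[blog], 3))
```

The ordering is A, B and C first, then J, then D, F, G and I, with E and H last, since E and H have the fewest ties and none of them to J.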

OK, I've got to stop this, and get on with writing my thesis!! :-|

Some conclusions

The density of a blogger network tends to depend on a few factors such as: geographical location, shared cultural features, blog genre, gender, and interest. For example, Malaysian bloggers/readers are more likely to read other Malaysian blogs; or female bloggers/readers interested in fashion and makeup will read blogs that focus on that. The density will be increased when they go to events together, when they link to each other, and so on.

If you want to measure influence on the internet, relying on classic data that is based on non-contextualised quantities is not enough. For example, if you say ‘There are 5,000 mentions of new product X since we launched the campaign’; this does not tell you the relative importance of each mention. You can combine that with unique visitors: ‘5,000 mentions of which 200 were on blogs that receive more than 2,000 daily unique visitors’. But still, what if all those 2,000 visitors are part of a densely clustered network who mostly read each other’s blogs?

The subjective and 'thick' understanding of the contextual meaning of links still needs human eyes. But they can be helped by automated processes that, for example, detect key words, emotional content, etc.

What do you think? How important can SNA be in elucidating these more subjective social aspects of online interaction?

I’m still learning about SNA, and don’t know much about what happens in social media monitoring companies, so if anyone has any corrections or advice, please use the comments section below. Thanks! :-)

Visualising a monetised Twitter network

This is just a little experiment with NodeXL, inspired by this example of using it to visualise a Twitter network. NodeXL is a very nice social network analysis (SNA) and visualisation tool. It works from Microsoft Excel, and is very light and easy to use. The NodeXL Tutorial provides instructions on how to use it.

One thing that's particularly nice, for an SNA neophyte like myself, is that NodeXL can both search the net and do the visualisation (you can do this on VOSON too, though). And you can search Twitter too.

Many people on the Malaysian twitterverse will have noticed #xpaxblackberry coming up fairly often recently, and it seems clear that Xpax had purchased the help, perhaps via ChurpChurp, of various key bloggers/tweeters to get the word out. In addition, Xpax was organising an event last Saturday (which I was able to go to, after entering a competition with Nuffnang) to launch their new prepaid Blackberry service.

So - I decided to see what would happen if I put the search term - "xpaxblackberry" into NodeXL.

This is what I got on the 8th October - two days before the launch party
social network analysis visualisation nodexl twitter monetisation

This represents the tweeters who mentioned 'xpaxblackberry' in their tweet, and the lines represent who follows whom, within that group.

The size of the picture is relative to the "Betweenness centrality" of the tweeter: i.e. some people are more connected to other people, either directly or via other people, so they are 'in between' more people. For example: if I know Joe, Peter, and Jane, but none of them know each other, then I have a greater 'betweenness' value.

So, in the above graph, we can see that the four tweeters with the greatest centrality are @kennysia (BC value = 1), @benjern (BC value = 0.876), @julesisapen (BC value = 0.703), @joycethefairy (BC value = 0.671).

I also ran a 'Cluster' calculation, which calculates "the number of edges connecting a vertex's neighbors divided by the total number of possible edges between the vertex's neighbors." (Hansen, Shneiderman & Smith, p16). Basically, it tries to spot the clusters of nodes that are more interconnected amongst each other than to other people. They are represented by the different colours, which can be seen more easily here - four major clusters are visible.
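The quoted definition is what is usually called the local clustering coefficient, and it can be written out directly. This is a minimal sketch in plain Python on a hypothetical four-person network (borrowing the names from the betweenness example above, and adding one tie between Joe and Peter); it treats ties as undirected and is not NodeXL's own code.

```python
from itertools import combinations

def clustering_coefficient(adj, vertex):
    """Edges among a vertex's neighbours / possible edges among those neighbours."""
    neighbours = adj[vertex]
    possible = len(neighbours) * (len(neighbours) - 1) // 2
    if possible == 0:
        return 0.0  # fewer than two neighbours: no pairs to connect
    actual = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return actual / possible

# Hypothetical network: I know Joe, Peter and Jane; only Joe and Peter know each other.
adj = {
    "me": {"joe", "peter", "jane"},
    "joe": {"me", "peter"},
    "peter": {"me", "joe"},
    "jane": {"me"},
}

for person in adj:
    print(person, round(clustering_coefficient(adj, person), 3))
```

Here 'me' scores one third (one tie out of three possible among my neighbours), Joe and Peter score 1, and Jane scores 0. A high value means a person's contacts mostly know each other, which is the raw material for spotting the tightly knit groups.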
social network analysis visualisation nodexl twitter monetisation

The next time I ran it was on the 10th October, in the afternoon before the event.
social network analysis visualisation nodexl twitter monetisation

The top four this time are: @benjern (BC value = 1), @julesisapen (BC value = 0.834), @kennysia (BC value = 0.685), @spinzer (BC value = 0.357).


The third time was on the 15th October, the Thursday following the event.
social network analysis visualisation nodexl twitter monetisation

The top four this time are: @benjern (BC value = 1), @julesisapen (BC value = 0.625), @xpaxsays (BC value = 0.432), and @joycethefairy and @MyXpaX are equal in fourth place (BC value = 0.398).

• There are clearly more people, but not many more clusters here.
• Two new tweeters are prominent, @xpaxsays and @MyXpaX - they are 'corporate tweeters'.
• One interesting point is that although @joycethefairy has 1,521 followers, and @MyXpaX has only 19 followers, they have the same degree of centrality in this particular snapshot of the twitterverse. This shows how much the sample can influence the result of the 'social network' being analysed: within this sample thirteen followers of @joycethefairy and @MyXpaX tweeted 'xpaxblackberry', meaning they have the same weight in this sample. What has happened is that @MyXpaX keeps retweeting/mentioning and following tweeters who mention 'xpaxblackberry'.
• @kennysia, who was initially the most prominent and central person, has disappeared right off the graph. This must be because the archives are only kept for so long, and he has not tweeted recently enough; or because the tweets have gone beyond the 10-page limit (discussed here, I'm not sure what the exact story is); or because NodeXL limits itself to a certain number of days.

Conclusions
• To do an experiment like this better one would have to analyse more carefully over time (e.g. doing a search every hour or something - for a more sophisticated example see Tim Highfield's foray).
• What's interesting is to note the shifting of the centre of this particular 'conversation'.
• To get an idea of the relative importance of the tweeters, or at least assumed importance, it would be necessary to include some computation of the number of followers each one has.
• The reciprocity of follower/following is important too. The more followers there are compared to following, the more significant that tweeter is likely to be.
• The connections between tweeters are generally quite dense - that is to say, although there is clustering of smaller groups, there are lots of ties between the groups too.
• Overall, the leading tweeters are also leading bloggers. For the moment, I would say that there's no clear differentiation between the Malaysian blogosphere and twitterverse.

What is a Twitter network?

As I have learnt to use Twitter better, I have understood that one of the key things to keeping it useful is to follow the right people - for example, I like to follow various Malaysian politicians (e.g. @limkitsiang, @Khairykj, or @elizabethwong, amongst others) because when important political things are happening, they will be tweeting about it.

I installed TweetDeck recently, mainly because I found out it has a way of grouping the tweeters you follow - so now I have three groups ('All Friends', which is the default group, 'Academics' and 'SoPo'), plus Facebook updates. Another thing that encouraged me to get TweetDeck was the realisation that much chatter amongst the academics probably happens while I'm asleep, due to timezones.

Anyway, this post is the second (the first was Social networks and commenting) that was sparked off by Would the real social network please stand up?

Thinking about Twitter, I agree with a commenter on the 'real social network...?' post (Adrian Chan) that a list of the people one follows would be more of a behavioural network: e.g. the groups I mentioned above are people I share interests with, but may never meet.

A Twitter network is also 'publicly articulated' in the sense that it's consciously expressed (i.e. through choosing people to follow) and people can see who you are following; in addition one can retweet in a name-dropping fashion, and people organising to meet up may display all the others who are in on the conversation. If you were to ask a tweeter who her 'personal network' was it would probably include some of those she follows, but her personal network would include many who are not also tweeters. If one were to trace all the 'followees' (i.e. those who are being followed) a group of people have, one could probably infer something about their personal and behavioural networks - the personal would probably be denser. A 'Twitter network' is apparently multivalent, and seems to support the argument that networks depend on what you're trying to measure, and how you go about doing it.
Facebook social network analysis visualisation
A Facebook publicly articulated network
Bernie Hogan


Some recent discussion amongst Malaysian bloggers about a soon-to-be-launched Malaysian Twitter monetisation service, Churp Churp (it is run by Nuffnang) makes me wonder about how their responses could relate to the different types of social networks.

The discussion has tended to centre around the inherent property of tweeting, that the tweet comes to you directly, whether you want to see it or not - as opposed to a blog, where you can choose to not read an advertorial (in fact, this is rather 'old media' in a way - like television; which must make it attractive to advertisers). So, Colin Charles (aka @bytebot) recommends that tweeters do not use the service, asking "do you want to alienate your followers?"; ShaolinTiger (@ShaolinTiger) argues that followers should be able to opt out of the sponsored tweets, but not have to unfollow the tweeter; David Lian (@davidlian) asks "Can you purchase conversation?" and argues that advertisers need to become part of the conversation, rather than pushing a message out through paid tweeters.

The symbolic aspect of tweeting, the exchange of pleasantries and informational titbits, is important to consider. Jeremy Woolf in Hong Kong makes a similar point to David Lian in talking about "gunners" who are paid to "seed" forums and the like - the process is like this:
"You identify a forum like Uwants or DiscussHk as an influential channel where discussions relevant to your brand, product or service are taking place. People care enough (or, at least are passionate enough) to share their feelings and ask probing questions. Instead of joining the conversation in a meaningful way by replying to posts or establishing a contributing and helpful role within the community, you instead hire a gunner to spam inappropriate comments at this influential audience." (Dear spammers – can we have our social media back?)

One of the key difficulties of social network analysis is understanding the relative meaning of the different ties, and the classification of different types of networks helps in some measure to address this. To become part of a social network means that others need to derive positive meaning from associating with you; that meaning will derive both from their personal reaction, and the interrelated association with commonly valued practices. For the examples of social media, one needs to display commitment, relevance, and integrity. The latter does not preclude being a paid operator with vested interests, but only as long as disclosure is made; the motivation for participation is a central marker of authenticity and integrity.

With this in mind, we may speculate that tweeters may unfollow, or continue to follow, someone for reasons associated with the different types of networks. For behavioural social networks, if the content of the tweets starts being irrelevant to the initial interest, the tweeter will probably end up being unfollowed. So, if a politician starts telling everyone to buy cheap air tickets, he will probably be unfollowed. However, including a certain amount of personal, non-political tweets, is a good way to show commitment to the casual, conversational ethos of Twitter.

For the publicly articulated social networks (and assuming followees will realise when they've been unfollowed), there may be a delicate balance to be negotiated. For a personal friend with whom one has stronger ties, it would probably be easier to unfollow them (and explain via another channel why) than to unfollow someone with whom one has weaker ties, but not weak enough to not care about their reaction altogether.

Conclusions?
OK I've rambled on, and thanks for getting this far. What I'm trying to say is that by understanding what objections people have to sponsored tweets, we may understand more about why people tweet in the first place. It's related to:
1. Both building and maintaining personal networks that operate on meanings developed through relational practices. Social and cultural capital are generated here.
--> For example: I keep in contact with some offline friends, and we develop a mutual understanding about how much to tweet, and what kind of stuff to tweet about. This strengthens our social ties (social capital), and we learn more about each other's preferences and ideas (cultural capital).
2. Developing more functional informational networks directed at increasing cultural literacy and capital.
--> For example: I start to follow various academics in order to have an idea of what they're doing and talking about; I learn new buzz words, read recent articles they tweet about, and so on.

Coming up soon - how do blog networks and Twitter networks differ, and what are the consequences for monetisation strategies?

Social networks and commenting

A recent post by danah boyd (and Bernie Hogan) called Would the real social network please stand up? makes some interesting points about the dangers of assuming all social networks are comparable and concludes
"The truth of the matter is that there is no "real" social network. It all depends on what you're trying to measure, what you're trying to do with those measurements."

She outlines three types of networks:
• "Sociological 'personal' networks": measured in different ways, these would be 'ego networks' with the person in the centre choosing to associate themselves with all the others - e.g. by saying they are people who they would trust with a secret.
• "Behavioral social networks": these would be networks based on common practices. They may be observed but not experienced as important by the people involved, e.g. people taking the same train to work, or they may be more important to the persons - e.g. Grateful Dead fans.
• "Publicly articulated social networks": 'articulated' means that you consciously list them somehow (e.g. your list for Christmas cards), and the public part comes about when you tell others - the obvious example being Facebook 'friends', or blogrolls. These networks may be made of all kinds of people, some of whom may not reciprocate the social tie (e.g. think of an inveterate name-dropper); and these networks may serve different purposes. The symbolism of the ties is important here - i.e. I may friend you as a follow-up to an offline meeting, but there may be no real intention of deepening the relationship.

What are blog networks?
I was thinking - how would you classify blog networks? The common identification of the 'blogosphere' seems to be a behavioural network - from outside bloggers are often bunched together as one group, but from within most bloggers do not identify with the group as a whole. One publicly articulated social network is the blogroll - but there are different views on how useful they are in explaining meaningful ties for bloggers (e.g. Schmidt 2007). Leaving a comment, and responding to them, is a practice central to establishing and maintaining a 'publicly articulated social network' in blogging, but of course not every comment has the same meaning (e.g. see my 10 types of commenters).

I'd agree with Bernie Hogan (aka blurky) that it's important not to 'reify' networks, even though they can be visualised in compelling ways. Blogrolls have limited usefulness, but I would argue that mapping the comments reveals more meaningful relations. Here is an interesting example: a bit less than two years ago, bloggers who had clustered around the launch of All-Blogs met up in 'Blog House' (Bloggers Allied). This is a mapping of the comments made on blog posts that discussed the blogmeet (the blue squares are blogs that received comments, and the red circles are people that made comments).
malaysia all-blogs blog house sna social network analysis

OK - it's all a bit confusing, but we can focus in on the two blogs with the most comments - who, not coincidentally, were the two major figures there.
malaysia all-blogs blog house sna social network analysis

With hindsight, it's interesting to notice how the two major figures there had few online commenters in common - suggesting that their networks have different bases. These two leading bloggers were ostensibly working together towards a common purpose, but after the March 8, 2008 elections there was what was touted as a 'split in the blogosphere' - where they both had a public spat. When I asked some of those involved about the 'split', a common answer I got was: 'there never was a blogosphere - there are all kinds of bloggers, and they can do whatever they want'.

When analysing the social dynamics in the blogging field, it would be useful to think of different types of networks that are enrolled in different contexts: in practice, the networks only exist ephemerally, at the moment of their articulation - the danger of 'reifying' them comes from the ability to trace them on the web, which gives them a misleading permanency. A blogroll link may have been added two years ago, a comment may have been made pretty much at random in any blog.

I think that comments are a fundamental practice of bloggers, and investigating those is more important than - for example - looking at blogrolls or other links; though of course they are relevant too. Too many studies of blogs overlook comments, possibly because: 1) they are more difficult to crawl/mine with automated bots; 2) there is a decreasing rate of significance of comments as they increase in numbers (apart from being an index of the importance of the blog and/or the post) - studies that concentrate on the biggest blogs may therefore overlook them. The way I see it, a blog without the option of commenting is just a website, and analysing blogs without taking account of the comments is like trying to understand the social dynamics of a pub without paying attention to the pub goers.