Mining Twitter data
June 3, 2009
Posted by inukonda in Uncategorized.
Everyone is talking about “real-time” search and data, and about Twitter being a good medium for both. Fred Wilson even posted an article on Twitter's potential as a substitute for set-top box data. While I love Twitter and use it religiously, I am going to go out on a limb and say that, from an enterprise/business standpoint, we are far from being able to take meaningful, revenue-generating actions based on data mined from Twitter streams. Here are a few reasons for my premise:
1. According to a recent HBS study, about 10% of Twitter users are responsible for 90% of the content. This suggests that Twitter excels as a collective real-time polling mechanism for that 10% of its users. The accuracy of the results depends on how closely this group resembles the broader population, and I suspect the resemblance is weak. It is almost like a prediction market, and it is likely to fall short for much the same reasons prediction markets do.
2. There is no measurement. Without a meaningful way to measure, the data is not going to be highly reliable. CPC is useful because we can measure the clicks. Measuring consumption is critical. And just as we have click fraud, will we start seeing tweet fraud?
3. I cannot measure many-to-many engagement using Twitter. I can measure one-to-many engagement, but if I were able to measure many-to-many, that would be killer data to have.
4. From a BI perspective, the data in Twitter is largely unstructured. How do I take unstructured data and convert it into a structured form that can be interpreted? How do I measure emotions and sentiments?
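To make the question in point 4 concrete, here is a minimal sketch of what "structuring" a tweet might look like. It pulls out mentions and hashtags with regular expressions and assigns a crude sentiment score by counting words from two tiny word lists. Everything in it (the `TweetRecord` type, the `structure_tweet` function, the word lists) is a hypothetical illustration, not something Twitter or any BI tool provides, and a real system would need far more than this.

```python
import re
from dataclasses import dataclass
from typing import List

# Tiny illustrative word lists (assumption: a real lexicon would be much larger).
POSITIVE = {"love", "great", "good", "awesome", "happy"}
NEGATIVE = {"hate", "bad", "awful", "terrible", "broken"}


@dataclass
class TweetRecord:
    """A structured row that could be loaded into a BI table."""
    text: str
    mentions: List[str]
    hashtags: List[str]
    sentiment: int  # positive word count minus negative word count


def structure_tweet(text: str) -> TweetRecord:
    """Turn raw tweet text into a structured record (hypothetical helper)."""
    mentions = re.findall(r"@(\w+)", text)
    hashtags = re.findall(r"#(\w+)", text)
    # Lowercase and split into simple word tokens before the lexicon lookup.
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return TweetRecord(text, mentions, hashtags, score)


if __name__ == "__main__":
    for raw in [
        "I love my new #iphone but @acme support is terrible",
        "Oh great, my flight is delayed again",
    ]:
        print(structure_tweet(raw))
```

The second sample tweet shows why the sentiment question is hard: it is plainly a complaint, yet the word-count approach scores it as positive because of "great". Extracting mentions and hashtags is the easy part; interpreting tone is where simple transformations run out.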
Bottom line: I think mining Twitter data is going to be a major thing going forward. In the short term, though, it will be just one of the many sources that companies use for BI/MI-type activities, and not a major one. I do believe that, down the road, Twitter data will be a force to be reckoned with in the BI space.
Wouldn't the same transformation techniques used by today’s top tools (MicroStrategy, Hyperion, etc.) be used to transform this unstructured data? What difference do you think Twitter introduces that today’s BI mechanisms don't solve?