This content is part of the Essential Guide: Tapping the potential of social media analytics tools

Data mining social media: Twitter's untapped potential

Data mining social media can help businesses gather crucial information. This article gives advice to developers who want to get started.

There's a lot of talk around data mining social media but not much action, according to Matthew Russell, author of Mining the Social Web, Second Edition, a guide to social data collection and analysis. The perceived difficulty of data mining is a main barrier for those interested in getting involved. It's a false barrier, said Russell, because mining Twitter with well-known programming languages -- Python, in particular -- doesn't require advanced developer or data scientist skills.

Mining data in social media helps businesses gather crucial information. Once developers know the basics, such as making API requests and analyzing sales trends or code, they can use the insights to fuel innovation, according to Russell, CTO of cloud services provider Digital Reasoning Systems Inc. This article presents his advice for developers who, in his words, want to get their hands dirty with data.


Russell advocates using Python for first social data mining projects because its syntax is simple and its core data structures map well onto textual data. "Most social media properties are going to return data to you in JSON format," Russell explained. JSON (JavaScript Object Notation) is a flexible and intuitive text-based data format often used in Web environments to communicate both simple and complex data structures over a network. "Python's core data structures are so close to JSON that there's no real penalty for working with that data. It's very easy to make that request."
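To make Russell's point about JSON and Python concrete, here is a minimal sketch that uses only Python's standard `json` module. The payload is a hypothetical, heavily simplified tweet-like record invented for illustration; the field names are assumptions, and a real API response would carry many more fields.

```python
import json

# Hypothetical, simplified tweet-like payload (assumed field names, invented values).
raw = """
{
  "text": "Mining the social web with #Python",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "entities": {"hashtags": [{"text": "Python"}]}
}
"""

# JSON objects become dicts, arrays become lists, strings become str.
tweet = json.loads(raw)

# Nested data is reached with ordinary dict and list operations.
screen_name = tweet["user"]["screen_name"]
hashtags = [tag["text"] for tag in tweet["entities"]["hashtags"]]

print(type(tweet).__name__)
print(screen_name, hashtags)

# Serializing back to JSON text and parsing again yields the same structure.
round_trip = json.loads(json.dumps(tweet))
```

Because the parsed result is a plain dictionary, no special object model sits between the response and the analysis code, which is the "no real penalty" Russell describes.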

The ultimate data mining platform

Every social networking medium presents a value proposition for data mining, but Russell sees no better starting point than Twitter. The simplicity and asymmetric following model of this platform, together with the 232 million active monthly users, make it particularly suited for data mining. Russell likened the app to a busy public street. "You have a lot of chatter and amongst all that chatter there's a signal that can be teased out."

From a developer perspective, Twitter is particularly suited for data mining because of three key attributes.

  • Twitter's API is well designed and easy to access.
  • Twitter data is in a convenient format for analysis.
  • Twitter's terms of use for the data are relatively liberal. Tweets are generally accepted to be public and accessible to anyone, which fits the asymmetric following model: any user can follow an account without requesting approval.

"I think the simplicity of [Twitter, aggregated] with millions of active accounts, creates a lot of value," Russell said. This potential value is largely untapped, however, and Russell believes executives and developers are missing opportunities to uncover important social trends.

Beyond advertising

Twitter's data is almost exclusively used for reputation management, branding and sentiment analysis. In other words, it's advertising. "When you have 232 million active monthly accounts with a fairly high percentage of those active daily, that's where there are some other unique opportunities, as far as social research goes," Russell said.

He described Twitter as an interest graph, an online portrayal of an individual or group's interests. On a small scale, an interest graph serves to predict purchase behavior. On a larger scale, it can be used to analyze societal trends. "If you think of a following relationship as an 'interested in' relationship, which it really is, you have some pretty powerful data in aggregate," he said. When an interest graph operates on a massive scale, its potential for valuable insights begins to stretch beyond advertising. "There's already a body of data so you want to tap into that, not so much to result in an action like selling something, but really to understand what's going on in a market or in a niche area." One example is a hedge fund that built its entire trading model on interest analysis of Twitter data and used it to make smarter investments.
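The idea of treating a following relationship as an "interested in" relationship can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Russell's own method: the account names and follow pairs below are invented, and a real project would collect them through the API rather than hard-code them.

```python
from collections import Counter, defaultdict

# Hypothetical (follower, followed) pairs, invented for illustration.
follows = [
    ("alice", "python_news"),
    ("alice", "nasa"),
    ("bob", "python_news"),
    ("bob", "espn"),
    ("carol", "python_news"),
    ("carol", "nasa"),
]

# Small scale: the set of accounts each individual is "interested in".
interests = defaultdict(set)
for follower, followed in follows:
    interests[follower].add(followed)

# Larger scale: how many accounts are interested in each followed account.
popularity = Counter(followed for _, followed in follows)


def co_interest(a, b):
    """Count the accounts that follow both a and b."""
    return sum(1 for targets in interests.values() if a in targets and b in targets)


print(popularity.most_common(2))
print(co_interest("python_news", "nasa"))
```

The per-account sets correspond to the small-scale view of one person's interests, while the counts and overlaps are the kind of aggregate signal that hints at what is going on across a market or niche.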

The value of Twitter's API should not be underestimated, in Russell's opinion. The API is an entry point that opens the Twitter platform to third-party innovation. "There are an awful lot of creative human beings out in the world who will probably have more good ideas than [Twitter] could ever come up with internally," he said. Twitter's API is an enormous and, as yet, underused resource. "Anyone can harness that energy and tap into this third-party commodity in order to innovate, from a tiny little startup of one person with a good idea to a much larger corporate entity with dozens of software developers."

Twitter's self-organizing, ever-growing pool of data offers direct insight into trends and interests on both a personal and collective scale, but it has yet to fully capture the imagination of developers. And, in terms of the value that could come from data mining social media, Twitter is just the tip of the iceberg. Russell predicts and hopes that businesses will begin to treat ads as a means to an end, and that they will find the true value in innovating with social media data.



Do you think data mining social media has business value beyond advertising?
Many other areas could use social media insights, such as product and service design, performance evaluation, and ideas for new products and services.
Yes, because it provides a lot of information for innovating on services, products and processes.
It helps us understand customers and their preferences better, and can aid context-based service provisioning to improve customer service.
Trending analysis is probably one of the more useful things one can do with Twitter's data, and since they released the Streaming API, it can be done in real time, which is just awesome.
Good point, Carlos. For those who want more info: 
Plenty of uses for this data for the enterprising analyst, from journalism coverage to important health studies. 
Absolutely - intelligence can be gathered on security including malware, improper employee usage, leakage of trade secrets, emerging threats, and more. The NSA has a great model for this.
Twitter's data is certainly useful beyond advertising, and this is why I think that controlling and refining access to this information is what will ultimately be the moneymaker for the company.