IBM WebFountain: Taking Web search to the next level
Organizations have gained considerable business value from consolidating their structured data into data warehouses and extracting new insights from this data with advanced data analysis tools. The next stage in the quest for information is to extract meaning from the vast amount of information available on the Web using Web analytics. In simple terms, Web analytics tools teach computers to read for comprehension and to recognise patterns of meaning in the text of documents. These new tools make the Web a valuable new source for business decisions.
IBM is realizing the fruits of its three-year research project to create such a text analytics system. WebFountain runs on an IBM supercomputer and monitors everything on the World Wide Web. WebFountain contains over a petabyte of storage with over 3 billion pages indexed, 2 billion pages stored and the ability to mine 20 million pages a day.
WebFountain is not about building a better search engine; it is about identifying patterns, trends and relationships that businesses can use to transform the way they work. WebFountain can spot trends in public opinion and popular culture as they emerge and watch them catch hold around the world. It can be used as a surrogate for public opinion, providing instant, comprehensive virtual market research in place of newspapers, Web page research or a professional report. It lets managers keep track of their business environment and quickly understand the marketplace response to their marketing activities. In other words, it aims to answer John Wanamaker's complaint: 'I know half the money I spend on advertising is wasted, but I can never find out which half.'
The first application, being trialled with a large customer, addresses a key corporate concern: reputation management, that is, how customers and partners perceive the company and its brands. WebFountain tracks and analyses perceptions expressed on the public Web by assessing the sentiment of a posted comment, i.e. whether it is positive or negative. Another application, for a major record label, assesses the 'Buzz' on the Web when a new CD is released and relates this to its success in the charts.
WebFountain is part of the IBM on-demand infrastructure and comprises three main components:
- The Platform provides the integration between the data and functions of the system. It is designed with open standards so that new data sources and analytical tools can be plugged in easily.
- The Data comprises very large stores of unstructured and semi-structured data such as Internet content, Weblogs, bulletin boards, enterprise data, licensed content, newspapers, magazines and trade journals. In a recent agreement, IBM has integrated Factiva's huge resources of paid-for content with WebFountain.
- Text Analytics Tools. WebFountain supports a broad set of text analytics tools that include natural language processing, statistics, machine learning, pattern recognition and artificial intelligence from IBM and its partners.
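To make the idea of assessing whether a posted comment is positive or negative more concrete, here is a minimal sketch of one very simple approach: counting words from small positive and negative word lists. This is an illustration only and not a description of how WebFountain works; the word lists and the function names `sentiment` and `summarise` are hypothetical, and a production system would rely on much richer natural language processing and machine learning.

```python
import re

# Hypothetical toy word lists, assumed for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def sentiment(comment: str) -> str:
    """Label a comment 'positive', 'negative' or 'neutral' by counting word-list hits."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarise(comments):
    """Count how many comments fall under each sentiment label."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for comment in comments:
        counts[sentiment(comment)] += 1
    return counts
```

Aggregating labels over many comments about a brand, as `summarise` does, gives a crude picture of overall perception. Word counting of this kind misses negation, sarcasm and context, which is one reason real text analytics tools combine several techniques.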
IBM is not the only company planning to exploit the vast amount of information on the Web. Google has recently acquired Applied Semantics for its patented CIRCA technology. CIRCA technology understands, organises and extracts knowledge from Web sites and information repositories in a way that mimics human thought. Applied Semantics' AdSense product applies the CIRCA technology to let Web publishers understand the key themes on Web pages so they can deliver highly relevant and targeted advertisements.
Copyright 2003. Originally published by IT-Director.com, reprinted with permission. IT-Director.com provides IT decision makers with free daily e-mails containing news analysis, member-only discussion forums, free research, technology spotlights and free on-line consultancy.