Four Types of Online Aggregation

Aggregation is the Internet term for harvesting information from a variety of sources and repurposing it into a single presentation.

There are four types of aggregation sites:
•    Human edited
•    Algorithmic selection
•    Social bookmarking
•    Automatic scraping or ingestion

Almost all forms of aggregation mean acquiring content without paying for it. The best kinds of aggregation add value for the reader.

Human Edited
Human-culled headline aggregation sites, such as Drudge Report and NewzJunky.com, are quite popular.

The Drudge Report, operated by Matt Drudge and one assistant, attracts nearly 7 million viewers per month. NewzJunky.com is run by a single former newspaper employee in Watertown, NY, and over the past two years has become the number one online news site for that region of the state, reaching about 67,000 people per month.

These popular aggregation sites do a tremendous job of driving traffic to originating news sources. A link from Drudge can take down servers. Newspaper Web sites within NewzJunky's coverage area report a handsome surge of traffic whenever NewzJunky links to them.

Quirky and idiosyncratic, human-edited aggregation sites give users a sense that somebody who is smart, or shares their beliefs, or is just zealous about the news, is out there looking for interesting links.

In the case of Drudge, especially, his audience is very curious about what he’s linking to, how he’s re-writing headlines, what he chooses to feature where. His loyalists try to read the tea leaves to discern what message he’s sending.  Some critics have even accused Drudge – because of his choice of links and how he rewrites headlines – of conspiring with political campaigns and Fox News.

Scott Karp notes that Drudge, “a site that sends people away with links,” has the highest engagement of any site on the web.

But the most important difference between the top site and all the other sites, is that this top site — Drudge — has nothing but LINKS. … Drudge beats every original content news site by a two to one margin.

Here’s what ReadWriteWeb says about Drudge:

It sends people away to keep them coming back

There’s actually no content on the Drudge Report. Well, sometimes he will post an email or a memo on his site, but it’s 99% links out to other news sources. His site is designed to send you away to bring you back. The more often you hit his site to go somewhere else the more often you’ll return to go somewhere else again. You visit the Drudge Report more because you leave the Drudge Report more. This is one of the secrets to building traffic: The more you send people away the more they’ll come back.

37signals has called DrudgeReport.com one of the best-designed sites on the web.

To clarify, my definition of design goes beyond aesthetic qualities and into areas of maintenance, cost, profitability, speed, and purpose. However, I still think that the Drudge Report is an aesthetic masterpiece even though I also consider it ugly. Can good design also be ugly? I think Drudge proves it can.

Algorithmic Selection
Some sites – most notably Google News and Techmeme – attempt to mimic human decision-making through computer programming. These algorithms weigh factors such as a site's overall popularity, how many external sites link to a particular story, and the popularity of the sites carrying those links.
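
To make the idea concrete, here is a minimal sketch of how such link-weighted ranking might work. It is not any aggregator's actual algorithm, and the sites, stories and weights are invented for illustration:

    # Toy illustration of algorithmic story ranking (not Google News's or
    # Techmeme's actual method): each story is scored by how many external
    # sites link to it, weighted by the popularity of the linking sites.

    # Invented example data: linking site -> popularity weight
    site_popularity = {"bignews.example": 0.9, "smallblog.example": 0.2}

    # Invented example data: story -> external sites linking to it
    inbound_links = {
        "Story A": ["bignews.example", "smallblog.example"],
        "Story B": ["smallblog.example"],
    }

    def score(story):
        """Sum the popularity weights of the sites linking to the story."""
        return sum(site_popularity.get(site, 0.1) for site in inbound_links[story])

    # Rank stories, most prominent first
    ranking = sorted(inbound_links, key=score, reverse=True)
    print(ranking)  # ['Story A', 'Story B']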

It is not yet clear that this approach is a winner in news aggregation.

Google News reached 5.6 million people in January, far below Yahoo! News's 20 million. However, even though Yahoo!'s news audience is more than three times the size of Google's, Google sends nearly as much traffic to a typical low-circulation news site.

Part of Google's value to both reader and publisher is that the news search engine makes it easy to find headlines around defined search terms (the grouping of related headlines is also useful). Author John Battelle coined the phrase "database of intentions" to describe how people use Google: searchers set out to find specific information at defined times. In a news context, when people are looking for coverage of events, they do so with that intention-driven mindset, and in that mindset they are much more likely to click on a relevant link (much as they would click on a sponsored text ad alongside Google's organic search results). This delivers value to readers and benefits publishers.

The Yahoo! News approach to aggregation, however, more closely satisfies the intention of the headline grazer, the person just looking for a quick glance at what is going on in the world or her home town.  The grazer is already in a mindset of “too little time to read too many stories.”

These differences in intention likely explain the disparity in click-throughs from Yahoo! News versus Google News.

Earlier this year, Google launched a localized news service, allowing users to define a headline feed based on zip code. While it would be tempting to compare Google's local pages to Yahoo!'s, the two take very different approaches. Google remains a click-away site, while Yahoo!'s primary mission is to be sticky, offering users many options to remain on the site rather than follow a link.

Social Bookmarking
Digg is the most popular social bookmarking site on the web. In January, Digg.com reached 24 million people.

Generally, social bookmarking involves site members saving links to a shared database and then letting other members vote on whether each bookmark is worthwhile. Links are ranked by those votes, with the most popular ones making their way to the top of the home page.
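
A rough sketch of those mechanics, with invented links and votes, might look like the following; real sites such as Digg also weigh timing, voter history and other signals:

    # Minimal sketch of social-bookmarking mechanics (invented data, not
    # Digg's actual algorithm): members submit links, other members vote,
    # and the most-voted links rise to the top of the home page.

    bookmarks = []  # the shared "database" of submitted links

    def submit(url, title):
        bookmarks.append({"url": url, "title": title, "votes": 0})

    def vote(url):
        for b in bookmarks:
            if b["url"] == url:
                b["votes"] += 1

    submit("http://example.com/story-one", "Story One")
    submit("http://example.com/story-two", "Story Two")
    vote("http://example.com/story-two")
    vote("http://example.com/story-two")

    # Home-page order: most popular bookmarks first
    for b in sorted(bookmarks, key=lambda b: b["votes"], reverse=True):
        print(b["votes"], b["title"])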

Top placement on Digg can bring an avalanche of traffic.

Other social bookmarking sites include Yahoo! Buzz, StumbleUpon, Reddit, Mixx, Slashdot, Newsvine and Publish2.

Automated Aggregation
Computers can be used to aggregate headlines and links through two methods: scraping (using a robot to crawl news Web pages) and RSS ingestion (grabbing a site's RSS feed and republishing it).
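
As an illustration of the RSS-ingestion method, here is a minimal sketch using Python's widely available feedparser library. The feed address is a placeholder, and a production ingester would also handle deduplication, attribution and publisher permissions:

    # Minimal sketch of RSS ingestion (illustrative only): fetch a site's
    # RSS feed and republish its headlines and links.
    import feedparser  # third-party library: pip install feedparser

    FEED_URL = "http://www.example.com/rss"  # placeholder feed address

    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries[:10]:
        # Each entry carries the headline and a link back to the source site
        print(entry.title, "->", entry.link)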

The most popular automated aggregator is Yahoo! News, which, as we discussed earlier, reaches more than 20 million people. Yahoo! uses a combination of site crawling and RSS/XML ingestion (XML ingestion for Consortium member sites). Note, too, that the main Yahoo! News page is compiled by human editors.

While Yahoo! sends newspaper sites a reasonable amount of traffic, it appears to refer only a small fraction of its audience to outside sites, as we discussed earlier when comparing Yahoo! News with Google News. For an analysis of why, see the Algorithmic Selection section above.

Another popular automated aggregation site – reaching 11 million people – is Topix. Topix does have human editors in some of the communities it serves; however, it mostly relies on a robot to scrape sites. It then aggregates the headlines geographically, allows people to comment on them and is also trying to break into local classifieds. (An interesting side note: a new site, OurTown.com, is trying to do much the same thing by re-aggregating Topix links.)

GateHouse Media sites received a minimal amount of referrer traffic from Topix – about one-tenth of 1 percent – before we demanded that the company, owned by Tribune, Gannett and McClatchy, stop aggregating our content (which at the time included full republication of our photographs).