Four Types of Online Aggregation

Aggregation is the Internet term for harvesting information from a variety of sources and repurposing it into a single presentation.

There are four types of aggregation sites:
•    Human edited
•    Algorithmic selection
•    Social bookmarking
•    Automatic scraping or ingestion

Almost all forms of aggregation mean acquiring content without paying for it. The best kinds of aggregation add value for the reader.

Human Edited
Human-culled headline aggregation sites, such as the Drudge Report and NewzJunky, are quite popular.

The Drudge Report, operated by Matt Drudge and one assistant, attracts nearly 7 million viewers per month. NewzJunky is run by a single former newspaper employee in Watertown, NY, and over the past two years has become the number one online news site for that region of the state, reaching about 67,000 people per month.

These popular aggregation sites do a tremendous job of driving traffic to originating news sources. A link from Drudge can take down servers. Newspaper Web sites within NewzJunky’s coverage area report a handsome surge of traffic from a NewzJunky link.

Quirky and idiosyncratic, human-edited aggregation sites give users a sense that somebody who is smart, or shares their beliefs, or is just zealous about the news, is out there looking for interesting links.

In the case of Drudge, especially, his audience is very curious about what he’s linking to, how he’s re-writing headlines, what he chooses to feature where. His loyalists try to read the tea leaves to discern what message he’s sending.  Some critics have even accused Drudge – because of his choice of links and how he rewrites headlines – of conspiring with political campaigns and Fox News.

Scott Karp notes that Drudge, “a site that sends people away with links,” has the highest engagement of any site on the web.

But the most important difference between the top site and all the other sites, is that this top site — Drudge — has nothing but LINKS. … Drudge beats every original content news site by a two to one margin.

Here’s what ReadWriteWeb says about Drudge:

It sends people away to keep them coming back

There’s actually no content on the Drudge Report. Well, sometimes he will post an email or a memo on his site, but it’s 99% links out to other news sources. His site is designed to send you away to bring you back. The more often you hit his site to go somewhere else the more often you’ll return to go somewhere else again. You visit the Drudge Report more because you leave the Drudge Report more. This is one of the secrets to building traffic: The more you send people away the more they’ll come back.

37Signals has called the Drudge Report one of the best-designed sites on the web.

To clarify, my definition of design goes beyond aesthetic qualities and into areas of maintenance, cost, profitability, speed, and purpose. However, I still think that the Drudge Report is an aesthetic masterpiece even though I also consider it ugly. Can good design also be ugly? I think Drudge proves it can.

Algorithmic Selection
Some sites – most notably Google News and Techmeme – attempt to mimic human decision-making through computer programming.  Such algorithmic programs make computational decisions based on such factors as popularity of a site, how many external sites link to a particular story, and the popularity of the sites embedding those links.
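As a rough illustration of how the factors named above could be combined, here is a minimal sketch. It is not how Google News or Techmeme actually rank stories; the Story class, the score and rank functions, the 0-to-1 popularity scale and the weights are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Story:
    title: str
    # Hypothetical popularity of the publishing site, on a 0..1 scale.
    site_popularity: float
    # One entry per external site linking to the story: that site's
    # own (hypothetical) popularity on the same 0..1 scale.
    linking_site_popularity: List[float] = field(default_factory=list)


def score(story, w_site=1.0, w_links=0.5, w_linkers=2.0):
    """Combine the three factors into one number. Weights are illustrative."""
    inbound_links = len(story.linking_site_popularity)
    linker_popularity = sum(story.linking_site_popularity)
    return (w_site * story.site_popularity
            + w_links * inbound_links
            + w_linkers * linker_popularity)


def rank(stories):
    """Return stories ordered from highest to lowest score."""
    return sorted(stories, key=score, reverse=True)
```

With these made-up weights, a story from a small site that several popular sites link to can outrank an unlinked story from a big site, which is the general behavior the paragraph above describes.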

It’s not clear that this approach is a winner in news aggregation.

Google News reached 5.6 million people in January, far below Yahoo! News’s 20 million. However, even though Yahoo!’s news audience is more than three times the size of Google’s, Google sends nearly as much traffic to a typical low-circulation news site.

Part of Google’s value to both reader and publisher is that the news search engine makes it easy to find headlines around defined search terms (the grouping of related headlines is also useful).  Author John Battelle coined the phrase “database of intentions”  to describe how people use Google – searchers have an intention to find specific information at defined times. In a news context, when people are looking for coverage of events, they do so with that intention-driven mindset. In that mindset, they are much more likely to click on a relevant link (much as they would click on an AdSense text ad in Google’s organic search).  This delivers value to readers and benefits publishers.

The Yahoo! News approach to aggregation, however, more closely satisfies the intention of the headline grazer, the person just looking for a quick glance at what is going on in the world or her home town.  The grazer is already in a mindset of “too little time to read too many stories.”

These intention-driven differences likely explain the disparity of click-throughs from Yahoo! News vs. Google News.

Earlier this year, Google launched a localized news service, allowing users to define a headline feed based on zip code. While it would be tempting to compare Google local to Yahoo! local, the two presentations take very different approaches. Google remains a click-away site, while Yahoo!’s primary mission is to be sticky, offering users many options to remain on the site rather than follow a link.

Social Bookmarking
Digg is the most popular social bookmarking site on the web. In January, it reached 24 million people.

Generally, social bookmarking involves site members saving links to a database and then allowing other members to vote on whether each bookmark is worthwhile. Links are then ranked, with the most popular ones making their way to the top of the home page.
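The save-vote-rank cycle described above can be sketched in a few lines. This is only an illustration, not how Digg or any other site works: the BookmarkBoard class and its method names are hypothetical, and counting the submitter as the first voter and allowing one vote per member are assumptions of this sketch.

```python
class BookmarkBoard:
    """Toy social bookmarking board: members save links, others vote."""

    def __init__(self):
        # Maps each URL to the set of members who have voted for it.
        self._voters = {}

    def submit(self, member, url):
        # Assumption: saving a link counts as the submitter's own vote.
        self._voters.setdefault(url, set()).add(member)

    def vote(self, member, url):
        if url not in self._voters:
            raise KeyError(url)
        # A set means a member can vote for a given link only once.
        self._voters[url].add(member)

    def front_page(self, limit=10):
        # Most-voted links first; ties keep submission order.
        ranked = sorted(self._voters,
                        key=lambda u: len(self._voters[u]),
                        reverse=True)
        return ranked[:limit]
```

Real sites also weigh how recent the votes are and try to detect vote rigging, which this sketch leaves out.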

Top placement on Digg can bring an avalanche of traffic.

Other social bookmarking sites include Yahoo! Buzz, StumbleUpon, Reddit, Mixx, Slashdot, Newsvine and Publish2.

Automated Aggregation
Computers can be used to aggregate headlines and links through two methods: Scraping (using a robot server to crawl news web pages) and RSS ingest (grabbing a site’s RSS feed and republishing it).
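To make the RSS half of this concrete, here is a minimal sketch of RSS ingest using only Python's standard library. The feed text, the example.com URLs and the ingest_rss function name are invented for illustration. A real aggregator would fetch the feed over the network and handle malformed XML, and scraping (crawling the HTML pages themselves) is not shown.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed standing in for a newspaper site's real feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Daily</title>
    <item>
      <title>Council approves budget</title>
      <link>https://example.com/budget</link>
    </item>
    <item>
      <title>High school wins title</title>
      <link>https://example.com/title</link>
    </item>
  </channel>
</rss>"""


def ingest_rss(xml_text):
    """Return (headline, link) pairs for every item in an RSS feed."""
    root = ET.fromstring(xml_text)
    headlines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        # Skip items missing either piece; there would be nothing to republish.
        if title and link:
            headlines.append((title, link))
    return headlines
```

Calling ingest_rss(SAMPLE_FEED) yields the two headline and link pairs, which an aggregator could then republish on its own pages.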

The most popular automated aggregator is Yahoo! News, which, as we discussed earlier, reaches more than 20 million people. Yahoo! uses a combination of site crawling and RSS/XML ingestion (XML ingestion for Consortium member sites). Note, though, that the main Yahoo! News page is compiled by human editors.

While Yahoo! sends newspaper sites a reasonable amount of traffic, as we discussed earlier when comparing Yahoo! News to Google News, Yahoo! appears to refer a mere fraction of its audience to outside sites.  For an analysis of why, see the section on algorithmic aggregators.

Another popular automated aggregation site – reaching 11 million people – is Topix. Topix does have human editors in some of the communities it serves; however, it mostly relies on a robot to scrape sites. It then aggregates the headlines geographically, allows people to comment on the headlines and is also trying to break into local classifieds. (An interesting side note: a new site is trying to do much the same thing by re-aggregating Topix links.)

GateHouse Media sites received a minimal amount of referrer traffic, about 1/10th of 1 percent, from Topix before we demanded the company – owned by Tribune, Gannett and McClatchy – stop aggregating our content (which included at the time full republication of our photographs).

Giving your newspaper content away for free online is foolish

Giving your newspaper content away for free online is foolish.

It does indeed cannibalize your circulation.

Qualification: I’m speaking only for local newspapers whose community-focused content is unique and generally valued by only a narrowly defined audience. For news organizations with national or international aspirations, different rules apply.

Here’s the conundrum for local newspapers — giving newspaper content away for free isn’t a successful strategy, charging for it online won’t work, and not using the web to grow your business is suicide.

That makes it seem, then, like newspaper publishers have no option. If they give their content away for free online, they’re helping to kill their print business; if they don’t have a news web site, they risk losing their entire local news franchise (to an online-only start up) and they also abandon the one avenue they have to generate new revenue and grow the business.

What no newspaper publisher has considered, as far as I know, at least not in a long time, is a third way. Rather than giving away content, charging for it or not having a web site at all, the third way is to create an entirely different web operation.

Let me state the obvious: The web is not print. Content publishers online require a completely different mindset from print journalists. The people who produce content for the web should not be the people who produce content for print. (Not that print people aren’t smart enough to learn web publishing — they certainly are, but they’re too concentrated on print when that’s their primary livelihood).

An online news site needs to comply with the following criteria:

  • Continuously updated
  • Use of multimedia
  • Personal-voice writing
  • Community building
  • User customization
  • Web strategy designed around pull rather than push
  • A separate, online-only sales staff with no constraints

There’s a lot of money to be made for local news sites if they can build strong, loyal online audiences and generate a buzz among readers and advertisers about what they’re doing, but unless and until newspaper publishers start seeing more clearly that the web is not print, their local news franchises are likely doomed.

Real name policy on this site

When I look at the names of the people who have already registered for the new site, I see nothing but a list of friends.

And that’s part of what I want for the resurrected site. I want this to be a site where people feel safe to discuss whatever issue I happen to introduce in a blog post.

I’m done with trolls.

I’m done with anonymous posters.

On a web site where the expectation is that we attract an audience of mature, professional adults, the notion that all participants contribute under their real names shouldn’t seem obscene or unexpected.

As a matter of ethics, I believe anybody in the information business should never, under any circumstances, hide behind a pseudonym.

The real name policy was mocked in comments on Dan Kennedy’s blog. I figure such derisive remarks come from people who somehow just managed to graduate from their AOL account 18 months ago.  I’ve been running online communities for well more than a decade.  I’ve learned a few things. As arrogant as it sounds, I’m not taking lessons from neophytes.

Of course, the question naturally arises: How will I enforce a real name policy? And my only answer is, as best I can.

Basically, if you’re a troll, you won’t last long on my new site.  If you engage in personal attacks against me or other people leaving comments on the site, you will be blocked.

Fake names are generally pretty easy to detect, and since it’s my site, I don’t need proof. I only need suspicion.

Does that mean I’ll delete comments just because a person disagrees with me? Of course not. I happen to love a good discussion over differing views.  But I know it’s also possible to disagree, as they say, without being disagreeable.

Basically, my expectation, and what I intend to enforce as best I can, is that discussions on this site stick to issues and aim at being instructive.

If my draconian rules mean fewer people will comment, I’m willing to live with that.

And if you think you should be able to spout off whatever bullshit you please without attaching your real name to your opinions, then this site isn’t the place for you, and I don’t care if you don’t like it. There’s always somewhere else where any person can rant to his chickenshit anonymous heart’s content.

Ethical people, honest people, always use their real names.

Making a go of it with The Batavian

Yes, I’m no longer employed by GateHouse Media.

But I do have a big job ahead of me: running The Batavian.

I’m grateful to Mike Reed, CEO, Kirk Davis, COO, and my boss Bill Blevins for all of the opportunities afforded to me by GateHouse. I learned much, grew much and was given the freedom to do many interesting and worthwhile things.

I’m also very excited about the opportunity with The Batavian.

Requiem for the Rocky Mountain News

Final Edition from Matthew Roberts

My father was born in Colorado. One of my brothers lived for years in Aspen. He now lives in the Denver area, as does another brother.

In my youth, I visited Colorado a handful of times. As an adult, a few more times still.

When I first settled on journalism as a career, I dreamed of writing for the Rocky Mountain News. I was captured by a faint romantic notion that I could find myself as the lone reporter for the Rocky in some remote Colorado town. I can’t even say for sure if the Rocky had such bureaus back then.  

I think I applied once for a job at the Rocky. I don’t recall getting a response.

Years later, I wound up at the Ventura County Star, also an E.W. Scripps newspaper.

I’m proud of my time at Scripps. It was a great work environment. I was treated well and given every opportunity to grow, learn and advance my career.

I still feel part of the Scripps family and some of my best friends in the industry still work for Scripps.

While I was at the Star, a couple of reporters transferred from Ventura to Denver. It hardly seemed like a bad idea at the time. The Rocky was a big step up: a larger paper in a bigger city, with a national reputation. The Rocky seemed as venerable then as the mountains it’s named after.

I’m thinking of all my friends at Scripps today. I’m sorry to see the Rocky go. It’s a loss for the company, for the communities it served for nearly 150 years and for the hardworking journalists past and present who dedicated themselves to producing a world-class newspaper.

Previously: The Founding of the Rocky Mountain News

Video: Howlin’ Wolf, How Many More Years

Maybe my favorite aspect of YouTube is its role as an archive of great, old music videos.

Here’s Howlin’ Wolf performing "How Many More Years."

*Note: this post represents two things: first, testing video embedding on my new Drupal setup; second, demonstrating that my future blogging on this site may not be just about newspapers, online media, etc. I’m thinking (once I get the WordPress transfer issue figured out) that I’ll import my older posts into this site as well.

Registration on the new site

I’m still working on putting this new site together (and I’m still looking for help on getting the archives set up), but a couple of people have already registered.

That’s great, but I didn’t have the registration form configured yet.

Now it is. So it seems like a good time to mention: I’m going to require real names to comment on my blog.

All of my participation on the Web is under my real name. I never leave anonymous comments.

There are places and times where anonymity is appropriate. I’ve decided that my blog is not such a place. As the owner, I figure that’s my right.

Setting up a new blog in Drupal

I’m setting up a new blog in Drupal.

My biggest challenge is trying to get my WordPress 2.5 blog archives converted to Drupal 6.

Apparently, none of the available scripts for such a conversion are up to speed with that migration path yet.

Anybody out there want to help? Meanwhile, I’m planning some blog posts … not sure when or what I’ll write about first — but lots on my mind.