Anthony Bailey ([info]anthonybailey) wrote,
@ 2007-05-25 21:57:00
Entry tags: software_development

A Feedsum Engine

A couple of weeks ago I noticed that I was avoiding subscribing to some informative feeds because they were a bit too informative. I don't want to see as many as a dozen separate news items across the course of a single day, because it makes too much noise in the feed aggregator and gets in the way of rarer, more valuable entries. (In my case, the aggregator takes the form of my LJ Friends page.)

I realized I wanted the equivalent of a digest e-mail for feeds - a daily summary or similar. This sounded like a job for a web service, so after failing to find any existing solution, I registered feedsum.com and coded up a little Rails app that sucks in a source feed, collates items into daily batches, and generates a derived feed in which each item summarizes all the source items for one day.
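
As a rough illustration of the collation step just described, here is a minimal Python sketch. It is not the Rails code behind feedsum.com; the function names, the tuple layout of an item, and the choice of UTC calendar days as the batch boundary are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone
from html import escape


def collate_daily(items):
    """Group (published, title, link) tuples by UTC calendar day.

    Assumes every `published` value is a timezone-aware datetime.
    """
    batches = defaultdict(list)
    for published, title, link in items:
        day = published.astimezone(timezone.utc).date()
        batches[day].append((published, title, link))
    # Return the days in chronological order.
    return dict(sorted(batches.items()))


def summarize(day, batch):
    """Build one derived-feed item listing every source item for `day`."""
    links = [
        f'<li><a href="{escape(link)}">{escape(title)}</a></li>'
        for _, title, link in sorted(batch)
    ]
    return {
        "title": f"Summary for {day.isoformat()}",
        "body": "<ul>" + "".join(links) + "</ul>",
    }
```

Each key of the dictionary returned by `collate_daily` becomes one item in the derived feed once it has been passed through `summarize`.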

At first this was simply a stateless service, but I found that source feeds often dried up very quickly - by the end of the day, the earliest items could already have been pushed out. So now I stash source items in a database whenever the source feed is pulled. I also found that syndicators such as LiveJournal were not sufficiently patient to wait for my app to spin up in clunky development mode, starting up a fresh Ruby before pulling the source and generating a summary - they often timed out before the service was done. So finally I have had to learn how to deploy a proper production Rails app using Mongrel proxies and all that. There were some fiddly one-time set-up details, but once up and running, Capistrano is pure deploy joy.

Currently I'm using the service to source three LJ syndications: [info]slashdot_daily, [info]machinifeedaily and [info]arstechnicdaily. The service can probably handle some further light load, so others (LJ users or any other feed consumer) are welcome to give it a beta test. It's easy to use: to get a daily summary of some feed

http://example.com/path/index.xml
simply request
http://feedsum.com/daily/example.com/path/index.xml
(You can do other neat stuff like http://feedsum.com/every/8/hours/... and so forth but the daily summaries tend to be the useful ones.)
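
For anyone scripting this, the daily URL mapping shown above can be sketched in Python as follows. The helper name `feedsum_daily_url` is made up for the example, and since the post does not say how `https` sources or query strings are treated, the sketch simply rejects them rather than guessing.

```python
from urllib.parse import urlsplit


def feedsum_daily_url(source_url):
    """Map http://host/path to http://feedsum.com/daily/host/path."""
    parts = urlsplit(source_url)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError("expected an absolute http:// feed URL")
    if parts.query:
        # Assumption: query handling is not described in the post.
        raise ValueError("query strings are not covered by this sketch")
    return f"http://feedsum.com/daily/{parts.netloc}{parts.path}"


print(feedsum_daily_url("http://example.com/path/index.xml"))
```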

Do try it out and please tell me if e.g. it completely chokes on any source feeds. (And to avoid a rush of dupes, could anyone who uses it to register further LJ syndication accounts please add a link in a comment below?)






[info]pauldoo
2007-05-26 10:18 am UTC (link)
I'm using "feed://feedsum.com/daily/arstechnica.com/journals.rssx" in my Google Reader now, appears to work great. :)


Items without publication dates
[info]anthonybailey
2007-05-26 12:22 pm UTC (link)
Mike Moran found a feed that the service currently 500's on: the items in http://www.citeulike.org/rss/ don't appear to have publication dates, so as currently implemented feedsum can't group them into days. I'll at least improve the feedback for this case.



[info]anthonybailey
2007-05-26 12:29 pm UTC (link)
(Since a few others are using this, I'd better start noting updates somewhere. I'll tie something to the changelog at some point - but for the moment I can use comments on this post.)

Shortly after publicising the service, naturally I introduced a bug. The whole point of feedsum is to make the source feed less noisy, and so the service is supposed to ignore new items until the end of the day, so that the summary feed only updates once per day rather than every time the source feed does. For the first half of today, the logic to check for this was broken, and the summary will have been updating too frequently. Sorry! Fix is now deployed. I won't do any more tinkering until I've got my tests in place (the shame of it!)
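
The intended behaviour, holding back items until their day has ended, might look something like this in Python. This is only a sketch of the idea and not the fix that was deployed; the function name, the dictionary of per-day batches, and the use of UTC midnight as the end of the day are assumptions.

```python
from datetime import datetime, timezone


def completed_days(batches, now):
    """Keep only the batches whose calendar day ended before `now`.

    `batches` maps a date to that day's source items; `now` must be a
    timezone-aware datetime. Items from the current day are withheld,
    so the summary feed changes at most once per day.
    """
    today = now.astimezone(timezone.utc).date()
    return {day: items for day, items in batches.items() if day < today}
```

With this check in place, pulling the source feed many times during a day stores new items but leaves the published summary untouched until the date rolls over.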


Dev diary feed now available
[info]anthonybailey
2007-05-26 05:20 pm UTC (link)
OK, my development diary is now available on http://feedsum.com/diary/

It's a feed, of course. (But please don't ask feedsum.com to summarize it, since - at least until I get some load balancing going - that will lock the lone feedsum mongrel server!)


