qmail-scanner - "cloud" (or rather mesh)?

Forum for getting help with Project Gamera, Spamassassin, Clamav, qmail-scanner and other anti-spam tools.
faris
Long Time Forum Regular
Posts: 2321
Joined: Thu Dec 09, 2004 11:19 am

We use 4psa's CleanServer anti-virus product on one of our servers. This uses clamav as the anti-virus engine.

What's particularly good about 4psa's implementation is that it can spread the scanning load across a number of machines. All that's required is to tell the clamd processes running on the other machines to listen on a particular TCP port and to open the firewalls (carefully!) as appropriate.
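For reference, this is roughly how a client hands a message to a clamd on another machine over TCP. It is a minimal Python sketch, assuming a clamd recent enough to support the INSTREAM command and listening on TCP port 3310 (the conventional clamd port, set with `TCPSocket` in clamd.conf). It is not taken from 4psa's product or from qmail-scanner, and the chunk size is an arbitrary choice.

```python
import socket
import struct

CHUNK_SIZE = 8192  # assumption: any size well below clamd's StreamMaxLength is fine


def frame_instream(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Build the byte stream for clamd's INSTREAM command.

    Each chunk is prefixed with its length as a 4-byte big-endian integer,
    and a zero-length chunk tells clamd the stream is finished.
    """
    out = bytearray(b"zINSTREAM\0")  # 'z' prefix: NUL-terminated command/reply
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += struct.pack("!I", len(chunk)) + chunk
    out += struct.pack("!I", 0)  # end-of-stream marker
    return bytes(out)


def scan_remote(host: str, port: int, data: bytes, timeout: float = 30.0) -> str:
    """Send data to the clamd on host:port and return its reply text,
    e.g. "stream: OK" or "stream: <signature> FOUND"."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_instream(data))
        reply = b""
        # the reply to a 'z' command is terminated by a NUL byte
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", "replace")
```

Usage would be something like `scan_remote("10.0.0.5", 3310, message_bytes)`, where the address is a hypothetical scanning node.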

Has anybody looked into getting qmail-scanner to do anything similar?
I'm not looking to get the equivalent functionality of the 4psa product, which does some intelligent load balancing and suchlike. I'm envisioning a round-robin system, possibly using nothing more complicated than a bunch of DNS A records, for load spreading.
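A minimal sketch of the DNS A-record idea, assuming a hypothetical name such as `clamd-pool.example.com` that carries one A record per scanning machine. Because resolvers and caches may not rotate the records reliably, the sketch resolves the whole set once and cycles through it on the client side.

```python
import itertools
import socket


def resolve_pool(name: str, port: int = 3310) -> list[tuple[str, int]]:
    """Return every (ip, port) behind a possibly round-robin DNS name.

    A name with several A records yields several addresses; duplicates
    (for example from multiple socket types) are dropped, order is kept.
    """
    pool: list[tuple[str, int]] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            name, port, type=socket.SOCK_STREAM):
        addr = (sockaddr[0], port)
        if addr not in pool:
            pool.append(addr)
    return pool


def round_robin(pool: list[tuple[str, int]]):
    """Cycle through the pool so consecutive messages go to different hosts."""
    return itertools.cycle(pool)
```

Each incoming message would then take `next(rr)` from `rr = round_robin(resolve_pool("clamd-pool.example.com"))`, re-resolving periodically if records change.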

The advantage of such a mesh or cloud of scanning engines is that load is distributed relatively evenly across machines, and there's also redundancy: if clamav falls over and cannot be restarted on one particular machine, the other machines will take over the scanning for that machine, even for emails received on the machine with the dead clamd.
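The redundancy part boils down to "if this host doesn't answer, try the next one". A minimal Python sketch, assuming `scan` is any function taking host, port and message bytes (for example one speaking clamd's INSTREAM protocol) and that a dead clamd shows up as a refused or timed-out connection. `AllScannersDown` is a hypothetical name.

```python
from typing import Callable, Sequence


class AllScannersDown(Exception):
    """Raised when no host in the pool could complete the scan."""


def scan_with_failover(pool: Sequence[tuple[str, int]], start: int, data: bytes,
                       scan: Callable[[str, int, bytes], str]) -> str:
    """Try hosts starting at index `start`, moving on when one fails.

    ConnectionRefusedError and socket.timeout are both subclasses of
    OSError, so catching OSError covers a dead or unreachable clamd.
    """
    errors = []
    for offset in range(len(pool)):
        host, port = pool[(start + offset) % len(pool)]
        try:
            return scan(host, port, data)
        except OSError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise AllScannersDown("; ".join(errors))
```

Advancing `start` for each message gives round-robin in the normal case while still falling through to a live host when one is down. If every host fails, the caller can defer or tempfail the message rather than pass it unscanned.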

I'd love to see the same thing for spamassassin in particular. From what I've read it is possible, but from the looks of things it isn't trivial to implement.
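The same pattern can apply to spamassassin because spamd also accepts requests over TCP (port 783 by convention). A minimal Python sketch of a single CHECK request, assuming the spamc/spamd protocol version 1.2 request format and a reply containing a `Spam: True ; 15.0 / 5.0` style header. It illustrates the wire exchange only and is not what qmail-scanner does internally.

```python
import socket


def build_check_request(message: bytes) -> bytes:
    """Build a spamd CHECK request (spamc protocol version 1.2)."""
    header = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_reply(reply: bytes) -> tuple[bool, float, float]:
    """Return (is_spam, score, threshold) from the reply's Spam: header,
    which looks like "Spam: True ; 15.0 / 5.0"."""
    for line in reply.decode("ascii", "replace").split("\r\n"):
        if line.startswith("Spam:"):
            verdict, scores = line[len("Spam:"):].split(";")
            score, threshold = (float(s) for s in scores.split("/"))
            return verdict.strip() == "True", score, threshold
    raise ValueError("no Spam: header in spamd reply")


def check_remote(host: str, port: int, message: bytes,
                 timeout: float = 60.0) -> tuple[bool, float, float]:
    """Ask the spamd on host:port for a verdict on one message."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # no more request data
        chunks = []
        while True:
            part = sock.recv(4096)
            if not part:  # spamd closes the connection after replying
                break
            chunks.append(part)
    return parse_check_reply(b"".join(chunks))
```

Wrapped in the same resolve/round-robin/failover logic as for clamd, this would spread spam checks across a pool of spamd hosts.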

Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
scott
Atomicorp Staff - Site Admin
Posts: 8355
Joined: Wed Dec 31, 1969 8:00 pm
Location: earth

Oh yeah absolutely, I played with a lot of configurations early on with this problem specifically.

There are pros and cons to all of them, and in the end it depends on your resources. I favored the cluster-of-clones approach, since SMTP is a service with distribution built into the protocol (multiple MX records, with senders retrying another host when one is unreachable). Basically, that's lots of identical servers (Project Gamera) that are entirely self-contained. Offloading to another server is an option in that environment too, but again, I was balancing human resource overhead against available hardware. When you run dedicated clamd nodes you are incurring costs elsewhere, in terms of the hardware and the human(s) maintaining them.
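The "distribution built into the protocol" comes from MX records: a sending MTA tries the lowest preference value first, and RFC 5321 directs it to randomize among hosts that share a preference. So publishing several equal-preference MX records for identical clones spreads the load with no extra software. A minimal Python sketch of that ordering; the host names are hypothetical.

```python
import random
from itertools import groupby
from typing import Optional


def order_mx(records: list[tuple[int, str]],
             rng: Optional[random.Random] = None) -> list[str]:
    """Order MX hosts the way a sender should try them: lowest preference
    value first, with hosts of equal preference shuffled."""
    rng = rng or random.Random()
    ordered: list[str] = []
    for _pref, group in groupby(sorted(records), key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # spread load across equal-preference clones
        ordered.extend(hosts)
    return ordered
```

With records `(10, "mx1.example.com")`, `(10, "mx2.example.com")` and `(20, "backup.example.com")`, senders split their first attempts between mx1 and mx2 and only reach the backup if both fail.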

If the idea is to get insane performance (and I mean orders of magnitude higher... like 1000x or more) on a *single* server, then what you want is this:

http://www.sensorynetworks.com

That is an FPGA (Field-Programmable Gate Array) based product, specifically designed to accelerate open source stuff like spamassassin, clamav, snort, etc. If you ever wondered how you could do gigabit-speed packet inspection in something that looks like a 1U Dell server, that's the secret sauce :P

Last but not least, there is the distributed object cache model, like memcached http://www.danga.com/memcached/. If you're looking for the Swiss-army-knife of solutions, this is a great one. You could equally use it for clustering anti-spam servers or web sites (this is in fact what Facebook and LiveJournal use to scale). What I like about it is that it's a generic abstraction layer, so you could set up one memcached cluster and use it for multiple applications, for example storing spamassassin's Bayes data and serving your really big social networking website. All you need to do is adapt the application to the environment, and since memcached is already open source, some folks have been building support for it into their apps by default.
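memcached servers don't talk to each other; the client decides which server holds each key, commonly with consistent hashing so that losing one cache node only moves the keys that lived on it. A minimal Python sketch of such a ring, assuming hypothetical node names on memcached's default port 11211 and hypothetical `bayes:token:` keys. It shows key placement only; whether spamassassin can keep its Bayes data in memcached depends on a suitable storage module being available.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Minimal consistent-hash ring mapping keys to cache nodes.

    Each node is placed at several points (virtual nodes) so keys spread
    evenly; removing a node only remaps the keys that were on it.
    """

    def __init__(self, nodes: Iterable[str], replicas: int = 100):
        self.replicas = replicas
        self._ring: list[tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # first 8 bytes of MD5 as an integer ring position
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no cache nodes in the ring")
        # first ring point clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

Every client that builds the ring from the same node list agrees on where `bayes:token:42` lives, which is what lets many scanning boxes share one cache.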