Spam filter (this one actually works)


Nav
04-08-2003, 10:45:52
It comes in a number of forms. I'm using the Outlook plugin. I found this after I missed an email for an interview that was gobbled up by my erratic old spam killer (Spam Inspector).

So far it's very effective, definitely worth a look!

http://spambayes.sourceforge.net/

btw, SP, it's written in Python...

Asher
05-08-2003, 22:28:00
You said Spam Inspector was awesome before, now you're saying this one is.

Hah.

Nav
05-08-2003, 23:19:03
Ignore whatever I said before (I actually said awesome?!), this works really well.

You have to train it by telling it what is spam and what is 'ham' (good mail). It also puts stuff it's unsure of in a maybe folder, so you can decide. It doesn't rely on 'Friends' or 'Enemies' lists, updates or anything like that.

Did I mention it was free and open source? ;)

Sir Penguin
05-08-2003, 23:36:52
I heard about something called greylisting a long time ago, but I've completely forgotten the details of the algorithm. Apparently it's a kickass spam filter.

SP

No longer Trippin
06-08-2003, 03:50:53
Every good filter is good until spammers have time to dissect it.

Nav
06-08-2003, 10:50:00
It's based on what you think is spam, not a general consensus.

No longer Trippin
06-08-2003, 23:16:12
Yeah, but they'll find ways around it. The only surefire way is to have a list of contacts, which severely limits the usefulness of email, as you can't just use it casually anymore; you have to give permissions and such. Then you have it dump everything else or stick it into a maybe folder, which would most likely allow you more freedom with email, though less than currently. But spammers will just learn to exploit the maybe folder, and that will be rendered useless too.

Nav
11-08-2003, 16:49:25
It shouldn't be rendered useless.

Because you specify the kind of email that you like, spammers cannot get round it (except by rare chance I guess).

It's picking up about 98% of spam; the other 2% goes into maybe. I don't think it has mis-identified a genuine email (i.e. bunged it straight into the spam folder) since I installed it.

This has been the most hassle-free and successful spam filter I've tried so far. Definitely think you should give it a try.

Deacon
11-08-2003, 20:25:33
I think the best spam filter is to deliver an electric shock to the spammers until they stop.

No longer Trippin
12-08-2003, 03:13:35
Have you checked the spam folder to see if it has thrown out a good email?

I would prefer just handing them over to Venom, it would be cheaper, and they wouldn't be able to send a normal email, let alone spam, as they'd be under his new driveway.

Darkstar
12-08-2003, 08:10:22
:lol:
I agree with Trip!

Scabrous Birdseed
12-08-2003, 08:59:38
Since I get a spam message about once every three days I'm not gonna bother.

Qaj the Fuzzy Love Worm
12-08-2003, 19:53:41
I get almost no spam (that said, please don't sign me up for any for 'a bit of a laugh'), but I've heard that this is currently one of the most effective solutions available.

However I also agree with Trip and Darkstar to a degree, which is scary. You still have to go through your greylist (the maybe folder), which is a pain, but eventually you're going to whittle that down so that it's not the gigantic problem it once was - witness Nav's claim of killing 98% of the spam. Right there it takes only 1/50th of the time to sort through the remainder that it would have otherwise.

I've seen SpamBayes reviewed a few places on the Net and have yet to see a bad review (by people who know the drawbacks of regular spam filters, too). I also have an inkling into the processes it uses. I would recommend it to anyone who's having a large problem with spam.
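
The 1/50th figure follows directly from the 98% catch rate quoted earlier in the thread. A quick check in Python, where the 200 messages a day is a made-up number used only for illustration, and where the time spent is assumed to scale with the number of messages left to look at:

```python
from fractions import Fraction

caught = Fraction(98, 100)      # share of spam filed automatically (Nav's figure)
left_over = 1 - caught          # share that still lands in the maybe folder

daily_spam = 200                # hypothetical volume, not from the thread
to_check = daily_spam * left_over

print(left_over)                # fraction of the original sorting work
print(to_check)                 # messages a day left to sort by hand
```

This leaves out any time spent checking the spam folder itself for good mail that was thrown out by mistake.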

Sean
12-08-2003, 21:26:25
An article about these filters from Paul Graham: http://paulgraham.com/ffb.html

Darkstar
12-08-2003, 21:53:31
In a nut shell:
Bayes uses statistical comparisons to figure out where to put something.

You tell it that several different messages that have "Bigger penises" in them are spam, and in the future it says the phrase "Bigger penises" = 100% spam. Therefore, put in spam folder.

You tell it 3 messages from zmama@lost.in.lesser.hell.net are all good, and a new zmama@lost.in.lesser.hell.net comes in, and it says "that sender is 100% good". Therefore, put in good folder.

It sees a new message from zmama@lost.in.lesser.hell.net with the phrase "Bigger penises" and it goes... er... I don't know. Therefore, put in maybe.

It just models doing what you show it. The more you show it, the better it imitates your kill or keep behavior.
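
The three cases above can be sketched as a toy naive Bayes filter in Python. This is only an illustration of the idea, not how SpamBayes is implemented (it uses a more refined scheme for combining clues). The class name `TinyBayes`, the tokenizer, the smoothing and the 0.2 / 0.9 cutoffs are all assumptions made for the sketch. Because of the smoothing, a phrase never reaches a full 100% either way.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy per-user filter: learns token counts from messages you label."""

    def __init__(self, ham_cutoff=0.2, spam_cutoff=0.9):
        self.spam_tokens = Counter()   # token -> number of spam messages containing it
        self.ham_tokens = Counter()    # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0
        self.ham_cutoff = ham_cutoff   # assumed cutoffs, chosen for the sketch
        self.spam_cutoff = spam_cutoff

    @staticmethod
    def tokenize(text):
        # Lowercased words and address-like tokens, each counted once per message.
        return set(re.findall(r"[a-z0-9@.'-]+", text.lower()))

    def train(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        # Smoothed estimate of P(spam | token); never exactly 0 or 1.
        p_spam = (self.spam_tokens[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham_tokens[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, text):
        # Combine the per-token clues by adding their log-odds.
        log_odds = 0.0
        for token in self.tokenize(text):
            if token in self.spam_tokens or token in self.ham_tokens:
                p = self.token_prob(token)
                log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))

    def classify(self, text):
        s = self.score(text)
        if s >= self.spam_cutoff:
            return "spam"
        if s <= self.ham_cutoff:
            return "ham"
        return "maybe"


filt = TinyBayes()
for body in ("today", "guaranteed", "cheap"):
    filt.train("From: ads-%s@example.com Bigger penises %s" % (body, body), is_spam=True)
for body in ("dinner on sunday", "call me", "photos attached"):
    filt.train("From: zmama@lost.in.lesser.hell.net " + body, is_spam=False)

print(filt.classify("From: stranger@example.com Bigger penises fast"))            # spam
print(filt.classify("From: zmama@lost.in.lesser.hell.net dinner on sunday again"))  # ham
print(filt.classify("From: zmama@lost.in.lesser.hell.net Bigger penises"))        # maybe
```

With only six training messages the conflicting message scores about 0.8, which falls between the two cutoffs, so it lands in the maybe folder as described.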

Sean
12-08-2003, 22:38:58
Er, yes. That would explain the previous article, the one that (AFAIK) was the inspiration for SpamBayes. All the fun stuff was about how this changes spam, and opens up the possibility of ‘innocently’ DDOSing their sites.

Darkstar
12-08-2003, 23:02:35
There should be 0 chance of DDoS attacks, innocent or otherwise. Because this process fails as soon as it starts comparing everyone's mail habits (and there is "critical mass" of people's data). The reason for that is most people would find ads for golf shoes SPAM, but you might be interested in that. Since the majority find it SPAM, your spam filter, going with the group, would be useless to you. That's the complaint of coordinated spam filtering.
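
The golf shoes point can be shown with a few lines of Python. The data is entirely made up (one user who wants golf ads, nine who mark them as spam), and `spam_rate` is a hypothetical helper written for this sketch, not part of any real filter.

```python
def spam_rate(labelled, word):
    """Fraction of messages containing `word` that were marked as spam."""
    marks = [is_spam for text, is_spam in labelled if word in text.lower().split()]
    return sum(marks) / len(marks) if marks else None


# Hypothetical training data: (message text, marked as spam?)
mine = [("golf shoes sale", False), ("new golf clubs", False), ("cheap pills", True)]
others = [("golf shoes sale", True)] * 9 + [("cheap pills", True)] * 9
pooled = mine + others

personal_rate = spam_rate(mine, "golf")    # trained only on my own choices
pooled_rate = spam_rate(pooled, "golf")    # trained on everyone's choices

print(personal_rate)   # 0.0: my filter keeps golf mail
print(pooled_rate)     # 9 of 11 golf messages were marked spam by the group
```

A filter trained on the pooled marks would treat "golf" as a strong spam clue even for the one user who wants that mail, which is the complaint about coordinated filtering.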

No longer Trippin
12-08-2003, 23:44:34
I just have a junk account that I use for everything pretty much and a private account which only gets a couple pieces a day, so it's not an annoyance as they are painfully obvious. If it's not from a sender I know, deleted - especially if there is an attachment on it, then I don't think twice because most likely someone I've lost touch with isn't going to get back in touch with me to send me a bloody attachment. Now on my junk account, I only access it if I'm expecting a mail, and I know the address it'll most likely be from, so it doesn't take long to find it, look through the first few (if within the hour) and it's there.

Sean
13-08-2003, 00:11:51
Darkstar, one question: did you read the article?

Darkstar
13-08-2003, 02:17:43
Links.

No longer Trippin
13-08-2003, 06:58:59
Qaj: What part did you agree with, handing them over to Venom, or spammers eventually finding other damned loopholes?

Sean
13-08-2003, 11:05:01
Originally posted by Darkstar
Links.
Still annoyed that I found dissenting links to your opinion, so asked for the ones you were talking about?

Qaj the Fuzzy Love Worm
13-08-2003, 15:23:12
Trip: Both :) Though I think in the long term the Venom solution will be the most effective :)

I heard talk on NPR about some standards body redesigning email from scratch to be "secure". It'll be interesting to see them try to get people to adopt it once it's done.

MDA
13-08-2003, 17:58:33
Easier than creating an INTERnational "no spam" list to sign up for, I'll bet.

Darkstar
13-08-2003, 19:23:15
Are you kidding? An international "no spam" list would be hacked and used on those "5 trillion good email addresses for sale cheap!" CDs!

Sean, Links.

You never do your own research, and you do not provide any data or links to back up your claims, and yet the first thing you say is "links". So provide your links first, Polyboy. Mad? No. I'm just bored with you.

There's Google, Alta Vista, and literally a few dozen more search engines. You claim "I can never find any such things, just the opposite". Well, provide your links then, Link-Boy. Personally, I think you are just too damn lazy to go find out for yourself. Therefore, you aren't worth me wasting any more of my time on you.

Here's something for you to go check out, link boy: if you have been paying attention, SCO has just announced the termination of IBM's second UNIX System V license, the license IBM gained ownership of when it bought Sequent. IBM's release of Sequent's RCU (Read-Copy Update) and NUMA (non-uniform memory access) code into Linux accounts for 2 of the 5 publicly disclosed major areas of technology that IBM is accused of dumping into Linux without SCO's permission. They are being used as evidence of IBM's breach of contract, as well as several other nasty things, in the upcoming trial. IBM publicly admitted to doing exactly that (dropping Sequent's UNIX NUMA and RCU code into Linux), but oops! It had forgotten that SCO controls what is and isn't permitted to be moved from UNIX code to anything else. That's why IBM is using the patent counter-claims on SCO... because it owns patents related to RCU and NUMA.

If you cannot find that in your searching on the SCO-IBM issues, it just shows you aren't bothering, or your search skills are really, really atrocious.

Now, go do your own homework and catch up on the subject if you want to play at the big boys table. I have enough lazy co-workers that cannot be bothered to open their own email or answer their own phones or even to show up at meetings to deal with in RL, let alone do their own thinking, learning, research, or work.

Darkstar
13-08-2003, 21:24:19
You know, Sean might not deserve all that (as he does seem to be up to date, willing to share, and willing to dig on footie). But he just reminds me of lazy, idiotic co-workers, and that is currently setting me off. I cannot wait for the contract change over and the following re-org to get a new set of deadwood to work with...

Sean
13-08-2003, 21:25:59
DS, what does this have to do with the article that you apparently responded to without reading in this thread?

I continue to read about SCO-IBM, and I continue to be unconvinced either way. Red Hat, for instance, have countersued without anything to do with patents (http://www.zdnet.com.au/newstech/os/story/0,2000048630,20276835,00.htm) while IBM’s countersuit does not solely rest on the patents, but also brings the GPL to court (http://www.nwfusion.com/news/2003/0808linuxgpl.html) for the first time. The license termination is basically irrelevant for now, as IBM is ignoring them until the courts have heard it (http://www.crn.com/sections/BreakingNews/dailyarchives.asp?ArticleID=43908). When SCO does something other than letter-writing and litigating things might become a bit clearer.

No worries, in other words :). But I really am curious about this article.

Darkstar
13-08-2003, 21:44:52
The GPL has been in court several times already. However, all 4 of those previous times, the cases settled out of court. So this isn't the first time the GPL has been called into question, or brought up to need a ruling. But no legal ruling on it has actually been passed.

The Red Hat versus SCO case is a "Stop saying bad things (without legal proof) and driving off our customers!" matter. I expect it will get delayed until the base claims that SCO is asserting in regard to IBM are resolved. SCO's defense will hinge on that, after all.

IBM's swap to patent claims is important... Patent law is a different matter from IP law and Copyright law. They will be able to drag the GPL in as the agreed upon license by which SCO is allowed to use their patents, and try to force SCO to settle. SCO will then "own" Linux (as IBM will have legally agreed it put SCO IP as well as IBM Patents into Linux), but IBM will effectively be "cross-licensed" and immune from being held liable, as will its products (and therefore its customers using its products). That won't help Red Hat (or anyone else on the line) out, but it covers IBM's ass...

And it is just a test to see if you can FIND the articles on this matter, Sean. As you have been saying, "I cannot find no such thing. Just the opposite." It's a simple test of your Net searching skills. I thought that was clear. :)

And no, I haven't read the articles. I opened the first two links, saw "Bayesian spam filtering", did a quick scan, and closed them. I've read dozens of those, so why waste my time doing so? As I've said, you cannot get an accidental DDoS out of a functional Bayesian spam filter. It either runs on your local data folder, or as a stand-in kill file between your machine and your inbox server. So how is it going to DDoS? It can only do that if it doesn't run local, or stay local between you and your inbox. So to get DDoS, that implies sharing outside that set. If it's sharing your statistics outside, then it's probably a "mass mind Bayesian" (that's what they do), and if that's true, it is going to fail being useful when more people do not read the kind of mail you read... which will happen with a big enough data gathering group. If that isn't what you are talking about, quote your sources and link to it, link boy! :)

Sean
13-08-2003, 21:55:31
That just proves you haven’t read the article, as it is called Filters that Fight Back and follows on from Bayesian filtering.

In the extreme case they will be reduced to what I call "spam of the future", a little plain text plus a url:

Hey there. Check out the following:
http://www.blackboxhosting.com/foo

If the spam is waiting on the site, why not have filters go look at what's there? You could apply the filtering algorithm pretty much unchanged to the contents of the site.

If popular email clients did this in order to filter spam, the spammer's servers would take a serious pounding.

That is a very excerpted version of part of the article. Read the full thing to get the full story.
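
A rough Python sketch of the idea in the excerpt: pull the URLs out of a message, fetch each page, and run the same scoring function over the page text. Everything here is an assumption made for illustration. `extract_urls` and `score_with_links` are hypothetical names, the URL pattern is deliberately crude, and the `fetch` and `score` functions are passed in so the example runs with stand-ins and never touches the network.

```python
import re

# Crude pattern: anything starting with http:// or https:// up to whitespace or a quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(message):
    """Return every URL-looking string in the message, in order."""
    return URL_RE.findall(message)


def score_with_links(message, fetch, score):
    """Highest spam score among the message itself and the pages it links to."""
    scores = [score(message)]
    for url in extract_urls(message):
        try:
            page = fetch(url)
        except OSError:
            continue  # page unreachable: rely on the scores we already have
        scores.append(score(page))
    return max(scores)


# Stand-ins for a real page fetcher and a real trained filter.
fake_pages = {"http://www.blackboxhosting.com/foo": "BUY NOW cheap pills"}


def fake_fetch(url):
    return fake_pages[url]


def toy_score(text):
    return 0.99 if "buy now" in text.lower() else 0.1


message = "Hey there. Check out the following:\nhttp://www.blackboxhosting.com/foo"
print(extract_urls(message))
print(score_with_links(message, fake_fetch, toy_score))
```

The message on its own looks harmless to the stand-in scorer; only the linked page gives it away. Every recipient's filter fetching that page is where the load on the spammer's server would come from.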

Darkstar
13-08-2003, 22:00:21
Ah. That is different. Ok, I'll take a look at it then. Thanks.

Darkstar
15-08-2003, 05:42:36
Ok. I've seen that. I just don't associate it with Bayesian filtering. Most email security is out to NOT hit the servers... because it lets the dictionary attackers know that you are a good account. And dictionary identity is becoming the top way of addressing spam mail.

Darkstar
15-08-2003, 05:44:03
Also, that is considered "vigilantism" software, and is considered grey to illegal... depending on the experience of your local courts.

Darkstar
15-10-2003, 17:07:59
BTW... I'm going to try this out at work, and see how it does. I get a lot of "spam" there... from security lists I'm on, telling me that some AV company has added a particular virus variant sig into its latest daily/beta updates. But that list sends me occasionally informative things, so I don't drop the list entirely.

Asher
15-10-2003, 22:37:16
The spam filter in Outlook 11/2003 kicks total ass.

I get about 20 spam a day and this misses about one or two a week, and I've never had any false positives.

Spartak
16-10-2003, 18:32:23
Just installed the thing. I'll post how I get on with it.

Spartak
21-10-2003, 20:48:37
No trouble at all and it works really well.

Sean
21-10-2003, 21:12:52
Originally posted by Asher
The spam filter in Outlook 11/2003 kicks total ass.

I get about 20 spam a day and this misses about one or two a week, and I've never had any false positives.
I get an equal amount of spam, and get better results using naive Eudora filters :confused:. Maybe I have lower-level spammers?

Darkstar
22-10-2003, 03:36:01
Well, my work SpamBayes has needed a good bit of training. But then, I want to see the new security bulletins, but not the "This is a statement to alert you that McAfee has released an update that detects virus XYZ". But I'm splitting hairs on the security spam.

All the OTHER work spam I'm getting, it's doing great on.

Nav
24-10-2003, 15:10:31
I read that the inbuilt spam filter in the latest version of Outlook is a bit erratic.

Not based on bayesian principles then. ;)

Noisy
26-10-2003, 09:31:33
I use Netscape as my newsreader. After two weeks, it was picking up all spam as junk, and there have been no false positives or negatives over the last week or two.

Darkstar
18-11-2003, 01:53:27
Update on my work experiment:

Not so good still. Still training. It hasn't ascertained, statistically, what is spam and what isn't, from the security list. And there is lots of spam...

Asher
18-11-2003, 01:58:35
Originally posted by Nav
I read that the inbuilt spam filter in the latest version of Outlook is a bit erratic.

Not based on bayesian principles then. ;)
Works great on the computers I have it on.

And yes, it's based on Bayesian principles. MS Research has papers on it dating back to 1996. :)

Darkstar
18-11-2003, 05:27:12
When I saw them say "Smart Recognition" and "learning", I figured it was Bayesian. So it's more of a matter of what you are filtering on... if you have material that is very similar, but one is spam and the other ham, then it's not going to do a decent job. Otherwise, it will do well.

Darkstar
11-12-2003, 08:13:37
Update...

At work, this is doing a wee bit better. Took a lot of training though.

However, I'm tired of having to constantly create new kill rules for my current at-home spam killer (McAfee's SpamKiller), so I'm going to install it at home, and see how it does.

Spartak
16-12-2003, 20:01:01
Well, it's been good for me except for a tendency to add new e-mailers to the junk suspects folder. :rolleyes: