I've been working on a little side project and found I needed to sort data into "good" and "bad" categories. After a bit of research, I settled on giving a simple Bayesian filter a try. I essentially modelled my approach on what I had seen in the spam arena, since the ideas about good/bad data were similar (though my data includes both words and numbers).
Well, let me just say - cool stuff. Bayesian filters are surprisingly easy to implement, and once you get them trained, they do a very good job. I've trained my filter on about 1000 pieces of data and so far it correctly catches the bad data about 90-95% of the time, which is more than good enough for my scenario.
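For illustration, a filter along these lines can be sketched in a few dozen lines of Python. This is not the code from the project, only a minimal sketch under stated assumptions: it uses naive Bayes with add-one smoothing, treats both words and numbers as tokens, and every name in it (`tokenize`, `NaiveBayesFilter`, the sample strings) is hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a token is a run of letters or a (possibly decimal) number.
TOKEN_RE = re.compile(r"[a-z]+|\d+(?:\.\d+)?")


def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class ("good"/"bad") naive Bayes filter."""

    LABELS = ("good", "bad")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one labelled example."""
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        counts = self.token_counts[label]
        total = sum(counts.values())
        # Log prior: fraction of training examples carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            # Add-one smoothing so unseen tokens never get probability zero.
            score += math.log((counts[tok] + 1) / (total + vocab_size))
        return score

    def classify(self, text):
        """Return the label with the higher log score for this text."""
        if any(n == 0 for n in self.doc_counts.values()):
            raise ValueError("train at least one example of each label first")
        tokens = tokenize(text)
        vocab = set(self.token_counts["good"]) | set(self.token_counts["bad"])
        vocab_size = len(vocab) + 1  # +1 reserves a slot for unseen tokens
        scores = {
            label: self._log_score(tokens, label, vocab_size)
            for label in self.LABELS
        }
        return max(scores, key=scores.get)


# Made-up sample data, purely for demonstration.
demo = NaiveBayesFilter()
demo.train("order 42 shipped to customer", "good")
demo.train("invoice 1001 paid in full", "good")
demo.train("error 500 null null value", "bad")
demo.train("corrupt record 0 null", "bad")
print(demo.classify("record null corrupt"))
print(demo.classify("invoice 42 paid"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together. With only four training examples the output means little; accuracy in the range mentioned above would depend on training with far more data.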
I once read a quote somewhere that said Google used Bayesian filters the way Microsoft used if-then statements. Well, if true, that is a scary thought now that I have experienced them firsthand.