Peter Schooff
Peter Twenty-Four Seven Security
Peter Schooff's blog is a daily look at what's going on in the world of computer security with an emphasis on how it affects businesses.


March 29, 2007
Podcast: Why Is Image Spam Flooding Our Inboxes? A Discussion With Commtouch

The entire podcast runs 10:39.

What follows is a summary of my discussion with Amir Lev, the President and CTO of Commtouch, concerning image spam: what it is, why it exists, how to stop it, and what he believes will be the next wave of threats to hit our inboxes.

What exactly is image spam?

A lot of spam messages contain images, but the term image spam refers to messages whose spam content is contained entirely in the image. In some cases spammers add random text above or below the image, but that text is only there to fool anti-spam filters.

When did spammers first start using image spam and why?

Images have appeared in spam for a long time, but the type of image spam we are talking about started in 2005. We saw the first seedings and tests by spammers beginning in 2005. At the end of that year, in November and December 2005, they started a massive distribution of image-based spam. By the beginning of 2006 it had reached the level it is at now, which is about 20 to 25 percent of all spam.

Why is most image spam telling me to buy penny stocks?

Most products and services promoted via spam contain hyperlinks, because the spammers want you to go online and register for a service, download something or buy something. Once a message contains a hyperlink, it’s easier for anti-spam filters to track it down by the hyperlink, or URL, than by the image. So if you didn’t have an anti-spam filter, you would probably see a lot of image spam promoting other things as well as penny stocks. But the penny-stock messages contain no hyperlinks and only want you to buy the stock, so this type of spam gets through the filters while the others are stopped.

Why has image spam been so successful at getting through anti-spam filters?

Mainly because most anti-spam filters are content related. One way or another they rely on lexical analysis, looking for words or symbols such as "mortgage" or "Viagra" to decide whether a message contains spam. But once all the text is converted into an image, those filters cannot cope with the message anymore.
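To make the point concrete, here is a minimal sketch of a word-matching filter. It is not any vendor's filter: the word list and the `text_score` function are hypothetical, and the only detail taken from the interview is the pair of example words. It shows that a message with no text parts gives such a filter nothing to match.

```python
import re

# Hypothetical word list; "mortgage" and "viagra" are the examples from the interview.
SPAM_WORDS = {"mortgage", "viagra"}

def text_score(body_text: str) -> int:
    """Count how many distinct spam words appear in the message text."""
    words = set(re.findall(r"[a-z]+", body_text.lower()))
    return len(words & SPAM_WORDS)

text_spam = "Cheap Viagra and low mortgage rates!"
image_spam = ""  # the whole pitch sits in an attached image, so there is no text
padded_image_spam = "the weather in the garden was lovely yesterday"  # random filler text

for body in (text_spam, image_spam, padded_image_spam):
    print(repr(body), "->", text_score(body))
```

Only the first message scores above zero. The image-only message and the one padded with random text both pass untouched.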

What are some of the problems that image spam creates?

The first and biggest problem is lower detection rates. If the anti-spam filters could handle image spam, that would be great, but image spam makes it through more often than other spam.

The second problem is that image spam is much larger than regular text spam. Image messages are five to eight times bigger, which means the resources they consume (bandwidth, disk space, archival) are much greater. Take 2006, when 25% of spam was image spam and there was also more spam overall: during that year the demand for bandwidth and disk space used by spam grew three-fold because of image spam.
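A rough back-of-the-envelope check of those figures, as a sketch only. It assumes the 25% share is counted by message, that a text spam message has size 1, and that an image spam message has size 5 to 8. The function name is made up for illustration.

```python
def volume_multiplier(image_share: float, size_ratio: float) -> float:
    """Total spam bytes relative to an all-text baseline with the same message count."""
    return (1 - image_share) + image_share * size_ratio

low = volume_multiplier(0.25, 5)   # 0.75 + 0.25 * 5
high = volume_multiplier(0.25, 8)  # 0.75 + 0.25 * 8

print(f"size effect alone: {low:.2f}x to {high:.2f}x")
# Growth in message count that would be needed to reach the quoted three-fold figure
print(f"implied growth in spam count: {3 / high:.2f}x to {3 / low:.2f}x")
```

Under these assumptions the larger messages alone roughly double or triple the volume, so a modest rise in the number of spam messages is enough to reach the three-fold figure.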

How do most companies stop image spam?

The most natural solution would be OCR, or optical character recognition, which tries to take image spam and convert it back to text, because most filters know how to handle text. The problem with OCR is that it needs a lot of resources. If you take an ISP that gets hundreds of thousands of messages a second, it is very hard for an OCR engine to handle images that fast, so a lot of the OCR solutions are not good enough for ISPs or large organizations.

The second thing is that spammers are smart enough to know that anti-spam vendors would use OCR so most of the tricks they use today are in order to fool OCR engines as well. So OCR is the most natural solution but a very problematic one.

Other engines or solutions try to work around the problem with a heuristic, for example: if you get a message from someone you don’t know and it contains an image, then block it. Of course, this results in a lot of false positives.
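That heuristic is simple enough to sketch in a few lines. The `Message` class, the function name and the addresses below are hypothetical, invented only to show where the false positives come from.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    has_image: bool

def block_unknown_image(msg: Message, known_senders: set[str]) -> bool:
    """Block any message that carries an image and comes from an unknown sender."""
    return msg.has_image and msg.sender not in known_senders

known = {"colleague@example.com"}

stock_spam = Message("bot@example.net", has_image=True)
new_customer = Message("customer@example.org", has_image=True)  # legitimate photo
colleague = Message("colleague@example.com", has_image=True)

for m in (stock_spam, new_customer, colleague):
    print(m.sender, "blocked" if block_unknown_image(m, known) else "delivered")
```

The rule cannot tell the stock spam from the new customer's photo: both are blocked, and the second one is a false positive.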

How does Commtouch detect image spam?

Commtouch works in a different way. We do no content analysis at all, and for us image spam is just like any other spam. The way Commtouch detects spam is by looking at the fact that all spam, by definition, is massive: it comes in bulk. We’ve found a way to look at traffic on the internet and find recurring patterns of messages. It’s a mathematical, algorithmic approach that finds patterns of messages that repeat themselves on the net, and each specific pattern is then used to block subsequent messages.

We don’t care what the content of the message is, it can be an image, it can be textual, it can be any language, it can be in Chinese or Japanese, as long as it repeats itself we can find the pattern and block it.
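The interview does not describe Commtouch's actual algorithm, so the following is only a toy illustration of the general idea of content-agnostic bulk detection. Everything in it is an assumption: the `BulkDetector` class, the use of an exact SHA-256 hash as the "pattern", and the threshold of three distinct sources. A real system would need fingerprints that survive the random variations spammers add, which an exact hash does not.

```python
import hashlib
from collections import defaultdict

class BulkDetector:
    """Flag a message once the same bytes have been seen from enough distinct sources."""

    def __init__(self, min_sources: int = 3):
        self.min_sources = min_sources
        self.sources = defaultdict(set)  # fingerprint -> set of source IPs

    def observe(self, raw_message: bytes, source_ip: str) -> bool:
        """Record one sighting; return True if the pattern now looks like bulk mail."""
        fingerprint = hashlib.sha256(raw_message).hexdigest()
        self.sources[fingerprint].add(source_ip)
        return len(self.sources[fingerprint]) >= self.min_sources

detector = BulkDetector(min_sources=3)
image_bytes = b"\x89PNG placeholder bytes standing in for a stock-tip image"

# The same image arrives from three different machines.
for ip in ("198.51.100.1", "198.51.100.2", "198.51.100.3"):
    print(ip, detector.observe(image_bytes, ip))

# A one-off personal note, in any language, is seen only once.
print(detector.observe("こんにちは".encode("utf-8"), "203.0.113.7"))
```

The detector never looks inside the bytes, so it treats an image, English text or Japanese text the same way. Only repetition matters.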

Spammers have proven to be incredibly adaptable. What’s the next trick you think spammers are going to use?

We can look at two things: the format of the messages and the distribution methods they are going to use. The main way to distribute spam today is with botnets, which are infected computers belonging to innocent people, usually home users. Spam is sent through the botnets, and since those computers do not belong to the spammers, their resources are in effect unlimited. They don’t care about the CPU or the bandwidth, as they are using somebody else’s resources. This allows them to send more image spam, and I believe that if they want to move a step forward and do audio spam or video spam, that will probably generate a higher click-through rate, which is important for them as well.

So I think using more formats of spam is something that we will see soon. But the other thing relates more to distribution: one of the biggest trends in anti-spam today is sender authentication. There are many efforts on the internet to stop messages if they do not come from the person who claims to have sent them. So if sender authentication efforts continue to be effective, spammers will have a harder time sending from random addresses than they do today.

And I believe that one of the things they will do is have the infected machines start sending messages that look like they are from the owner of the machine. So it might be spam injected into person-to-person messages, or spam injected into valid newsletters that come from the person who should have sent them but now arrive with added spam. This is much like the way viruses used to be distributed.

It is purely business for them today, and where there’s business they have enough money to invest in new technologies. One more thing we’ve already started seeing is blended threats. Spam is not a stand-alone threat today; it’s being sent by the same people who send out viruses. They can use the same zombie machines to do distributed denial-of-service attacks, etc.

Posted by pschooff

Comments

We can never stop spam; we can only increase the complexity of the product so that spammers don't know how to break it.

Posted by: Jay at March 30, 2007 03:09 PM

I am curious, in looking at all the traffic on the internet, and then using recurring patterns of messages in order to stop spam, how do you tell the difference between certain heavily forwarded emails and spam mails?

Paul

Posted by: Paul at March 30, 2007 03:24 PM

Interesting point, you should ask this question to a vendor.

Posted by: Jay at March 30, 2007 03:46 PM

From my own understanding, which does not go as deep as Commtouch's, I would assume that forwarded messages, while showing up as an email pattern, would still be far less than actual spam, which often randomly emails addresses in hopes of finding a real one, so real spam mail would still dwarf forwarded emails. That's to the best of my knowledge, but I have still forwarded the question on to Commtouch to see what their response is.

Best,

Peter

Posted by: Peter at March 30, 2007 03:55 PM

In answer to Paul's question, an email would have to be forwarded in significant quantities in order to enter the realm of spam. However, quantity is not the only difference between forwarded legitimate messages and spam; the easiest way to tell them apart these days is to look at the distribution patterns. Spam is typically sent from zombie PCs, so Commtouch would see recurring patterns originating from thousands of different locations across the Internet. A legitimate email message forwarded even thousands of times would typically come from the same IP address or range of IP addresses.

Posted by: Rebecca Herson, Sr. Director of Marketing, Commtouch at April 1, 2007 03:18 AM
