Combat Email Harvester Robots with Our
ISO, Hex or Mixed Code Email Obfuscator
In the same way that search engine spiders crawl the web, grabbing site pages and filing them in huge databases ready for retrieval by surfers using Google, Yahoo! or any other search engine, email harvester spiders traverse the Web looking for anything in a page that resembles an email address, often for the sole purpose of building large databases to sell on as spam email address lists. Here are a few methods of email obfuscation.
If you consider these obfuscators useful, throwing a link to this page would be much appreciated.
Using the robots.txt File
Good, legitimate, search engines will observe certain rules their spiders find in a file called ‘robots.txt’, placed in the root directory of a site, which will contain certain instructions about where a spider can and cannot go within the site structure, and which pages (and directories of pages and files) they should and should not retrieve for indexing.
For instance, a robots.txt file containing the three directives quoted in brackets here says that all search engine spiders are welcome [ User-agent: * ] but that they don’t need to visit certain directories [ Disallow: /cgi-bin/ ] (and the pages therein), and neither should they retrieve the contents of one page in the root [ Disallow: email_ty.htm ].
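As a rough sketch only, here is how a well-behaved spider might read those three directives. The file contents are reconstructed from the directives quoted above, and the leading slash on /email_ty.htm is my assumption, since Disallow paths are matched from the site root. The function names disallowedPaths and isAllowed are made up for illustration, and the matching is deliberately simplified (no Allow lines, no wildcards, no per-spider groups).

```javascript
// Reconstructed robots.txt, based on the directives quoted in the text.
// Assumption: the page path carries a leading slash.
const robotsTxt = [
  "User-agent: *",
  "Disallow: /cgi-bin/",
  "Disallow: /email_ty.htm",
].join("\n");

// Collect the Disallow paths that apply to every spider ("User-agent: *").
function disallowedPaths(text) {
  const paths = [];
  let appliesToAll = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // not a "field: value" line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      appliesToAll = value === "*";
    } else if (field === "disallow" && appliesToAll && value !== "") {
      paths.push(value);
    }
  }
  return paths;
}

// A polite spider skips any URL path that starts with a disallowed prefix.
function isAllowed(text, path) {
  return !disallowedPaths(text).some((prefix) => path.startsWith(prefix));
}
```

Calling isAllowed(robotsTxt, "/cgi-bin/form.pl") returns false, while isAllowed(robotsTxt, "/index.htm") returns true. Nothing forces a spider to run a check like this, which is exactly the weakness rogue harvesters exploit.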
Robot Meta Statements
You can achieve something similar by placing robots meta tags in a page’s head, such as:
<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
These illustrate the permutations of the theme, but nowadays it’s better practice to adopt the robots.txt standard: there’s some debate as to how long the meta directives will be honoured, or whether many legitimate spiders honour them even now.
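To make the four permutations concrete, this small sketch shows how a compliant spider might turn the content attribute into two yes/no decisions. The function name parseRobotsMeta is hypothetical, and the sketch assumes the usual default of indexing and following when a directive is absent.

```javascript
// Turn a robots meta "content" value into two flags.
// Assumption: anything not explicitly forbidden is permitted.
function parseRobotsMeta(content) {
  const tokens = content
    .toLowerCase()
    .split(",")
    .map((token) => token.trim());
  return {
    index: !tokens.includes("noindex"),
    follow: !tokens.includes("nofollow"),
  };
}
```

For example, parseRobotsMeta("noindex,follow") tells the spider to leave the page out of its index but still follow the links on it.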
Dealing with Rogue Email Harvesters
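Rogue bots simply ignore the inclusion and exclusion directives because they want to examine all site pages for email addresses. One method of defence is to have JavaScript assemble the address in the visitor’s browser, so that it never appears whole in the page source: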
var email = "enquiries";
var domain = "seowebsitepromotion.com";
document.write('<a href="' + "mail" + "to:" + email + "@" + domain + "?subject=General%20Enquiry" + '">' + email + "@" + domain + "</a>");
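The same idea can be wrapped in a reusable function, sketched here under a few assumptions. The name buildMailtoLink is hypothetical, and escaping the subject with encodeURIComponent is my choice rather than anything the snippet above prescribes.

```javascript
// Assemble a mailto link from separate pieces so that the complete
// address never sits in the page source as a single literal.
function buildMailtoLink(user, domain, subject) {
  const address = user + "@" + domain;
  const href =
    "mail" + "to:" + address + "?subject=" + encodeURIComponent(subject);
  return '<a href="' + href + '">' + address + "</a>";
}

// In a browser the result could be dropped into a placeholder element,
// e.g. one with the (hypothetical) id "contact":
//   document.getElementById("contact").innerHTML =
//     buildMailtoLink("enquiries", "seowebsitepromotion.com", "General Enquiry");
```

Returning a string rather than calling document.write keeps the function usable after the page has loaded. A harvester that executes JavaScript would still see the finished link, so treat this as a hurdle rather than a guarantee.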
Email Obfuscation Methods
Another method is to obfuscate the address using a combination of Hexadecimal and/or ISO characters in place of the letters which make up an email address.
This is what the second form on the page does. Drop in your email address then select ISO, Hex or Mixed output to produce an email address which will remain invisible to the majority of email harvester robots. The top form is similar but it generates a complete MailTo email address with the option to change the on-screen link, mouse-over title and email subject title.
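The encoding step might look something like the following sketch. It is not the code behind the forms. It assumes that “ISO” means decimal character references such as &#97; and that “Hex” means hexadecimal references such as &#x61;, and the function name obfuscateEmail and its mode strings are invented for illustration. Mixed mode simply alternates between the two so that the output is predictable.

```javascript
// Encode every character of an address as an HTML character reference.
// mode: "iso"   -> decimal references     (assumed meaning of "ISO")
//       "hex"   -> hexadecimal references
//       "mixed" -> decimal at even positions, hexadecimal at odd ones
function obfuscateEmail(address, mode) {
  return Array.from(address)
    .map((ch, i) => {
      const code = ch.codePointAt(0);
      const useHex = mode === "hex" || (mode === "mixed" && i % 2 === 1);
      return useHex ? "&#x" + code.toString(16) + ";" : "&#" + code + ";";
    })
    .join("");
}
```

A browser renders the references as the original characters, so visitors see a normal address, while a harvester that only pattern-matches the raw page source finds no ‘@’ to latch on to.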
A six-month study found that email addresses encoded in this way and posted on the Web received no junk email.
Note: Email addresses are not stored, they’re simply processed for you. We don’t spam.
Spoofing Email Addresses to Clobber Harvesters
As with all such pages, it’s wise to ensure you include a robots exclusion meta header, <meta name="robots" content="noindex,nofollow">, and a similar exclusion line in the robots.txt file, Disallow: strap.asp, to keep the page out of legitimate engines that follow exclusion directives. We don’t want to populate legitimate search engine databases with junk.
Of course, the page could be modified to link back to itself under a new URL each time, trapping email harvesters in a tar pit, ad infinitum …
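By way of illustration only, a spoof page of this kind could be generated along the following lines. This is a guess at the general technique, not the code behind strap.asp. The function names, the page query parameter and the link text are all hypothetical, and the junk addresses use the reserved .invalid top-level domain so that they can never reach a real mailbox.

```javascript
// Small seeded generator so the sketch gives repeatable output.
function makeRandom(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Build a lowercase nonsense word of the given length.
function fakeWord(random, length) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let word = "";
  for (let i = 0; i < length; i++) {
    word += letters[Math.floor(random() * letters.length)];
  }
  return word;
}

// Produce `count` junk mailto links plus one link back to the same
// page under a fresh query string (the tar pit).
function spoofPage(random, count) {
  const lines = [];
  for (let i = 0; i < count; i++) {
    const address =
      fakeWord(random, 8) + "@" + fakeWord(random, 10) + ".invalid";
    lines.push('<a href="mailto:' + address + '">' + address + "</a>");
  }
  lines.push(
    '<a href="strap.asp?page=' + fakeWord(random, 6) + '">More contacts</a>'
  );
  return lines.join("\n");
}
```

In practice the page would be seeded differently on every request, for instance spoofPage(makeRandom(Date.now()), 50), so that each visit hands the harvester a new batch of junk and a new URL to follow.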
Thwarting Form Submission Bots with Hidden Form Fields
For some time rogue submission bots have trawled the Web seeking out unprotected forms. This has escalated enormously with the blogging phenomenon, with personal and professional sites offering (often unmonitored and automated) user feedback dialogue. Submission bots simply populate form fields with their masters’ junk and trigger the submit button.
Of course, it’s possible to program a spam bot to ignore hidden fields and the submission would get through, so a potential fail-safe might be employed in your detection script. Bots often populate fields with the target’s domain name in an effort to legitimize the post. Enigma’s domain is www.seowebsitepromotion.com, and a bot would strip the ‘www.’ or ‘http://www.’ URL prefix, add a junk name and ‘@’ to simulate a valid email address, then use that spoofed address to populate some of the form’s fields.
You can no doubt see where I’m going: parse all non-email fields for your domain name, like ‘seowebsitepromotion’, ignoring the URL prefix and the TLD (Top Level Domain) suffix, ‘.com’, and treat any match as a bot submission.
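The two checks can be sketched as a single detection function. This is a minimal illustration under assumptions of my own: the hidden field is one that a human visitor never sees and therefore leaves empty, and the names looksLikeBot, hiddenFields, emailFields and domainLabel are all invented for the example.

```javascript
// fields:  submitted form values, e.g. { name: "...", email: "..." }
// options: which fields are hidden traps, which legitimately hold
//          email addresses, and the domain label to look for.
function looksLikeBot(fields, options) {
  // 1. Hidden trap fields: a human never fills these in.
  for (const name of options.hiddenFields) {
    if (String(fields[name] || "").trim() !== "") return true;
  }
  // 2. Fail-safe: our own domain label turning up in a non-email field.
  const label = options.domainLabel.toLowerCase();
  for (const [name, value] of Object.entries(fields)) {
    if (options.emailFields.includes(name)) continue;
    if (options.hiddenFields.includes(name)) continue;
    if (String(value).toLowerCase().includes(label)) return true;
  }
  return false;
}

// Example configuration (field names are hypothetical).
const detection = {
  hiddenFields: ["website"],
  emailFields: ["email"],
  domainLabel: "seowebsitepromotion",
};
```

With that configuration, a submission whose name field contains ‘junk@seowebsitepromotion.com’ is flagged even if the bot was clever enough to leave the hidden field alone. The email field is skipped because a genuine visitor could legitimately type an address at the domain there.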
Arguably the more resilient defence against automated submission bots is the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), which demands active participation by human site visitors to complete form validation.
In its simplest form a CAPTCHA is a distorted graphical image of a word or jumbled sequence of text and numbers generated programmatically. As illustrated, it may also contain obscuring lines or an overlaid grid to make the image harder to read. Simply changing text to its graphical representation will not do since this is easily resolved by software, as evidenced by the number of online font readers which can readily interpret graphically-embedded fonts and supply the typeface.
The visitor is then invited to enter the word or character sequence into a concluding form field as proof (s)he can interpret the image and is therefore human.
All fine and dandy, provided the visitor is not visually impaired; non-sighted users have no chance. Therefore an alternative might be offered, such as audio output or a text-based equivalent that asks about an intrinsic element of a statement or poses a simple question like “How many wheels does a car have?” While text-based CAPTCHAs do not conform to the spirit of CAPTCHA, in that they are not programmatically generated (and can be cracked using artificial intelligence), they nevertheless represent another relatively effective piece of armour against submission bots.
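A text-based check of that kind needs very little code, as this sketch suggests. The question list and the function name checkAnswer are purely illustrative, and accepting both the digit and the word is my own assumption about what a forgiving form should do.

```javascript
// Hand-written questions with every answer we are prepared to accept.
const questions = [
  { prompt: "How many wheels does a car have?", answers: ["4", "four"] },
];

// Compare the visitor's reply, ignoring case and stray spaces.
function checkAnswer(question, reply) {
  const normalised = String(reply).trim().toLowerCase();
  return question.answers.includes(normalised);
}
```

So checkAnswer(questions[0], " Four ") passes while checkAnswer(questions[0], "3") does not. A short fixed list like this is easy for a determined spammer to learn, so rotate the questions now and then.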
For further information, practical examples and code, check out www.captcha.net
Combating Email Spam
Once you’re on a spam database there’s little you can do about it other than to change the exposed email addresses. This is more easily said than done: some addresses may be long-standing and used in places you no longer remember, and it takes considerable time to track down every newsletter or subscription account, along with its password, that will need updating to the new address.
And, unless you maintain a form-only level of email dialogue, there’s every likelihood you’ll be compromised again in the not-too-distant future, especially if you run a popular web presence or feature well in the search engines.
Many people make the mistake of using a catch-all address for their emails, something like email@example.com, and while this may be handy for receiving all mail addressed to the domain it also relays all spam, since any addressee name is passed on. Make addresses specific and limited in number.
Far more insidious are the techniques spammers use to verify a valid email address. Once an email has downloaded into your Outlook, Mozilla or other local inbox, especially mail with graphical or embedded elements, it is child’s play for the spammer to obtain your IP address and then use matching software to interrogate a database and identify the recipient’s likely name and other details. That lets them work out your geographical location and other demographic information, and encourage you to lower your defences by using words which appear pertinent, valid or attractive to you.
The trick is not to let them get that far. Turn away spam at the host server before your IP address is compromised. Many hosting companies offer this as an inbuilt feature, either killing known spam or flagging potential spam for you. But this can be dangerous, possibly generating false positives and deleting mail from unverified, new addresses which may originate from prospective clients. I disable this feature on my POP3 accounts – I want new business! – which leaves the onus on me to filter my remote inbox.
There are a number of professional software applications which intercept mail at the server and offer a variety of features to thwart spam, including bouncing spam back to the originator, effectively indicating an invalid address. I have used MailWasher Pro for over a year and find it effective and efficient at identifying and managing spam emails. There are some free spam washers out there – I once naively used one which hooked into my local client – but they don’t obviate the challenges mentioned above of IP address grabbing, being local email client filters.
What with spam email, harvester and submission bots, phishing, etc., the level of online time-wasting and threat to your money is ever increasing. Even using obfuscation, honey traps, mail washers and other tricks, traps and deterrents, I still receive thousands of spam mails every week and waste at least ten hours each week eyeballing my front-guard server inbox; I might be fastidious about hiding my online addresses but others who contact me are either not so vigilant or unwittingly harbour trojans on their systems.
Perhaps one way to deal with this and offer an effective deterrent to this criminal activity might be as follows:
- Find a wall
- Find a cigarette
- Find a blindfold
- Wait for a rainy day
- Light cigarette, place in spammer’s mouth, position spammer against wall…
Have fun, and let’s kill spam(mers)!