Search Engine Spamming Techniques
The benefits of using cascading style sheets (CSS) to separate content from form are manifold: not only the ability to set page headings H1 - H6 to discrete sizes and maintain general typographical control, but also to control the weight and proportion, alignment and visibility of elements on the page. This frees the developer to concentrate on page content, the copy or text that forms the intrinsic value of the page.
Vast Engine Databases
And it is the copy which is greedily consumed by search engine spiders employed by Google, Yahoo!, MSN, AllTheWeb, Lycos, Ask Jeeves, Alta Vista and myriad wannabe engines plundering the Web for your words, which are then indexed and catalogued in vast databases collating literally tens of billions of pages. At the last count Google alone was said to be indexing 8 billion pages.
But for what purpose? Disregarding the philanthropic efforts of directories like the ODP (Open Directory Project or DMOZ which doesn't employ crawlers in the conventional sense but uses them to validate entries for inclusion in their human edited database) and scientific semantic web analysis, the answer is simple: to make money.
Pay Per Click Placement
Cataloguing web content has become big business, as recently evidenced by Google's public stock offering. Quite apart from attracting surfers to the portals surrounding these search engines, the delivery of search results or, more specifically, the sponsorship of promotional results in paid placement (advertisements, for want of a truthful word) generates billions of dollars annually for the key players, paid by SEMs (search engine marketers) promoting their clients' products, services or, in some instances, political views. In short, businesses are willing to pay to be seen, in much the same way as they do on commercial television.
Today, the most common format is PPC (pay per click). Advertisers pay a variable fee, levied whenever a user clicks on their web page link, to artificially inflate their presence in search engines. In return, the web page is given a highly visible priority listing: it will appear within the first few SERPs (search engine results pages) delivered when a user queries an engine for a search phrase, and the more a promoter pays, the higher up the listings it will appear.
But this can sometimes cost thousands of pounds a week for hotly contested phrases in lucrative, coveted commercial markets. In fact, I have seen some phrases in the telecommunications industry go as high as £11 per click. Yes, the sponsor is charged £11 every time a user clicks on their link.
Spamdexing: Search Engine Spamming
Which is why so many SEOs (search engine optimisers) are tempted to spam the engines, either for themselves or their clients. It even has a name: spamdexing.
There are numerous techniques for artificially inflating a website's position (relevancy to search phrase) within search engines. Google's preoccupation with its proprietary PageRank (PR) system (a rating of site 'worth' as a geometrically scaled value from 0 to 10, with a bias towards authority sites based on the number of links pointing to them) drives many unethical SEOs and marketers to use underhand methods such as link farming[1], junk forums[2], domain duplication[3], empire building[4], linked blogs[5] or wikis[6] to reinforce PR, and then to incorporate JavaScript[7] for hidden redirected link manipulation. Add to these dynamically spawned pages[8], doorway pages[9], ODP-driven databases[10], keyword packed machine generated pages[11], full stop or single pixel outbound hidden links[12] pointing to inbound pages linked with keyword stuffed anchor text[13], and cloaked redirects[14] serving completely different pages to the engines than those reserved for surfers. The list goes on. All of these are available to the unscrupulous developer and have been used for years.
CSS and <noscript> Tags
But this article deals with other methods: use of CSS and <noscript> tags, techniques specifically used within the page to generate higher search engine ranking.
There's a well known phrase amongst the SEO fraternity: Copy is King, meaning that, all other search engine placement techniques being equal, the text content of the page will ultimately determine how well the page ranks. Well written copy, supported by judicious use of keyphrases in title, description and headings and then thoughtfully applied within the page body, is fundamental to achieving a strong SERPs position.
Keyword Density
It is generally held that 200 - 250 words with 7% keyword density will grab the algorithms' attention and elevate the page from the mires of obscurity into the limelight. That's a fair bit of copy and can make for a visually bulky page once headings and graphics are thrown in. Many companies like to cut down on the text and skip to the meat, like special offers illustrating products or attention-grabbing headlines occupying much of the page by their size and intensity, often demoting or removing decent copy. Unless such a page has a strong PR with multiple inbound links it is unlikely to feature well in the engines. So, to compensate, unscrupulous SEMs and webmasters undertake a little behind the scenes work to spice up the offering.
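To make the density figure concrete, here is a minimal Python sketch of one way it might be calculated. The definition used (words belonging to the keyphrase divided by total words) is an assumption, since the engines publish no formula, and the function name keyword_density is purely illustrative.

```python
import re

def keyword_density(text: str, phrase: str) -> float:
    """Return the share of words in `text` taken up by `phrase` (0.0 to 1.0).

    Assumed definition: (occurrences of phrase * words in phrase) / total words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not phrase_words:
        return 0.0
    n = len(phrase_words)
    # Slide a window of n words across the copy and count exact matches.
    hits = sum(
        1 for i in range(len(words) - n + 1)
        if words[i:i + n] == phrase_words
    )
    return hits * n / len(words)
```

On that definition, the 7% figure quoted above works out at roughly 17 or 18 keyword words in a 250-word page.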
Classic Spamming Methods
WOW (white on white) text, as I like to call it, is one of the simplest methods and was prevalent until the engines cottoned on. Markup in the page describes the fore- and background colours of elements using hexadecimal codes (although contemporary development frowns on this practice, there are billions of pages out there using it and still more being added by uneducated webmasters). Make both codes identical and the text effectively disappears (although a quick Ctrl-A to select all will reveal it). It didn't take much figuring for the engines to look for such matching, cry spam and penalise or permanently remove the page or, indeed, the site from their databases.
Other methods were required, such as making the text minuscule, 1 pixel in size. A heck of a lot can be squeezed into lines interspersed within the page body. This method was quickly detected and another rule was added to the engines' spam algos, flagging text below a certain point, em or, more usually, px (pixel) size.
Once these techniques were uncovered it became necessary to adopt a more thoughtful approach: hide the text under an image by offsetting it from the normal flow, or remove it entirely from the page using absolute positioning. This way the above methods are not needed and, to all intents and purposes, it's valid copy and unlikely to incur a penalty.
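The kind of matching described here is easy to picture in code. The following Python sketch is only an illustration of the idea, not how any engine actually does it: it assumes old-style markup with a hex bgcolor on the body tag and hex color on font tags, ignores named colours and nested backgrounds, and the names WowFinder and find_wow_text are made up for the example.

```python
from html.parser import HTMLParser

def normalise_hex(colour: str) -> str:
    """Normalise '#FFF' or '#ffffff' to 'ffffff'."""
    c = colour.strip().lstrip("#").lower()
    if len(c) == 3:
        c = "".join(ch * 2 for ch in c)
    return c

class WowFinder(HTMLParser):
    """Collect text whose <font color> equals the <body bgcolor>."""

    def __init__(self):
        super().__init__()
        self.background = "ffffff"  # assumed default when no bgcolor is given
        self.font_stack = []        # colours of the currently open <font> tags
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body" and attrs.get("bgcolor"):
            self.background = normalise_hex(attrs["bgcolor"])
        elif tag == "font":
            colour = attrs.get("color")
            self.font_stack.append(normalise_hex(colour) if colour else None)

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            self.font_stack.pop()

    def handle_data(self, data):
        # The innermost <font> that sets a colour decides the text colour.
        current = next((c for c in reversed(self.font_stack) if c), None)
        if current == self.background and data.strip():
            self.hidden_text.append(data.strip())

def find_wow_text(html: str) -> list[str]:
    """Return the text runs that share the page background colour."""
    finder = WowFinder()
    finder.feed(html)
    return finder.hidden_text
```

Note that '#FFF' and '#ffffff' are treated as the same colour, which is the sort of trivial variation a spammer would try first.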
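A size rule of this sort might look something like the Python sketch below. The 6px threshold, the unit conversions (1pt taken as 96/72 px, 1em as a 16px default) and the name is_tiny are all assumptions made for illustration; the engines' real thresholds are not public.

```python
import re

# Assumed conversion factors to CSS pixels.
UNIT_TO_PX = {"px": 1.0, "pt": 96 / 72, "em": 16.0}

def is_tiny(font_size: str, min_px: float = 6.0) -> bool:
    """Return True if a font-size value such as '1px' or '0.2em' is below `min_px`."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(px|pt|em)\s*", font_size.lower())
    if not match:
        return False  # keyword sizes and unknown units are not flagged here
    size, unit = float(match.group(1)), match.group(2)
    return size * UNIT_TO_PX[unit] < min_px
```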
Spam with CSS
Enter CSS (cascading style sheets) and separation of content from form, the benefits of which are mentioned at the beginning of the article. The downside, as far as spam is concerned, is that using CSS makes it incredibly easy to spam the engines once more, and what makes it worse is that all the old and once redundant methods can be redeployed. And because Google and other search engine robots do not request CSS files there is no way to algorithmically detect CSS spamming. They don't have spam algorithms in place to deal with it.
Using CSS it is possible to keyword stuff any block of text and have it vanish from the visual page using a number of different techniques, all of which are entirely legitimate in other valid circumstances such as accessible, tabbable links, expandable menus and text summaries: creative development markup used to enhance user experience by accessibility and usability developers.
Because the rules that govern the display of markup elements have been removed from the page, there is currently no way such spamming techniques can be detected automatically: no engine, to my knowledge, reads CSS files and factors them into its spam algorithms. It's once again open season.
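If an engine did request stylesheets, a first pass over them might look something like this Python sketch. It is a rough heuristic under stated assumptions, not a description of any real spam filter: the list of suspect declarations is my own shortlist, the regular expressions make no attempt to parse CSS properly, and the name suspect_rules is hypothetical. Since every one of these declarations also has legitimate uses, a match could only ever be a prompt for human review.

```python
import re

# Declarations commonly used to take text out of view. Each also has
# legitimate uses, so a match is a prompt for review, not proof of spam.
SUSPECT_PATTERNS = {
    "display:none": re.compile(r"display\s*:\s*none"),
    "visibility:hidden": re.compile(r"visibility\s*:\s*hidden"),
    "large negative offset": re.compile(
        r"(?:left|top|text-indent|margin-left)\s*:\s*-\d{3,}"
    ),
}

# A selector followed by a brace-delimited declaration block.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")

def suspect_rules(css: str) -> list[tuple[str, str]]:
    """Return (selector, reason) pairs for rules with a suspect declaration."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # strip comments first
    found = []
    for selector, body in RULE.findall(css):
        for reason, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(body.lower()):
                found.append((selector.strip(), reason))
    return found
```

The flagged selectors would then have to be matched against the page markup to see how much copy they hide, which is where the real cost of such a check lies.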
Alert Search Engines
The only way to stop this is to alert search engines, although how much good this will do is open to debate. I have emailed Google at their spam reporting address innumerable times over a 3-year period and have now decided it's a waste of time. The sites to which I alerted them months, even years ago remain where they were.
Hard Evidence
Now, a little hard evidence from Google UK. Google for 'website development' under the UK toggle. Examine the first few pages. To make it easy, disable stylesheets in your browser. See what I mean? Now examine the associated stylesheets stored in your browser's cache and learn how to spam all over again.
Finally, a word on the <noscript> tag. This is legitimately used to describe content to those browsers or people who can't see or hear embedded multimedia or have JavaScript disabled. A JavaScript dropdown menu would sensibly be supplemented by an inline alternative menu which displays when scripts are disabled. The content of a Flash animation might be described in the <noscript> tag, especially if an audio narrative is used, the text of which would be transcribed. Instead, spammers see this as an opportunity to spam with keyword-rich text, often irrelevant to the image or animation on display. It's pure spam.
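To see what a page is feeding the engines through this tag, the <noscript> content can be pulled out and read on its own. The Python sketch below uses only the standard library's html.parser; the names NoscriptText and noscript_text are illustrative, and it assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

class NoscriptText(HTMLParser):
    """Collect the text found inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <noscript> elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while inside a <noscript> element.
        if self.depth and data.strip():
            self.chunks.append(data.strip())

def noscript_text(html: str) -> str:
    """Return all <noscript> text in the page as one string."""
    parser = NoscriptText()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Reading the result alongside the visible copy makes it fairly plain whether the tag describes the scripted content or is simply stuffed with keyphrases.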
A Criminal Act
What makes this even more criminal is that surfers with physical and/or cognitive impairments are subjected to this tirade of nonsense. Their disabilities mean they use assistive technologies (AT) to view web content; they 'see' all the junk and are often confused by the nonsensical keyword stuffed phrases or misled as to site content.
Businesses which employ such techniques on their websites are doing an unforgivable disservice to visually impaired users, who already take 8 - 10 times longer to interpret site content than those without disabilities. Search engine spamming is not only unethical, it is immoral. And some might argue criminal, since such practices actively discriminate against impaired users under Part 3 of the Disability Discrimination Act (DDA).
Glossary of Spamming Techniques
- 1 Link Farming
- Swapping, exchanging or selling reciprocal links to boost the number of sites linking to a particular site and so increase SERPs visibility.
- 2 Junk Forums
- Discussion forums developed specifically to promote discussion or interest in certain topics in the hope of seeding pages with keywords and phrases which will be spidered by search engines.
- 3 Domain Duplication
- The development of (almost) identical websites with different domain names, often modified to suit regional specificity or targeted keyphrases.
- 4 Empire Building
- The process of building interlinked websites and/or web logs (blogs) in order to mutually enhance each site's SERPs.
- 5 Linked Blogs
- Similar to Empire Building [4, above] but usually more refined to promote a group of similar thinking individuals. Can be used to influence site ranking with the introduction of specific domain (site) references.
- 6 Wikis
- Much like Linked Blogs [5, above] except far more crude in operation and unlikely to be afforded authority status like 'professional' blogs.
- 7 JavaScript Link Manipulation
- A technique used to redirect web visitors to specific pages, often forcibly through mouse-over triggers, and difficult to detect automatically.
- 8 Dynamically Spawned Pages
- Pages produced by server-side code like ASP or PHP served in response to specific, highly targeted keywords, dynamically generated from template text, often served to web crawlers.
- 9 Doorway Pages
- Keyword rich pages specifically optimised to perform well in the SERPs which add nothing to overall site content and whose links invariably point to the index or other targeted website page.
- 10 ODP-Driven Databases
- Categorised data dumped from the freely available ODP RDF (Resource Description Framework) files to supply keyphrase-rich doorway pages supporting a core site.
- 11 Generated Pages
- As ODP-Driven Databases [10, above] but machine generated from similar database content.
- 12 Full Stop or Single Pixel Hidden Links
- Used to increase PR (PageRank) and SERPs visibility of a site or page by providing link content for crawlers, generally to doorway pages, and concealed from web visitors.
- 13 Keyword Stuffed Anchor Text
- Links to other pages comprising targeted keyphrases, often irrelevant to surrounding text.
- 14 Cloaked Redirects
- Serving one set of pages to spiders, another to users.
