Ways to Scale Content Creation

Posted by IrishWonder on November 3, 2012

Content. The keystone of post-Panda SEO. Your thin affiliate sites are not good enough any more, nor is your contentless link spam. You hear every newbie and seasoned SEO alike saying it on every corner: C-O-N-T-E-N-T.

(Photo credit: Colin Angus Mackay on Flickr)

And you give in. You switch to content-based ~~link spam~~ linkbuilding. You are sure it will work better for you, whatever it takes. Google will love you more. Traffic will skyrocket. So will your commissions. But wait, where will you get all this content?

(Photo credit: calex5 on Flickr)

There are ways, of course. You’ve always known it. Firstly, you can always scrape something. You didn’t build all those scraper scripts just to sit there rotting, after all. So you recollect your 2005 experiences, fire up your scrapers and... find out that Panda doesn’t like you! OK, that was just a SERP scraping experiment, so of course you ended up with a bunch of duplicate content (although in some cases, due to Google’s funny ways of determining the original source, you could have been lucky enough to kick the real sites out of the index, but that’s a separate story). A better approach would of course be to scrape content that Google is blocked from indexing. That still doesn’t save you from a DMCA complaint should the content owner ever find out, though.

(As a sidenote, I found out that the scrapers I built around 2006 are now largely unique content. (Yes, some of them are still alive. Talk about blackhat being a short term strategy!) How come? Easy: the sources I scraped were just the SERPs, and little of what was in those SERPs has remained intact to this day. Some sites have changed their content, some have simply ceased to exist. Given my setup, where scraped SERPs did not look like scraped SERPs, my old scrapers can now almost pass a human review. But that’s just a sidenote. If you’ve got a time machine, take this as an actionable tip and use it. Otherwise, move along, nothing to see here.)

To lower the risks and improve the results, you turn to other methods. Let’s say you try machine translation to disguise your scraped content. There are three things to keep in mind, however. First: machine translation is very far from perfect, as any linguist will tell you, and in most cases translated content is even less likely to pass a human review than scraped SERPs. Second: because of differences between languages, the translated text may end up slightly off topic compared to what you intended (theoretically, this can be fixed using something like nTopic, if you figure out how to incorporate their API into your setup). Third and scariest: rumour has it that Google actually caches every bit of text translated by Google Translate so it can later identify that text being used to spam the SERPs! Of course, there is also Babylon, so this obstacle can be bypassed, but see the first caveat again. Unless you manually proofread the resulting content (which makes it insanely expensive on a large scale), you are still in trouble.

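Next option: take the content you got, no matter where from, and Markov it. A Markov chain text generator records which words follow which in the source text and then builds new text by repeatedly picking one of the recorded next words at random, so the output is a unique reshuffle of the original. However, again, the resulting texts would probably not pass a human review, even though there have been anecdotal cases of autogenerated papers getting submitted to and accepted by conferences.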
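
To make the Markov idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not anybody's production setup: the function names `build_chain` and `generate`, the `order` parameter and the whitespace-only tokenizing are all my own assumptions.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map every run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain: start at a random state, then keep picking a recorded follower."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed gives a repeatable start
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state only appeared at the very end of the source text.
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


if __name__ == "__main__":
    source = "the cat sat on the mat and the cat ran off the mat"
    print(generate(build_chain(source, order=1), length=12, seed=1))
```

A higher `order` keeps longer runs of the source intact, so the output reads better but is less unique. A lower `order` gives more unique text that reads worse, which is exactly the human review problem mentioned above.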

What have we got next? PLR articles. While they may seem like an easier alternative to scraping or autogenerating content, PLR content is of course not unique by definition and needs to be tampered with in some way before it can be used safely.

(Photo credit: le cabri on Flickr)

Talking of ways to tamper with non-unique content to make it unique, how can we skip mentioning spinning? After all the bashing it got from Google, people seem to be more sceptical about spinning content, but let’s set the record straight. The showcased example was the result of an automated, low quality spinning process, hence an output that barely has any meaning and is unreadable to a human. The spinnable matrix for that example probably took less than 5 minutes to prepare and no human ever looked at it. That is not how you achieve real results with spun content. A proper matrix for spinning takes hours, if not days, to prepare, and requires a good amount of previous experience and an understanding of what you are doing, of the mechanisms of the process and of at least a few of the possible versions of the outcome. Creating and editing spintax is a very brain intensive process and can easily drive you mad. “I think in spintax” is something you hardly want to tell your psychiatrist, because to him it would be more of a diagnosis than bragging rights. Is proper spintax effective in terms of generating multiple readable, unique copies of a text? You bet it is. But is it an easy or a cheap solution? Hardly!
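
For those who have never seen spintax: it is plain text in which alternatives are written as `{option one|option two|option three}`, and groups can be nested inside each other. Below is a rough Python sketch of how such a matrix gets resolved into one concrete version. The function name `spin` and the exact brace-and-pipe syntax handled here are assumptions made for illustration; real spinning tools differ in the details.

```python
import random
import re

# Matches an innermost group: braces that contain no other braces.
_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(template, rng=random):
    """Resolve {a|b|c} groups, innermost first, until no groups remain."""
    while True:
        template, count = _GROUP.subn(
            lambda m: rng.choice(m.group(1).split("|")), template
        )
        if count == 0:
            return template


if __name__ == "__main__":
    matrix = "{Proper|Good} spintax {takes {hours|days}|is hard work}."
    for _ in range(3):
        print(spin(matrix))
```

Note that the code is the trivial part. Whether every combination of options still reads like something a human wrote depends entirely on the matrix, and that is where the hours and days go.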

Finally, if all else fails, there are some new fancy ways of scraping called syndication, or aggregation, or content curation. Syndication is usually done using RSS feeds (actually, many of us have been doing this since 2004, slapping AdSense over the syndicated content). Aggregation is pretty much yet another name for the same process. It is probably no surprise that Google is the leader when it comes to syndication/aggregation. A good example of syndicated content would be Google Finance. Their new Knowledge Graph is also ~~shameless third party scraping maximized~~ an example of aggregating third party content.
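
As a small illustration of the RSS part, here is a sketch that pulls the title, link and description out of each item of an RSS 2.0 feed, using only the Python standard library. The function name `parse_rss_items` and the sample feed (including its example.com URLs) are made up for this example, and fetching the feed over HTTP is deliberately left out.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, used only to demonstrate the parser.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text):
    """Return one dict per <item>; fields missing from the feed become empty strings."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

This only handles plain RSS 2.0. Atom feeds use namespaced `entry` elements and would need different lookups.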

Content curation is a little bit different. While syndicated or aggregated content can of course be arranged so that the output as a whole provides extra value to the viewer, content curation is about adding a personal take. Here is an example from the master of content curation (warning: better open it in a new tab because you are not likely to get back from there any time soon). Strictly speaking, curation done properly is not so much scraping as co-creation. While it can be scalable, it does require human input, attention, an understanding of the subject you are curating and some degree of talent to come up with something impressive. If impressive is not your goal, you can still go this way, but keep in mind that there is a thin line between aggregation and curation, on one hand, and scraping and duplicate content, on the other. Where this line lies is up to Google to decide, but surely nobody is stopping you from taking your chances.

All that said, you might already be aware that there is a new way of scaling content creation just around the corner that should be available soon (Mr. Fantomaster and I are working hard on getting it ready for the launch). If you were not aware yet, well, now I’ve told you. Now you can go sign up and wait impatiently for the launch notification. Trust me, it will be worth the wait.
