Slugging it out with SEO

I'm trying to start off the new year by updating all my web sites. I've been researching SEO (search engine optimization) for a few months, but only recently started working through how to actually do it. The last few days have been a eureka moment for me.

The first problem I had to deal with was how to hide the index script. If you are using Apache, you can generally do this with an .htaccess file. For my CMS, this directive worked reasonably well:
DirectoryIndex cgi-bin/index.cgi
Unfortunately, it isn't perfect, and I had to do a lot of hacking in the WebAPP code to get it to work. I really want to clean up the CMS code anyway, so I started rewriting it on my test site. But I needed a template to start from. Fortunately, I'd also been playing around with WordPress to see how it deals with slugs.

While researching how WordPress does things, I ran across a great article that walks through each line of WordPress's .htaccess file, which wasn't too hard to understand. The real gem, though, was the last part, which discussed how WordPress reads the slug: it uses the REQUEST_URI variable instead of the usual QUERY_STRING.
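
To make the difference concrete, here is a small Perl sketch. In Apache's CGI environment, REQUEST_URI holds the URL as the visitor requested it (path plus query string), even after an internal rewrite hands the request to the script, while QUERY_STRING holds only what follows the `?`. The request URL and the helper name `slug_path` are made up for illustration; the helper just drops the query string so only the path-based slug is left.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the path part of REQUEST_URI,
# i.e. everything before the first '?'. Falls back to '/' when empty.
sub slug_path {
    my ($request_uri) = @_;
    return '/' unless defined $request_uri && length $request_uri;
    my ($path) = split /\?/, $request_uri, 2;
    $path = '/' unless defined $path && length $path;
    return $path;
}

# Simulated CGI environment for a made-up request.
$ENV{REQUEST_URI}  = '/244/slugging-it-out/?print=1';
$ENV{QUERY_STRING} = 'print=1';

print "REQUEST_URI:  $ENV{REQUEST_URI}\n";
print "QUERY_STRING: $ENV{QUERY_STRING}\n";
print "slug path:    ", slug_path($ENV{REQUEST_URI}), "\n";
```

With a rewrite rule in place, QUERY_STRING alone never sees the slug, which is why reading REQUEST_URI is the key step.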

With this missing piece of the puzzle in place, I was able to modify my .htaccess file. Obviously I needed to use CGI instead of PHP, since my CMS is Perl-based, but outside of that it is identical to WordPress's.

The real magic then happened in the script itself. Using REQUEST_URI was pretty easy once I had fixed a few bugs in the CMS. As an example, here is some simplistic router code. It isn't the cleanest yet, but it works.

# Get slug
my $raction = $ENV{REQUEST_URI} || '/';  # get slug
$raction =~ s{/+}{/}g;                   # reduce all slashes to a single one
if ($raction eq '/') {
  # Display default page
}
elsif ($raction =~ m{^/(\d+)}) {
  # Display news article
  $info{'id'} = $1;          # pull article ID from slug
  require "$sourcedir/";     # grab article functions
}
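
The router snippet depends on the rest of the CMS (`$sourcedir`, `%info`), so it can't run on its own. Here is a self-contained sketch of the same idea, assuming a hypothetical `route` sub that returns what to display instead of calling into the CMS. Unlike the snippet, it strips any query string first, so a request like /?page=2 still reaches the default page.

```perl
use strict;
use warnings;

# Hypothetical router: map a request URI to a (route, article id) pair.
sub route {
    my ($uri) = @_;
    $uri = '/' unless defined $uri && length $uri;

    my ($path) = split /\?/, $uri, 2;   # drop any query string
    $path = '/' unless defined $path && length $path;
    $path =~ s{/+}{/}g;                 # reduce runs of slashes to one

    return ('home') if $path eq '/';

    # /244, /244/ and /244/slugging-it-out/ all resolve to article 244;
    # anything after the ID is ignored.
    if ($path =~ m{^/(\d+)(?:/|$)}) {
        return ('article', $1);
    }

    return ('not_found');
}

for my $uri ('/', '//244', '/244/slugging-it-out/', '/about/') {
    my ($route, $id) = route($uri);
    printf "%-24s => %s%s\n", $uri, $route, defined $id ? " $id" : '';
}
```

Requiring a `/` or the end of the path after the digits is a small deviation from the original pattern, so that something like /244abc is not treated as article 244.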

Note that the slug right now is just the article ID, as that is how the existing code finds the article. It isn't the most friendly, but it does work; in fact, I could leave it as is if I really wanted to. I haven't decided yet how to actually define the slugs. WordPress, by default, uses a date-plus-title system like this: /2016/01/01/slugging-it-out/. Some articles say to skip the date and just use /slugging-it-out/.
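
Whichever style wins, each article needs a URL-safe slug built from its title. A minimal sketch, assuming hypothetical `make_slug`, `date_url`, and `title_url` subs: it lowercases the title, collapses every run of characters that isn't an ASCII letter or digit into a single hyphen, and then builds the date-based and title-only forms mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an article title into a URL slug.
sub make_slug {
    my ($title) = @_;
    my $slug = lc $title;
    $slug =~ s/[^a-z0-9]+/-/g;   # any run of other characters becomes one hyphen
    $slug =~ s/^-+|-+$//g;       # trim leading/trailing hyphens
    return $slug;
}

# Hypothetical date-based URL: /YYYY/MM/DD/slug/
sub date_url {
    my ($year, $month, $day, $title) = @_;
    return sprintf '/%04d/%02d/%02d/%s/', $year, $month, $day, make_slug($title);
}

# Hypothetical title-only URL: /slug/
sub title_url {
    my ($title) = @_;
    return '/' . make_slug($title) . '/';
}

print date_url(2016, 1, 1, 'Slugging it out'), "\n";   # /2016/01/01/slugging-it-out/
print title_url('Slugging it out'), "\n";              # /slugging-it-out/
```

Accented or other non-ASCII letters simply become hyphens here; a fuller version might transliterate them first.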

I've seen other sites that keep the ID and add the title to the end, like so: /244/slugging-it-out/. I discovered that Google puts a hash after the ID on some of its pages to make it obvious: /244#slugging-it-out/. In fact, my code as written above will allow either of these options to work: the pattern only looks at the leading digits, and browsers never send the part after the # to the server anyway. The web seems divided on whether either of these is a good idea, so I may just need to make a choice and move on.
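
One practical difference between these formats: /244/slugging-it-out/ still carries the article ID, while /slugging-it-out/ and the date-based form do not, so the router has to look the slug up. In this hypothetical sketch, `%slug_to_id` stands in for whatever lookup the CMS would really use (probably a database query), and `resolve` is an illustrative name. The date pattern is checked before the ID pattern; otherwise /2016/01/01/... would be read as article 2016.

```perl
use strict;
use warnings;

# Hypothetical slug table; a real CMS would query its article store instead.
my %slug_to_id = ('slugging-it-out' => 244);

# Resolve a request path to an article ID, accepting several slug styles.
# Returns undef when nothing matches.
sub resolve {
    my ($path) = @_;
    $path =~ s{/+}{/}g;    # collapse repeated slashes

    # Date-based style: /2016/01/01/slugging-it-out/
    if ($path =~ m{^/\d{4}/\d{2}/\d{2}/([a-z0-9-]+)/?$}) {
        return $slug_to_id{$1};
    }
    # ID first, optional title after it: /244 or /244/slugging-it-out/
    if ($path =~ m{^/(\d+)(?:/|$)}) {
        return $1;
    }
    # Title only: /slugging-it-out/
    if ($path =~ m{^/([a-z0-9-]+)/?$}) {
        return $slug_to_id{$1};
    }
    return;
}

for my $path ('/2016/01/01/slugging-it-out/', '/244/slugging-it-out/', '/slugging-it-out/') {
    print "$path => ", resolve($path) // 'not found', "\n";
}
```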

And there you have it. I've now built about a dozen slug routes on my test site. Things seem to work pretty well, but I had to hack a lot of my internal code to make sure it prints slugs and not query strings. For now, I've also set up my router to accept either format. In fact, for some internally facing routines I may not even bother. If WordPress doesn't, why should I?
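
Most of that internal hacking comes down to two jobs: every place that prints a link has to emit the slug form, and the router has to keep honoring old query-string links. A rough sketch of both, assuming a hypothetical `article_link` helper and an old-style `?id=244` parameter; the CMS's real parameter names may differ.

```perl
use strict;
use warnings;

# Hypothetical link builder: templates call this instead of
# hand-writing query-string links.
sub article_link {
    my ($id, $slug) = @_;
    return "/$id" unless defined $slug && length $slug;
    return "/$id/$slug/";
}

# Hypothetical fallback: pull an article ID out of an old-style query string
# such as "id=244&print=1", so old links keep working.
sub id_from_query {
    my ($query) = @_;
    return unless defined $query;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        return $value if $key eq 'id' && $value =~ /^\d+$/;
    }
    return;
}

print article_link(244, 'slugging-it-out'), "\n";   # /244/slugging-it-out/
print id_from_query('id=244&print=1'), "\n";        # 244
```

In the router, a request for / with a QUERY_STRING of id=244 could then be handled exactly like /244.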
