Slugging it out with SEO


My attempt at improving the search results for my website using slugs.

I’m trying to start off the new year by updating all my websites. I’ve been researching SEO (search engine optimization) for a few months, but only recently started working out how to actually do it. The last few days have brought a eureka moment.

The first problem I had to deal with was how to hide the index script. Generally, if you are using Apache, you can do this with an .htaccess file. For my CMS, this directive worked reasonably well:
DirectoryIndex cgi-bin/index.cgi
Unfortunately, it isn’t perfect, and I had to do a lot of hacking in the WebAPP code to get it to work. I really want to clean up the CMS code anyway, so I started rewriting it on my test site. But I needed a template to start from. Fortunately, I’d also been playing around with WordPress to see how it handles slugs.

While researching how WordPress does things, I ran across this great article. It walks through each line of WordPress’s .htaccess file, which wasn’t too hard to understand. The real gem, though, was the last line, which explained how WordPress reads the slug: it uses the REQUEST_URI variable instead of the usual QUERY_STRING.
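To make the difference concrete, here is a small Perl sketch. The helper name request_path is made up for illustration, and it assumes the CGI environment Apache normally provides: REQUEST_URI holds the full path the visitor asked for (including any ?query), while QUERY_STRING holds only the part after the ?.

```perl
use strict;
use warnings;

# Hypothetical helper: return just the path part of the requested URL.
# REQUEST_URI includes any "?query"; QUERY_STRING is only what follows "?".
sub request_path {
    my $uri = defined $ENV{REQUEST_URI} ? $ENV{REQUEST_URI} : '/';
    $uri =~ s/\?.*\z//s;    # drop the query string, if any
    $uri =~ s{/+}{/}g;      # collapse runs of slashes into one
    return $uri eq '' ? '/' : $uri;
}

# Simulate what the web server would hand a CGI script for a request
# to /244/slugging-it-out/?print=1
$ENV{REQUEST_URI}  = '/244/slugging-it-out/?print=1';
$ENV{QUERY_STRING} = 'print=1';

print request_path(), "\n";       # prints /244/slugging-it-out/
print $ENV{QUERY_STRING}, "\n";   # prints print=1
```

With QUERY_STRING alone, the script would only ever see print=1 and never learn which article was requested.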

With this missing piece of the puzzle in place, I was able to modify my .htaccess file. Obviously I needed to point it at a CGI script instead of PHP, since my CMS is Perl based. Apart from that, it is identical to WordPress’s.
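For reference, here is a sketch of what WordPress’s standard rewrite rules might look like when pointed at a CGI script instead of index.php. It assumes the script lives at /cgi-bin/index.cgi (the path from the DirectoryIndex line above) and that mod_rewrite is enabled; both will vary from server to server.

```apache
# Sketch: WordPress-style front controller rules aimed at a Perl CGI script.
# The /cgi-bin/index.cgi path is an assumption; adjust it for your setup.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Requests for the script itself pass through untouched
RewriteRule ^cgi-bin/index\.cgi$ - [L]
# Anything that is not a real file or directory is handed to the CMS
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /cgi-bin/index.cgi [L]
</IfModule>
```

Because the last rule’s `.` pattern does not match the bare site root in an .htaccess file, the home page should still be served through the DirectoryIndex line rather than these rules.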

The real magic then happened in the script itself. Using REQUEST_URI was pretty easy once I fixed a few bugs in the CMS. As an example, here is some simplified router code. It isn’t the cleanest yet, but it works.

# Get slug (the path part of the requested URL)
$raction = $ENV{REQUEST_URI} || '/';
$raction =~ s/\?.*//s;    # drop any query string, which REQUEST_URI includes
$raction =~ s/\/+/\//g;   # reduce runs of slashes to a single one
# Display default page
if ($raction eq '/') {
  print_main();
}
# Display news article
elsif ($raction =~ /^\/(\d+)/) {
  $info{'id'} = $1;                 # pull article ID from slug
  require "$sourcedir/topics.pl";   # grab article functions
  viewnews();
}
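One way the if/elsif chain could be tidied later is a dispatch table: a list of pattern and handler pairs checked in order. This is a hypothetical sketch, not the CMS’s code; the handlers just return strings as stand-ins for print_main() and viewnews(), so the routing logic can be tried outside a web server.

```perl
use strict;
use warnings;

# Each route pairs a pattern with a handler. The handlers are placeholders
# standing in for print_main() and viewnews().
my @routes = (
    [ qr{^/$},     sub { return 'main page' } ],
    [ qr{^/(\d+)}, sub { my ($id) = @_; return "news article $id" } ],
);

# Try each route in order and call the first handler that matches.
sub dispatch {
    my ($path) = @_;
    for my $route (@routes) {
        my ($pattern, $handler) = @$route;
        # In list context a successful match returns the captures
        # (or (1) when the pattern has no groups); a failure returns ().
        my @captures = $path =~ $pattern or next;
        return $handler->(@captures);
    }
    return 'not found';
}

print dispatch('/'), "\n";                       # prints main page
print dispatch('/244/slugging-it-out/'), "\n";   # prints news article 244
```

Adding a new slug route then means adding one line to @routes instead of another elsif branch.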

Note that, right now, the slug is just the article ID, since that is how the existing code finds articles. It isn’t the friendliest, but it works. In fact, I could leave it that way if I really wanted to. I haven’t yet decided how to define the slugs. WordPress, by default, uses a date-plus-title scheme like this: /2016/01/01/slugging-it-out/. Some articles advise skipping the date and using just: /slugging-it-out/.
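Whichever scheme wins, the title has to be turned into a slug somehow. Here is a minimal sketch; slugify and permalink are hypothetical helper names, and it assumes plain ASCII titles and dates stored as YYYY-MM-DD (accented or other non-ASCII characters would just be treated as word separators).

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the title, turn every run of characters
# other than a-z/0-9 into one hyphen, and trim hyphens from both ends.
sub slugify {
    my ($title) = @_;
    my $slug = lc $title;
    $slug =~ s/[^a-z0-9]+/-/g;
    $slug =~ s/^-+|-+$//g;
    return $slug;
}

# Hypothetical helper: build either a date-based or a title-only permalink.
# Assumes $args{date} is a YYYY-MM-DD string when style is 'date'.
sub permalink {
    my (%args) = @_;
    my $slug = slugify($args{title});
    if (($args{style} // '') eq 'date') {
        my ($y, $m, $d) = split /-/, $args{date};
        return "/$y/$m/$d/$slug/";
    }
    return "/$slug/";
}

print permalink(style => 'date', date => '2016-01-01',
                title => 'Slugging it out'), "\n";   # /2016/01/01/slugging-it-out/
print permalink(title => 'Slugging it out'), "\n";   # /slugging-it-out/
```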

I’ve seen other sites keep the ID and add the title to the end, like so: /244/slugging-it-out/. I discovered that Google uses a hash after the ID on some of its pages to make it obvious: /244#slugging-it-out/. In fact, my code as written above allows either of these options to work; with the hash version, the browser never sends the part after the # to the server, so the script only sees /244. The web seems divided on whether either of these is a good idea, so I may just need to make a choice and move on.
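With the ID-plus-title form, the router only really needs the ID; the title part can be captured for checking or simply ignored. A minimal sketch, with parse_news_path as a hypothetical name and slugs assumed to contain only lowercase letters, digits, and hyphens:

```perl
use strict;
use warnings;

# Hypothetical parser: accepts "/244", "/244/" and "/244/slugging-it-out/".
# Returns (id, slug), with slug undef when only the ID was given,
# or an empty list when the path does not look like an article.
# Note: "/244#slugging-it-out/" arrives here as "/244", since browsers
# never send the fragment to the server.
sub parse_news_path {
    my ($path) = @_;
    return unless $path =~ m{^/(\d+)(?:/([a-z0-9-]+))?/?$};
    return ($1, $2);
}

for my $path ('/244', '/244/slugging-it-out/') {
    my ($id, $slug) = parse_news_path($path);
    printf "%s -> id %s, slug %s\n", $path, $id, defined $slug ? $slug : '(none)';
}
```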

And there you have it. I’ve now built about a dozen slug routes on my test site. Things seem to work pretty well, but I had to hack a lot of my internal code to make sure it prints slugs and not query strings. For now, I’ve also set up my router to accept either format. In fact, for some internal-facing routines I may not even bother with slugs. If WordPress doesn’t, why should I?
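Supporting both formats comes down to two small pieces: a helper that templates call to print slug links, and a lookup that finds the article ID in either the new path or an old query string. Both functions here are hypothetical sketches, and the id query parameter is an assumed stand-in for whatever the CMS’s legacy URLs actually use.

```perl
use strict;
use warnings;

# Hypothetical helper for templates: print a slug link instead of a query string.
sub article_link {
    my ($id, $title) = @_;
    (my $slug = lc $title) =~ s/[^a-z0-9]+/-/g;
    $slug =~ s/^-+|-+$//g;
    return "/$id/$slug/";
}

# Hypothetical lookup: find the article ID in the new path form first,
# then fall back to an old-style query string such as "action=viewnews&id=244".
# The "id" parameter name is an assumption.
sub article_id {
    my ($path, $query) = @_;
    return $1 if $path =~ m{^/(\d+)};
    if (defined $query) {
        for my $pair (split /[&;]/, $query) {
            next if $pair eq '';
            my ($key, $value) = split /=/, $pair, 2;
            return $value if $key eq 'id' && defined $value && $value =~ /^\d+$/;
        }
    }
    return;
}

print article_link(244, 'Slugging it out'), "\n";           # /244/slugging-it-out/
print article_id('/', 'action=viewnews&id=244'), "\n";      # 244
```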