Permalink Problems

It’s generally accepted that it’s not a good idea to use the default URLs that WordPress provides. They offer no clue, either to the human visitor or to search engines, as to what a post is about, and I generally use the permalink scheme that addresses posts and pages using a string based on their title. However, after transferring a site to a new server recently I found I was getting 404 ‘not found’ errors on every post!

This has happened to me before, but I can never quite remember what I had to do to fix it, so I thought it might be sensible this time to document it while I still remember! I’m going to spell it out in fairly fine detail. This is not intended to insult your intelligence, but rather to minimise the possibility for confusion or uncertainty. Feel free to skim the boringly obvious bits!

You also need to know that I’m assuming command line access on the server, which almost certainly means via ssh. If you’re using Plesk or cPanel none of this applies, because they are set up already. If permalinks don’t work and you’re on Plesk or cPanel you should get on to your hosting provider’s technical support and ask them to fix it!

Caution I use Linux (Ubuntu, to be exact) and Apache, exclusively. If you use nginx, or IIS on Windows, or some other webserver or operating system, the procedure will be different. There are differences between Ubuntu (or Debian) and other Linux distributions, but they are slight, and I’ll try to point them out as we go.

How Permalinks Work

It will help if we understand how permalinks work. When WordPress refers to a page or post using its id (e.g. http://example.com/?page_id=2345) what it uses for the id is its index in the WordPress post database. But strictly,

http://example.com/?page_id=2345

is not a conventional URL at all. The ‘?page_id=2345’ part (known as a ‘query string’) is an instruction to WordPress to retrieve post number 2345 from the database and display it.

Now suppose what it retrieved from post number 2345 was a list of cars. In this case a more conventional URL structure might then be something along the lines of

http://example.com/vehicles/cars

which models a situation where the server at the domain example.com opens up its vehicles folder, finds the cars folder within it, and displays an index to that folder’s contents.

Websites used to work exactly this way, but nowadays it is often a convenient fiction because systems like WordPress prefer to organise things differently — using a database rather than the file-system, for example. Nevertheless it is a good way to organise things conceptually, and has distinct SEO benefits.

Permalinks are WordPress’s solution. They present a nicely organised logical structure to the world, while actually organising things quite differently behind the scenes.

So what happens behind the scenes?

Clearly, ‘http://example.com/vehicles/cars’ has to be translated into something that WordPress can recognise as a request for whatever it has stored in record 2345 in its database, which means WordPress must maintain an additional index that maps ‘/vehicles/cars’ onto ‘post number 2345’.

This is not the whole solution though. When the webserver (Apache) sees ‘/vehicles/cars’ it will look for a directory called ‘vehicles’ and within it a file or further directory called ‘cars’, neither of which exists! Unless we can persuade it to accept this string and pass it on to WordPress to sort out, it will respond by reporting error 404 (not found).

One way to do this is via a file called ‘.htaccess’ (usually pronounced ‘dot HT access’). Before accessing files in any given directory the Apache webserver will look for a .htaccess file in that directory (or a higher one) and if it finds one it will read it and obey the instructions it contains. By adding suitable code to this file WordPress is able to ensure that rather than rejecting the permalinks, the webserver will simply pass them through to WordPress for processing.

Note The dot at the start of the file name marks it out as ‘hidden’, so if you use the unix command ls, to list files, it will not be shown. This gives some basic protection against accidental damage. In order to include hidden files you must use the command ls -a, for ‘list all’.
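If you want to see the difference for yourself without touching a real site, here is a small sketch that works in a throwaway directory. The directory and the two file names are made up purely for the demonstration.

```shell
# Show that a plain ls skips dotfiles while ls -a includes them.
# demo_dir is a scratch directory created only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/.htaccess" "$demo_dir/index.php"

plain_listing=$(ls "$demo_dir")      # hidden files are left out
full_listing=$(ls -a "$demo_dir")    # hidden files, plus . and ..

echo "--- ls ---";    echo "$plain_listing"
echo "--- ls -a ---"; echo "$full_listing"

rm -rf "$demo_dir"
```

Only the second listing should mention .htaccess.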

Building-Blocks

So now we have the basic building-blocks for the permalinks system. If yours is a default installation on cPanel or Plesk everything should be set up to make the system work. All you have to do is tell WordPress what sort of permalinks you want it to use, and it will quietly put the necessary instructions into a .htaccess file and build the required index, after which everything should work ‘automagically’.

But if you are using a ‘bare metal’ server and setting everything up via ssh it can be another story! Linux being, essentially, Unix, there are options everywhere, and unless some of these are set up correctly your permalinks will just not work.

What can possibly go wrong?

Probably the commonest problem is incorrect file permissions. WordPress needs to be able to create a .htaccess file (if there isn’t one) and to write to it (if there is). The .htaccess file will normally be placed in the ‘document root’ directory for the domain, so the first thing you need to do is to find this directory. If you know where it is, well and good — you can skip the next bit. It’s not difficult to find anyway, as it’s generally the directory containing your wp-content directory. However if your installation is accessed via an intermediate directory (e.g. ‘blog’ or ‘wordpress’) it will be that directory.

Of course, if you can find an existing .htaccess file the odds are good that it’s in the right directory. However, if you are in any doubt here’s how to find the correct answer.

Finding the Document Root Directory

Note If you are using a non-Debian distribution the sites-enabled directory mentioned in this section will not exist, and you will need to find the corresponding file for your distribution. Try things like apache2.conf, httpd.conf, httpd.include. There are numerous possibilities depending on the age and type of your distribution, and the files are not even always in /etc. Good luck!

The first place to look for information is in your /etc/apache2/sites-enabled directory — do something like

~$ cd /etc/apache2
/etc/apache2$ ls sites-enabled
000-default.conf  example1.com.conf  example2.com.conf
/etc/apache2$

Your domain should be listed here, usually but not quite always (this is Linux, after all), with ‘.conf’ appended. If it is, open the file (with less, more, nano, or your favourite editor) and find the line that starts ‘DocumentRoot’. If it isn’t listed then open 000-default.conf and find the ‘DocumentRoot’ line in that. In this case do make sure you use the correct DocumentRoot: if several domains are listed in the same file there may be a separate DocumentRoot for each, so check carefully!

Either way it should look something like this:

DocumentRoot /var/www/vhosts/example1.com/httpdocs

If you have only a single site on your server it might be as simple as

DocumentRoot /var/www/html

All sorts of variations are possible, but whatever directory is listed as the document root will define where our .htaccess file should be placed, as it’s where the webserver will look for instructions relating to the domain.
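Rather than opening each file in turn, you can let grep do the hunting. The following is only a sketch: list_document_roots is a name I have made up, it assumes GNU grep, and the sample configuration file is invented so that the example can run anywhere.

```shell
# List every active DocumentRoot directive under a config directory.
# -R follows the symlinks that sites-enabled normally contains (GNU grep),
# -i because Apache directive names are case-insensitive,
# and the pattern ignores commented-out lines.
list_document_roots() {
    grep -RiE '^[[:space:]]*DocumentRoot[[:space:]]' "$1"
}

# Scratch demo with an invented virtual host file.
conf_dir=$(mktemp -d)
cat > "$conf_dir/example1.com.conf" <<'EOF'
<VirtualHost *:80>
    ServerName example1.com
    # DocumentRoot /old/location
    DocumentRoot /var/www/vhosts/example1.com/httpdocs
</VirtualHost>
EOF

found=$(list_document_roots "$conf_dir")
echo "$found"
```

On a real Debian or Ubuntu server you would call it as list_document_roots /etc/apache2/sites-enabled instead.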

Now go to the document root directory and check ownership and permissions:

/etc/apache2$ cd /var/www/vhosts/example1.com/httpdocs
/var/www/vhosts/example1.com/httpdocs$ ls -al
total 188
drwxr-xr-x  5 root     root      4096 Jun 10 22:53 .
drwxr-xr-x  4 root     root      4096 May 29 10:02 ..
-rw-r--r--  1 www-data admin   235 Jun 10 14:22 .htaccess
-rw-r--r--  1 www-data admin   418 Jun  8 20:57 index.php
-rw-r--r--  1 www-data admin 19935 Jun  8 20:57 license.txt
-rw-r--r--  1 www-data admin  7360 Jun  8 20:57 readme.html
-rw-r--r--  1 www-data admin  5032 Jun  8 20:57 wp-activate.php
drwxr-xr-x  9 www-data admin  4096 Jun  8 20:57 wp-admin
-rw-r--r--  1 www-data admin   364 Jun  8 20:57 wp-blog-header.php
-rw-r--r--  1 www-data admin  1476 Jun  8 20:57 wp-comments-post.php
-rw-rw-r--  1 www-data admin  3233 Jun 10 22:59 wp-config.php
-rw-r--r--  1 www-data admin  3136 Jun  8 20:57 wp-config-sample.php
drwxr-xr-x  7 www-data admin  4096 Jun 11 00:37 wp-content
-rw-r--r--  1 www-data admin  3286 Jun  8 20:57 wp-cron.php
drwxr-xr-x 16 www-data admin 12288 Jun  8 20:57 wp-includes
-rw-r--r--  1 www-data admin  2380 Jun  8 20:57 wp-links-opml.php
-rw-r--r--  1 www-data admin  3316 Jun  8 20:57 wp-load.php
-rw-r--r--  1 www-data admin 33837 Jun  8 20:57 wp-login.php
-rw-r--r--  1 www-data admin  7887 Jun  8 20:57 wp-mail.php
-rw-r--r--  1 www-data admin 13106 Jun  8 20:57 wp-settings.php
-rw-r--r--  1 www-data admin 28624 Jun  8 20:57 wp-signup.php
-rw-r--r--  1 www-data admin  4035 Jun  8 20:57 wp-trackback.php
-rw-r--r--  1 www-data admin  3061 Jun  8 20:57 xmlrpc.php
/var/www/vhosts/example1.com/httpdocs$

In this setup all the files are owned by www-data, the user Apache runs as, and they are all writeable by the owner. A good case can be made for removing write access from most of the files (not including .htaccess) as generally only read-access is required. However, this would complicate updates, so it’s simpler to leave things as shown.

If you find you have a .htaccess file, check that it’s owner-writeable. If not,

/var/www/vhosts/example1.com/httpdocs$ sudo chmod u+w .htaccess

should fix it. Also make sure the owner is the same as for the rest of the files (which may not be www-data, depending on the conventions adopted by your configuration).

If on the other hand there is no .htaccess file you could create an empty one rather than leaving the task to WordPress. This way you won’t have to worry about whether WordPress (via the webserver) has permission to create new files in the directory.

/var/www/vhosts/example1.com/httpdocs$ sudo touch .htaccess
/var/www/vhosts/example1.com/httpdocs$ sudo chown www-data:admin .htaccess
/var/www/vhosts/example1.com/httpdocs$ sudo chmod u+w .htaccess

Feel free to vary these commands to suit your circumstances. For example you might want to remove all permission from .htaccess apart from read/write permissions for the owner. Check with a final ls -al to make sure everything’s as expected.
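If you find the ls -al output hard to read at a glance, stat can print just the owner, group and mode. This is a sketch that assumes GNU coreutils (the -c option is not available on every Unix), and show_perms is simply a name I have chosen. It runs against a scratch file so that nothing on your site is touched.

```shell
# Print owner:group, octal mode and name for a file (GNU stat).
show_perms() {
    stat -c '%U:%G %a %n' "$1"
}

# Scratch demo: an .htaccess with the same mode as in the listing above.
work_dir=$(mktemp -d)
touch "$work_dir/.htaccess"
chmod 644 "$work_dir/.htaccess"      # rw-r--r--

show_perms "$work_dir/.htaccess"
mode=$(stat -c '%a' "$work_dir/.htaccess")
```

A mode whose first digit is 6 or 7 means the owner can write to the file.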

Are we there yet?

With a bit of luck this will have fixed it. Go back to the WordPress dashboard, set permalinks as required, and ‘Visit Site’. If it doesn’t work, try clearing the cache, just in case, and try it once more. If it still doesn’t work we have a couple more things to check.

But first we should make sure WordPress really is updating the .htaccess file.

Enter

/var/www/vhosts/example1.com/httpdocs$ less .htaccess

(or equivalent) and you should see the following

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
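It may help to see what those rules amount to. The two RewriteCond lines say ‘only if the request is not an existing file and not an existing directory’, and the final RewriteRule then hands the request to index.php. The sketch below imitates that decision in shell. It is my own simplification rather than anything Apache actually runs, and route_request and the scratch site are invented for the purpose.

```shell
# Imitate the decision the WordPress rewrite rules make for one request.
# $1 = document root, $2 = request path (e.g. /vehicles/cars)
route_request() {
    target="$1$2"
    if [ -f "$target" ] || [ -d "$target" ]; then
        # An existing file or directory fails the !-f / !-d conditions,
        # so it is served as-is.
        echo "serve $2"
    else
        # Nothing on disk matches: pass the request to WordPress.
        echo "rewrite to /index.php"
    fi
}

# Scratch site with one real file and one real directory.
site=$(mktemp -d)
mkdir "$site/wp-content"
touch "$site/index.php" "$site/license.txt"

route_request "$site" /license.txt
route_request "$site" /wp-content
route_request "$site" /vehicles/cars
```

Only the last request, which matches nothing on disk, ends up with WordPress.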

If the file is still empty go back over the last section and recheck everything. If the permissions are correct, check that the webserver really is running as www-data (or whatever you have in your setup). Places to look: /etc/apache2/apache2.conf and /etc/apache2/envvars. The relevant line in the latter will be something like

export APACHE_RUN_USER=www-data
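To pull that value out without reading the whole file, something like the following should do. It is a sketch only: apache_run_user is a name I have invented, and the sample file stands in for the real envvars so that the example runs anywhere.

```shell
# Print the value assigned to APACHE_RUN_USER in an envvars-style file.
apache_run_user() {
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}APACHE_RUN_USER=//p' "$1"
}

# Scratch demo with an invented two-line envvars file.
sample=$(mktemp)
printf '%s\n' 'export APACHE_RUN_USER=www-data' \
              'export APACHE_RUN_GROUP=www-data' > "$sample"

run_user=$(apache_run_user "$sample")
echo "Apache is configured to run as: $run_user"
```

On a live server you would point it at /etc/apache2/envvars. You can also run ps -eo user,comm | grep apache2 to see which user the worker processes are actually running as (the parent process normally runs as root).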

IMPORTANT: Make sure .htaccess is being set up properly before continuing!

Apache Modules

So now we know that the correct instructions are being fed to the webserver, Apache, but the webserver is evidently not obeying them. Look at the second line in .htaccess and you’ll be able to hazard a guess as to why this might be. The functionality we require is supplied by an Apache module that may or may not be installed, or even available.

Assuming you have a modern Debian installation (which includes Ubuntu) it’s an easy matter to check. Here’s how:

~$ cd /etc/apache2
/etc/apache2$ ls -1 mods-enabled
access_compat.load
alias.conf
alias.load
auth_basic.load
authn_core.load
authn_file.load
authz_core.load
authz_host.load
authz_user.load
autoindex.conf
autoindex.load
deflate.conf
deflate.load
dir.conf
dir.load
env.load
filter.load
mime.conf
mime.load
mpm_prefork.conf
mpm_prefork.load
negotiation.conf
negotiation.load
php5.conf
php5.load
rewrite.load
setenvif.conf
setenvif.load
status.conf
status.load

In other words, cd to /etc/apache2 and list the enabled modules (I’ve used the -1 option to list them one per line so they will be easier to read in this post). Check down the list (which is in alphabetical order) for rewrite, and see if the module is enabled.

Alternatively, and more concisely

/etc/apache2$ ls mods-enabled | grep rewrite
rewrite.load

If you don’t have a reference to the rewrite module (I didn’t either) you need to fix it. First we see if it’s even available:

/etc/apache2$ ls mods-available | grep rewrite
rewrite.load

If it is available (and it should be), we can simply enable it with

sudo a2enmod rewrite

and check with ls mods-enabled.
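Another way to check is to ask Apache itself which modules it will load. apache2ctl -M prints the list, and the rewrite module appears in it as rewrite_module. The sketch below wraps that in a guard so it does something sensible on a machine where apache2ctl isn’t installed. The variable name is my own, and you may need sudo on some setups.

```shell
# Ask Apache whether mod_rewrite is among the loaded modules.
if command -v apache2ctl >/dev/null 2>&1; then
    if apache2ctl -M 2>/dev/null | grep -q 'rewrite_module'; then
        rewrite_status="loaded"
    else
        rewrite_status="not loaded"
    fi
else
    # Not a Debian-style Apache install, or apache2ctl is not on the PATH.
    rewrite_status="apache2ctl not found"
fi

echo "mod_rewrite: $rewrite_status"
```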

However, in the highly unlikely event that the rewrite module isn’t even available you’ll have to install it first. At this point I’m afraid I’m going to quote the mantra: ‘Google is your friend’, and leave it up to you to sort it out. Come back here when you’ve done it!

Before continuing you must restart Apache:

sudo service apache2 restart

will normally do it. If that doesn’t work an alternative is:

sudo apachectl restart

Now go back into your WordPress dashboard and try to fix up the permalinks.

Still not working?

To tell the truth, I’m not really surprised. There’s one more setting we may need to apply. The webserver is generally set up to be fairly restricted in what it is allowed to do, as this makes for greater security. In order for the ‘rewrite’ changes to work we may have to loosen some of these restrictions.

The basic webserver configuration is in /etc/apache2/apache2.conf, though anything in that file can be countermanded in the relevant file under /etc/apache2/sites-available (accessed via sites-enabled). It is worth bearing this in mind in case the changes I’m about to suggest don’t have the desired effect: what you must do then is search the other configuration files for statements that could override them. I don’t think this is very likely, but if it happens you’ll know roughly where to look!

Note on sites-enabled and sites-available This can be quite confusing until you realise what’s going on. All the configuration files for the hosted domains are in sites-available. However, Apache actually looks for them in sites-enabled! This means that a domain can be turned on or off by the simple expedient of adding a ‘symlink’ to sites-enabled, pointing to the corresponding file in sites-available, or removing it. The situation is further complicated by the fact that the default configuration file 000-default.conf (so named to ensure it will be processed first) can contain definitions for any domains, rendering the corresponding example.com.conf file unnecessary.
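The symlink arrangement is easier to grasp once you have seen it in action. This sketch builds a miniature copy of the two directories in a scratch location and turns an invented site on and off by hand. On a real Debian or Ubuntu server the a2ensite and a2dissite commands manage these links for you.

```shell
# Imitate enabling and disabling a site by adding and removing a symlink.
apache_dir=$(mktemp -d)
mkdir "$apache_dir/sites-available" "$apache_dir/sites-enabled"
touch "$apache_dir/sites-available/example1.com.conf"

# Enable: link the config file into sites-enabled.
ln -s ../sites-available/example1.com.conf \
      "$apache_dir/sites-enabled/example1.com.conf"
link_target=$(readlink "$apache_dir/sites-enabled/example1.com.conf")
echo "enabled, pointing at $link_target"

# Disable: remove the link. The real file in sites-available is untouched.
rm "$apache_dir/sites-enabled/example1.com.conf"
ls "$apache_dir/sites-available"
```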

What you need to look for is a set of instructions like

<Directory /var/www/>
    ...
</Directory>

and you need to modify the statements represented here as ‘…’ so that it ends up as

<Directory /var/www/>
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
</Directory>
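Before editing anything it can be useful to see every place where AllowOverride is currently set, since a setting in one file can be countermanded in another. Here is a sketch of how I would go about it, assuming GNU grep. list_allowoverride is an invented name, and the two sample files are made up so that the example runs without a real Apache installation.

```shell
# List every active AllowOverride directive under a config tree,
# with file name and line number.
list_allowoverride() {
    grep -RniE '^[[:space:]]*AllowOverride[[:space:]]' "$1"
}

# Scratch demo: a main config and a site config that disagree.
tree=$(mktemp -d)
mkdir "$tree/sites-available"
cat > "$tree/apache2.conf" <<'EOF'
<Directory /var/www/>
    AllowOverride None
</Directory>
EOF
cat > "$tree/sites-available/example1.com.conf" <<'EOF'
<Directory /var/www/vhosts/example1.com/httpdocs>
    AllowOverride All
</Directory>
EOF

overrides=$(list_allowoverride "$tree")
echo "$overrides"
```

On a real server the call would be list_allowoverride /etc/apache2. With AllowOverride None in force for your document root, Apache ignores the .htaccess file altogether, which would explain permalinks failing even though the file is correct.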

Again, there’s a complication. The directory referred to must be an ‘ancestor’ of your WordPress installation or installations. If you can only find instructions for directory /var/www/html, say, and /var/www/html isn’t an ancestor of your WordPress directory (document root) you may need to add a new set of instructions altogether.
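If you are unsure whether a given Directory block covers your document root, the test is simply whether one path begins with the other. The sketch below does that comparison on the path strings alone. is_ancestor is my own helper, not a standard command, and it does not resolve symlinks, so treat its answer as a first check rather than the last word.

```shell
# Succeed if $1 is the same directory as, or an ancestor of, $2.
# Plain string comparison on absolute paths; symlinks are not resolved.
is_ancestor() {
    ancestor=${1%/}       # drop one trailing slash: /var/www/ -> /var/www
    descendant=${2%/}
    case "$descendant/" in
        "$ancestor"/*) return 0 ;;
        *)             return 1 ;;
    esac
}

docroot=/var/www/vhosts/example1.com/httpdocs
for dir in /var/www/ /var/www/html; do
    if is_ancestor "$dir" "$docroot"; then
        echo "$dir covers $docroot"
    else
        echo "$dir does NOT cover $docroot"
    fi
done
```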

Also bear in mind that apache2.conf is probably owned by root (use ls -l to check). If it is and you don’t have permission to change it, simply use sudo to elevate your permissions — for example:

/etc/apache2$ sudo nano apache2.conf

Once you have saved the file, restart Apache as before so that the new settings are picked up. Permalinks should now be working. If not, I’ve given you a few pointers you can follow up. This sort of thing can be very frustrating and sometimes it seems as though it’s never going to work, but in the end it’s just a matter of keeping a clear head, approaching things systematically, and getting all the settings right. Eventually!

Sorry it’s been such a long post. I hope it’s helpful to someone — it would certainly have helped me!