Web servers today are exceptionally easy to install and set up. Whether for a web app, a blog, or anything else, even a novice developer can often install the important services and get everything working nicely in a couple of hours.
But even the most reliable server needs ongoing maintenance to stay stable over the long term. A typical server can run for months without human interaction, but only because of every bit and piece that was configured beforehand. You could ignore these items for months or even years, but they will come back to haunt you at the worst possible time: a traffic spike, a hacking attempt, or a hard drive crash.
Configuration management may not seem essential with a single server, but it becomes critical when you run many. Tools such as Puppet or Chef let you express, as a set of rules, how your servers are put together; applying those rules to each server yields an easily reproduced, consistent setup. That makes it possible to quickly boot a new copy of a system and provides enormous freedom during setup.

Configuration management does add a good deal of initial complexity when setting up a server, but even in a system with two or three servers the benefits are huge.
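As a hedged illustration of the rule-based style Puppet uses, a minimal manifest might keep a package installed, its config file in place, and its service running. The choice of `ntp` and the module file path here are hypothetical, not from the article:

```puppet
# Sketch only: package/file/service names are illustrative.
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  source  => 'puppet:///modules/ntp/ntp.conf',
  require => Package['ntp'],
}

service { 'ntp':
  ensure    => running,
  enable    => true,
  subscribe => File['/etc/ntp.conf'],
}
```

Applying the same manifest to every server is what makes a replacement machine reproducible in minutes rather than hours.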
Backups are an obvious step, and most administrators do them at least once in a while. But many sysadmins lack a solid backup strategy, and that needs fixing immediately; waiting even a few days can be disastrous. Be sure your backups are done correctly, because they are easy to get wrong. Your backup requirements may be unique, so whatever option you choose, look into its potential shortfalls. A good solution should:
- Keep several rotations of backups
- Run regularly
- Automatically drop obsolete backups
- Store backups off-site, away from your physical system
- Secure your data
- Include all essential configuration files, critical data, and recent logs (everything needed to get a replacement server running properly, immediately)
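The requirements above can be sketched as a small cron-driven script. Every path here is a placeholder: point `SRC` at what you actually need (configs, data, recent logs) and replace the commented `rsync` line with your real off-site copy.

```shell
#!/bin/sh
# Minimal nightly backup sketch -- adapt paths before real use.
set -eu
SRC="${BACKUP_SRC:-/etc/hostname}"       # what to archive (placeholder)
DEST="${BACKUP_DEST:-/tmp/backup-demo}"  # local staging directory
KEEP=7                                   # rotation depth

mkdir -p "$DEST"
STAMP=$(date +%Y%m%d-%H%M%S)
tar -czf "$DEST/backup-$STAMP.tar.gz" -C "$(dirname "$SRC")" "$(basename "$SRC")"

# Automatically drop everything but the $KEEP newest archives
ls -1t "$DEST"/backup-*.tar.gz | tail -n +"$((KEEP + 1))" | xargs -r rm -f

# Off-site copy (hypothetical host; could equally be S3, tape, etc.):
# rsync -a "$DEST"/ backup@offsite.example.com:/srv/backups/

echo "created backup-$STAMP.tar.gz"
```

Run from cron, this covers regularity, rotation, and pruning; encryption and the off-site step still need to be filled in for your environment.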
Immediately after creating a good backup plan, you need to test your backup process. That means regularly checking that the backup mechanism is running properly, that the backup files are valid and not corrupt, and that they contain all the data you want and need. If your backups run monthly, re-check them just as often. Automated tools can help a little here (for example, by checking that backup files contain the most recent data, are valid, and are of reasonable size), but nothing substitutes for a human making sure everything is actually fine.
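The automated part of that checking can be sketched as follows; the demo directory, fixture file, and the minimum-size argument are all illustrative. What makes the check meaningful is that `tar -tzf` exits nonzero on a truncated or corrupt archive.

```shell
#!/bin/sh
# Sanity-check the newest archive in a backup directory (sketch).
check_latest() {  # $1 = backup directory, $2 = minimum size in bytes
    latest=$(ls -1t "$1"/backup-*.tar.gz 2>/dev/null | head -n 1)
    [ -n "$latest" ] || { echo "FAIL: no archives in $1"; return 1; }
    tar -tzf "$latest" > /dev/null 2>&1 \
        || { echo "FAIL: $latest is corrupt"; return 1; }
    [ "$(wc -c < "$latest")" -ge "$2" ] \
        || { echo "FAIL: $latest is suspiciously small"; return 1; }
    echo "OK: $latest"
}

# Demo fixture standing in for last night's backup:
DEMO=$(mktemp -d)
printf 'payload\n' > "$DEMO/f.txt"
tar -czf "$DEMO/backup-demo.tar.gz" -C "$DEMO" f.txt
check_latest "$DEMO" 1
```

A real deployment would also list the archive's contents against a manifest of expected files; no script can decide whether the data inside is the data you actually need.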
RedHat, Ubuntu, and the other major distros have gotten better recently at setting up logrotate out of the box, so your MySQL and Apache logs are rotated properly (the defaults are fairly sane, though they may need customization). However, any extra software you add should get its own logrotate entry. Ignoring this has been the culprit in countless server failures, as even hundreds of terabytes of hard drive storage can fill up at the most inconvenient time. Resource monitoring is important here as well.
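A logrotate entry for your own software is only a few lines. In this sketch the application and its log path (`/var/log/myapp/*.log`) are hypothetical; the directives themselves are standard logrotate configuration:

```
/var/log/myapp/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
}
```

Dropped into `/etc/logrotate.d/`, this keeps eight weekly generations, compresses old ones, and quietly skips missing or empty logs.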
Tracking CPU utilization, bandwidth, disk space, memory, and so on provides very valuable insight into your system. As traffic surges, you can compare increased I/O or memory usage over time, letting you scale well ahead of need. RRDTool/Munin, Cloudkick, and ServerDensity are all excellent options for looking at these metrics over time. If your favored tool also alerts on unexpected changes (full drives, runaway processes, etc.), you will be one step ahead of other admins.
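Even without a full monitoring stack, the full-drive alert can be sketched in a few lines of portable shell. The 90% figure and the suggestion to mail the output are illustrative choices, not from the article:

```shell
#!/bin/sh
# Flag filesystems above a usage threshold (sketch).
disk_alerts() {  # $1 = percentage threshold
    df -P | awk -v t="$1" '
        NR > 1 { use = $5; sub(/%/, "", use)
                 if (use + 0 >= t) print $6 " at " use "% full" }'
}

# A cron job might mail this to an operator instead of printing it:
disk_alerts 90
```

Dedicated tools add the history and graphing that make trends visible, but a script like this is better than no alerting at all.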
Keeping your MySQL, Apache, and similar processes running is critical for your system. Several useful tools, such as God and Monit, can help keep processes healthy: by checking process IDs, open ports, or responses, they can kill a stalled service and restart it before it brings the whole system down. Configuring rules for specific incidents is notoriously difficult, but done correctly it may let you sleep easier at night.
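As a hedged sketch of what such a rule looks like, this Monit stanza watches Apache by pidfile and HTTP response; the pidfile and init-script paths vary by distro, so treat them as placeholders:

```
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program  = "/etc/init.d/apache2 stop"
    if failed port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout
```

The last line is the safety valve: if Apache keeps crash-looping, Monit gives up and alerts instead of restarting it forever.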
Hardening covers the many different actions that must be taken to properly secure a system, and even the simplest are frequently missed. Can you account for every running process? What extra services and ports are open on your system? Are your PAM modules properly configured for authentication?
Operating System Upgrade
If you’re still using Windows 2000 SP4 as the basis of your web server, it is a good idea to upgrade to a newer version such as Windows Server 2008 R2. It has much in common with its client-based counterpart, Windows 7, and offers many new features, including reduced power consumption, new virtualization capabilities, and better management tools. Some sysadmins even still run a discontinued Linux distro, such as Red Hat Linux 9, when Fedora 13 is readily available. Upgrading to a newer OS will not only improve productivity; your web server will also be better protected against malicious attacks.
Web Server Upgrade
Often upgrading to a new OS is too drastic a step: without enough preparation and skill you may face prolonged downtime, which can be financially painful. In many cases, upgrading your OS one step (for example, from Ubuntu 9.04 to 10.04) may not be worth the time and effort, given the limited improvements. Instead, upgrading your Apache server from 2.1 to 2.2.6 is faster and easier, gives you more improvement, and reduces downtime.
UPSs are designed to run for years without problems. But nothing is perfect, and even the best UPS will fail eventually. Quarterly, check for burned insulation, signs of wear, and loose connections. Semiannually, check batteries and capacitors for leaks. Annually, clean the enclosures and check whether the batteries are nearing the end of their operational life. Some sysadmins ignore their UPSs for more than eight years and pay the price for that neglect when the whole system goes down in a brownout.
Applying security updates is very easy on any RPM-based system. The harder question is whether an upgraded security package will cause errors in your stack. Identically configured staging servers are useful for learning whether an update will affect your system. Fortunately, breakage from security updates is extremely rare: the risk of downtime while working through an update’s compatibility problem is far smaller than the risk of having a security hole exploited, so don’t let uncertainty stop you from applying the right upgrades. Finally, not all vulnerabilities get a security patch immediately. Monitoring security alerts lets you be proactive about protecting your servers before a patch is available. This is an area where there is really no replacement for Mk I eyeballs in keeping everything updated and working smoothly.
Log Monitoring / Intrusion Detection / Security Scanning
These are perhaps the most ignored items on this list. They go easily unnoticed, and you won’t think of them until your entire system has been compromised. Continuously scanning for hacking attempts, unusual activity, and other foul play is important both to mitigate attacks and to repair the damage they cause.
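A tiny slice of log monitoring can be sketched in shell: counting failed SSH logins per source IP. The Debian-style log path and the `sshd` message format are assumptions here; adjust both for your distro, and note that real intrusion detection tools go far beyond this.

```shell
#!/bin/sh
# Summarize failed SSH logins per source address (sketch).
failed_logins() {  # $1 = path to an auth log
    grep 'Failed password' "$1" 2>/dev/null \
      | awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' \
      | sort | uniq -c | sort -rn
}

# Top offenders from the system log (prints nothing if the log is absent):
failed_logins "${AUTH_LOG:-/var/log/auth.log}" | head
```

A spike from one address in this output is exactly the kind of "unusual activity" worth alerting on before it becomes a compromise.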
This article isn’t an exhaustive list, but it covers the things many sysadmins and developers simply don’t have the knowledge, interest, or time to handle. Many development projects are delivered to customers who have no in-house staff capable of dealing with these tasks once the development team moves on to other projects.