Databases: What you need to know

These days, there are either fewer DBAs or more, depending on how you look at it. According to John Sellens, as the number of applications that are powered by databases increases, there's more and more of a requirement for systems administrators to have some database experience. The goal of the databases training is to arm sysadmins with just enough knowledge to get the job done.

Having somehow managed to spend five years without touching a database, I was particularly interested in the session. John opened with a discussion of what a database is. As it turns out, there's some disagreement over the exact definition. “I asked a bunch of places on the Internet ‘what’s a database?'" he said. "I don’t think my browser is as smart as it thinks it is.” Once we came up with a working definition for a database, it was time to talk about running a database instance.

Starting the database management system is generally a painless process. Shutting down can be more interesting. A busy database may be in the middle of a transaction (a single- or multi-step update or query) when you want it to shut down (or worse, when the host machine goes away suddenly). "Real" databases have recovery mechanisms built in that will perform sanity checks on the database at the next start to ensure that integrity is maintained.

This talk of failure strikes fear into the heart of a sysadmin, so knowing how to perform database backups is important as well. Traditional file-based backups may not correctly capture the state of the database and will leave you with totally useless 1s and 0s. Three kinds of database-appropriate backup strategies exist: cold backups stop the database long enough for the backing files to be copied; hot backups connect to the database as a client and export the tables to a backup file; snapshot backups pause and sync the database, trigger a filesystem snapshot (think LVM) and restart. Which strategy is best depends on the use case. Cold backups are the simplest, but may require downtimes of several hours, an unacceptable condition in many environments.

The training also covered basic SQL commands and syntax. SQL is a declarative language, not procedural, so it lacks the control statements that most admins are used to. It also lacks regular expressions, so it is a big mindset shift for me. Nonetheless, it is possible to write some really complex and foot-shooting commands.

At the conclusion, a discussion of several different database programs was presented. The clock forced compression of this section, but Berkeley DB, MySQL, and PostgreSQL were all touched upon. All three projects are being actively developed to add new features and make it easier for systems admins to pretend to be DBAs.

So which DBMS is right? It depends on the circumstances. The first advice is to go with what the application best supports or what is best supported internally. And if that doesn't give you an answer?

Berkeley DB is suitable for:

  • Single table?
  • Shared database not required
  • Speed and management ease important
  • Data changes less frequently

Otherwise, choose between MySQL and PostgreSQL:

  • Are there any features you particularly need?
  • Are MySQL licensing terms an issue?
  • Is MySQL ownership, thus project forks, or potential forks an issue?
  • What are other applications using?
  • Does your language of choice have bindings to the database?

In the absence of other compelling arguments, John leans toward PostgreSQL based on its maturity, features, and licensing. Regardless of the choice, it's important for sysadmins to have an understanding of the tool in order to properly support it. There's far more to know about databases, but this training provided a great introduction.