USENIX Update

December 7, 2011

Recovering From Linux Hard Drive Disasters

Filed under: LISA Conference | Marius Ducea @ 3:57 pm

Ever had a hard drive failure? Ever kicked yourself because you didn’t keep backups of critical files, or because you discovered that your regular nightly backup didn’t succeed? If this sounds familiar, then Theodore Ts’o’s training “Recovering From Linux Hard Drive Disasters” should be on your LISA schedule: this tutorial covers in depth how to recover from disasters caused by software or hardware failures.

After covering the basic types of failures (user goofs, software bugs, and hardware failures), Theodore explained the first and most important step when data is lost: DON’T PANIC! Most of the time, the first reaction after a failure causes more damage than the initial failure itself. You should remain calm, try to determine what happened, create a backup image of the failed disk if necessary, and only then try to recover the data. The backup can be as simple as using dd:
dd if=/dev/hda1 of=/dev/hdb1 bs=1k conv=sync,noerror
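
To make it clearer what the conv=sync,noerror options buy you, here is a minimal Python sketch of the same idea: read the source in fixed-size blocks, keep going when a block cannot be read, and pad short or failed blocks with zeros so that offsets in the copy still line up with the original. This is only an illustration, not a replacement for dd or dd_rescue; the function name rescue_copy and its return value are my own inventions.

```python
import os

BLOCK_SIZE = 1024  # mirrors bs=1k in the dd command above


def rescue_copy(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the list of block numbers that could not be read.
    (Hypothetical helper for illustration only.)
    """
    bad_blocks = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        total = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            for block_no, offset in enumerate(range(0, total, block_size)):
                try:
                    data = os.pread(src, block_size, offset)
                except OSError:
                    # "noerror": record the failure and carry on
                    bad_blocks.append(block_no)
                    data = b""
                # "sync": pad short or failed blocks with zeros
                dst.write(data.ljust(block_size, b"\0"))
    finally:
        os.close(src)
    return bad_blocks
```

Something like rescue_copy("/dev/hda1", "/mnt/backup/hda1.img") would then leave you with an image to experiment on while the failing disk stays untouched (both paths are just examples).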

In order to understand the different failures, Theodore explained how data is stored on disks and which hardware components can fail. Next he moved on to partition types and the major filesystems, each with its own characteristics and features:

  • FAT
  • ext2/3/4
  • reiserfs
  • JFS
  • XFS
  • ZFS
  • BTRFS

Here are some of my takeaways from this awesome session:

- you should monitor your logs for errors sent to the console or the system/kernel log that might indicate hardware or filesystem failures. If you are alerted promptly to such events, you can identify and react to failures faster. For example, a hardware failure logged by the kernel:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError } LBAsect = 408672, sector 1204 end_request

or an ext3 filesystem error:
EXT3-fs error (device md(9,2)): ext3_readdir: bad entry in directory #2670595: rec_len % 4 != 0 - offset=0, inode=
- there are different tools that can be used for backups and recoveries; it is important to be familiar with them, to know their output, and to have tested them beforehand. If you use them for the first time during a crisis, odds are you will not do very well. Some of the interesting tools recommended are e2image, dd_rescue, gpart, and cfv.
- save the output of fsck commands, as it might be valuable later during troubleshooting.
- save your partition table. Even if you just take the output of fdisk -l, save it (even in a simple file), as you might need it if the partition table gets corrupted.
- LVM is not a substitute for RAID, and RAID is not a substitute for backups.
- and finally about backups: just do them. Use any tool you like, but just do them. Even a simple tar script will do its job.
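
The log-monitoring advice in the first takeaway is easy to prototype. Here is a rough Python sketch that picks suspicious lines out of a kernel log; the patterns are my own guesses modeled on the two sample messages quoted above, not a list from the tutorial, so you would want to extend them for your own hardware and filesystems.

```python
import re

# Hypothetical patterns, loosely based on the sample messages above.
SUSPECT_PATTERNS = [
    re.compile(r"UncorrectableError"),
    re.compile(r"dma_intr: (status|error)="),
    re.compile(r"EXT[234]-fs error"),
    re.compile(r"I/O error"),
]


def find_disk_errors(lines):
    """Return the log lines that match any of the suspect patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS)
    ]
```

You could feed it an open log file (for instance /var/log/kern.log, though the location varies by distribution) from cron and mail yourself whatever it returns.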


In this class, Theodore Ts’o covers all the possible (and impossible) hard drive and filesystem failures and how to deal with them. If you care about your data you should definitely attend, as it will help you prepare for the time when failure happens. Finally, don’t forget the most important step when data is lost: “DON’T PANIC”; with enough care, you can usually get your data back.

Using and Migrating to IPv6

Filed under: LISA, LISA Conference | Ben Cotton @ 10:00 am

The Internet is facing a slowly-unfolding crisis. The Internet Assigned Numbers Authority (IANA) ran out of assignable IPv4 address blocks in February of this year, and APNIC exhausted its allocation in April. The other regional registries have only a few years’ worth of addresses to issue. There is an obvious need for the larger address space that IPv6 provides, yet adoption remains low. Shumon Huque’s training session on Tuesday afternoon aimed to fix that.

Many IPv4 concepts have IPv6 analogues, although the two protocols are not compatible. There are some differences, however. For one, there is a greater emphasis on client self-configuration. Machines that do not have static addresses set can obtain configuration via Stateless Address Autoconfiguration (SLAAC) or with DHCPv6. This does make it more difficult to control which devices can connect to the network. Some organizations prefer to use DHCPv6 because it allows pre-creating DNS records for the address pool. Automated configuration makes heavy use of ICMPv6, so it’s important to be judicious when crafting firewall rules.
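
To give a feel for what self-configuration means in practice, here is a small Python sketch of one classic way a SLAAC host builds its address: take the /64 prefix announced by the router and append a modified EUI-64 interface identifier derived from the MAC address. This is an assumption on my part about which scheme to illustrate; the session summary above does not say which method was covered, and many current systems use privacy or otherwise randomized identifiers instead. The function name slaac_address is hypothetical.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with a modified EUI-64 interface ID built from a MAC."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle of the MAC to get 64 bits
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network[int.from_bytes(eui64, "big")]


# Example with the documentation prefix 2001:db8::/64:
print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
# prints 2001:db8::211:22ff:fe33:4455
```

Because the address is derived from the hardware, pre-creating DNS records for it is awkward, which is part of why some sites lean towards DHCPv6.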

Most operating systems have out-of-the-box support for IPv6, and many have it turned on by default. Major applications have IPv6 support as well, including web browsers, IMAP servers, and instant messaging applications. (A large, but incomplete, list of applications with IPv6 support can be found at http://www.ipv6-to-standard.org/.) So where’s the holdup? There seems to be a lack of support for IPv6 in consumer-grade modems and routers, and ISPs and CDNs are slow to roll out IPv6 to customers.

Comcast and Time Warner have begun limited IPv6 rollouts to friendly customers, and Akamai claims to be the first CDN with IPv6 support. In the meantime, people interested in using IPv6 can set up tunneling through 6to4 or Teredo, or with a managed tunnel like the ones provided by Hurricane Electric, Freenet6, and SixXS.
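
As a quick illustration of how 6to4 works without any provider involvement: the tunnel endpoint’s public IPv4 address is embedded directly after the reserved 2002::/16 prefix, which gives the site a /48 of its own. Below is a minimal Python sketch of that mapping; the function name sixtofour_prefix is mine, and 192.0.2.1 is just a documentation address used as an example.

```python
import ipaddress


def sixtofour_prefix(ipv4):
    """Return the /48 prefix that 6to4 derives from a public IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    base = int(ipaddress.IPv6Address("2002::"))
    # The 32 IPv4 bits sit right after the 16-bit 2002 prefix.
    return ipaddress.IPv6Network((base | (v4 << 80), 48))


print(sixtofour_prefix("192.0.2.1"))
# prints 2002:c000:201::/48
```

Python’s ipaddress module can also go the other way: the sixtofour attribute of an IPv6Address returns the embedded IPv4 address for anything inside 2002::/16.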

IPv6 adoption will be forced eventually, as the remaining IPv4 addresses are taken. But IPv4 will continue to coexist with IPv6 for many years to come. There are still issues to work out with IPv6, including user and admin training. DHCP failover will need to be added to DHCPv6 before some sites will be willing to make the transition. In addition, network appliances like intrusion detection and intrusion prevention services will need to mature their IPv6 support.

Until then, expect to see IPv6 training remain a LISA staple.

 

December 6, 2011

Perl 6 for Users and Sysadmins

Filed under: LISA Conference | Ben Cotton @ 12:46 pm

When he’s not busy demystifying RRDtool, Tobi Oetiker is a language evangelist. On Monday afternoon, he brought Perl 6 to the masses. Brian Sebby joked on Twitter: “Perl 6 has a lot of cool features that I’d really like to use. I also felt this way when it was just around the corner at LISA ’02.” Indeed, Perl 6 has been 11 years in the making, and still hasn’t gained much of a foothold in the sysadmin community. Perhaps that’s not too surprising, considering how useful Perl 5 remains.

Perl 6 is not just an incremental upgrade, but a full re-imagining of what the language should be. Tobi describes it as “Perl for the people, not Perl for Larry [Wall]”. While much remains the same from Perl 5, especially the “there’s more than one way to do it” philosophy, much has changed. Huffmanization (making the most frequently used constructs the shortest to type) is a major design consideration, leading to new functions like say, which works like print except that it is shorter and automatically appends a newline. This effectively saves four keystrokes per line.

Variables get some updated treatment, too. +$variable treats the value as a number, while ~$variable treats it as a string. To use $variable as a boolean, prepend a ‘?’. Much like Unix’s “everything is a file” philosophy, Perl 6 has embraced an “everything is an object” philosophy, so variables now have their own methods. Want to capitalize a string? Try $variable.capitalize. Variables can even print themselves, like so: $variable.perl.say

Strings are more powerful than in Perl 5. Code can be executed inside a string by using {} or by calling a function with &. This prompted one attendee to ask “when did Perl become Lisp?” Tobi replied with “whatever your favorite language is, you should be able to program it in Perl 6.” In fact, much of Perl 6 is written in Perl 6.

Many new (and old) quoting styles are available for use, with more configuration options to control behavior. Regular expression syntax has changed as well. $/ now contains regexp matches. Match options are moved before the pattern, and capitalization always reverses the metasyntactic meaning of an escaped alphabetic character.

With the everything-is-an-object focus, you’d naturally expect Perl 6 to have features for object-oriented programmers. Classes are easily defined with the ‘class’ directive, and [single] class inheritance is supported. Classes can pull in functionality from ‘roles’ by using the ‘does’ statement.

New data types and even operators can be written in Perl 6 programs. Switch-like functionality is made easier with given…when. The syntax of control structures has changed, and if now supports multi-comparison statements (e.g. 3 < $pi < 4). Subroutines can now take named arguments instead of positional arguments, making functions which have many arguments easier to manage.

Frankly, it was hard to keep up with all of the newness that Perl 6 offers. It’s a very interesting language, but migrating to it will require a lot of work for Perl-5-savvy sysadmins. The implementation is still rapidly evolving, which also causes some shyness about operational use. Several of the examples in the session worked on an April build of the Rakudo Perl 6 interpreter, but not on the latest. It’s easy to understand why adoption has been slow, but perhaps 2012 will be the year of Perl 6?

The Limoncelli Test

Filed under: LISA Conference | Marius Ducea @ 9:06 am

Earlier this year, Tom Limoncelli wrote a blog post about how to rank and improve your sysadmin team. He was inspired by Joel Spolsky’s post entitled “The Joel Test: 12 Steps to Better Code”, a 12-question “highly irresponsible, sloppy test to rate the quality of a software team”.

If you haven’t read it before, you should definitely do so now. Tom liked Joel’s post very much and immediately wanted to write a similar one for system administrators, summarizing his own experiences and the answers he had been giving to people who asked for his advice on how to improve their sysadmin teams. He could not fit that into only 12 questions and ended up with 32 (although he considers 12 of them to be core questions), which he published in his blog post. This was the first time he taught the course based on that post, titled “The Limoncelli Test”, and it was intended as a way to describe the best practices needed to build a strong sysadmin team.

The course started with everyone taking the test: http://goto.tomontime.com/test. It is available online, and submitting your answers helps Tom gather some anonymous statistics, so be sure to check it out. After everyone finished the test, Tom went over the results and asked where everyone stood. Most people fell in the range of 16-25 yeses, meaning they had solid foundations. Nobody scored more than 26 points, though. The questions are organized in 7 categories:

  • Public facing practices
  • Modern team practices
  • Operational practices
  • Automation practices
  • Fleet management practices
  • “We acknowledge that hardware breaks” practices
  • Security practices

Next, we went over the questions and selected the ones people were interested in discussing, as there was not enough time to cover all of them in detail. Tom discussed the challenges of implementing these best practices and also had some fun and very useful stories to share from his vast experience. We also got some very useful feedback from attendees about their own experiences in implementing a ticketing system, defining policies, recording (and using) monthly sysadmin metrics, writing “design docs”, having the appropriate level of monitoring, etc.

In the last part of the training Tom discussed change and why people don’t like it, and offered specific suggestions for how to overcome some of these challenges:

  • the 5 Whys rule: ask “why” 5 times to help understand the root of the problem.
  • reveal only one step at a time: to help maintain focus and reduce resistance.
  • data-driven: let data tell the story

He ended this part by explaining what matters to CEOs and in what order (revenue comes first, followed by increasing scarce productivity, cutting costs, competitive advantage, and, in last place, technology for the sake of technology) and how understanding their priorities helps you make your case understood correctly.

At the end of the session Tom asked everyone to write down the top 3 things they would like to implement and circle the one they think is the most important. We then went around the room and each attendee read out their top take-away. Based on this feedback, most people were interested in implementing “the 3 empowering policies” and the “opsdocs”.

If you haven’t done it already, I would strongly recommend taking the test to see how your sysadmin team ranks on “The Limoncelli Test”.
