September 2010 Archives

The October 2010 issue of Usenix's magazine should be arriving in your mailbox this week. I just got mine today. (In a month or so, non-members will be able to read select articles at this address.)

The title is "A system administration parable: The Waitress and the Water Glass"

Take a moment to read it. In a week I'm going to ask a question, but I want people to have read it first.

Update: The URL is now live (and corrected).

Posted by Tom Limoncelli in Community

A coworker debugged a problem last week that inspired me to relay this bit of advice:

Nothing happens at "random times". There's always a reason why it happens.

I once had an ISDN router that got incredibly slow now and then. People on the far side of the router lost service for 10-15 seconds at a time.

The key to finding the problem was timing how often the problem happened. I used a simple once-a-second "ping" and logged the times that the outages happened.
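Turning a once-a-second ping log into a list of outages is a small loop. Here is a minimal Python sketch, assuming the log has already been parsed into (timestamp, got_reply) pairs in time order; the function name and log format are illustrative, not the script from the original debugging session.

```python
from typing import List, Optional, Sequence, Tuple


def outage_windows(samples: Sequence[Tuple[float, bool]]) -> List[Tuple[float, Optional[float]]]:
    """Collapse once-a-second ping results into outage windows.

    samples: (timestamp_seconds, got_reply) pairs in time order.
    Returns (start, end) pairs: start is the first missed ping, end is the
    first ping that got a reply again (None if still down when the log ends).
    """
    windows = []
    start = None
    for t, ok in samples:
        if not ok and start is None:
            start = t                      # first missed reply: outage begins
        elif ok and start is not None:
            windows.append((start, t))     # reply is back: outage ends
            start = None
    if start is not None:
        windows.append((start, None))      # log ended in the middle of an outage
    return windows


log = [(0, True), (1, False), (2, False), (3, True), (4, True), (5, False)]
print(outage_windows(log))  # [(1, 3), (5, None)]
```

The start times from this list are what you would then graph on a timeline or as intervals.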

Visual inspection of the numbers turned up no clues. It looked random.

I graphed the intervals between outages. The graph looked pretty random, but there were runs of gaps that were always exactly 10 minutes.

I graphed the outages on a timeline. That's where I saw something interesting. The outages were exactly 10 minutes apart PLUS at other times. I wouldn't have seen that without a graph.

What happens every 10 minutes and at other times too? In this case, the router recalculated its routing table every time it got a route update. The route updates came from its peer router exactly every 10 minutes, plus any time an ISDN link went up or down. The 10-minute gaps I was seeing were the times when an entire 10 minutes passed with no ISDN link going up or down. Because there were so many links, and they belonged to home users who used their connections intermittently, it was rare to go a full 10 minutes with no updates. Graphing the outages, however, made the periodic ones visible.
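The "exactly 10 minutes apart PLUS other times" pattern can also be surfaced numerically: fold each outage timestamp modulo the suspected period and count how many land in each small slot. Outages driven by a 600-second timer pile up in one slot, while event-driven ones scatter. A minimal Python sketch, assuming outage start times in seconds; the 600-second period, the 5-second slot width, and the sample data are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def phase_histogram(outage_times: Iterable[float], period: float, slot: int = 5) -> Counter:
    """Count outages by their position within each `period`-second cycle,
    grouped into `slot`-second buckets."""
    counts = Counter()
    for t in outage_times:
        counts[int(t % period) // slot] += 1
    return counts


def strongest_phase(outage_times: Iterable[float], period: float, slot: int = 5) -> Tuple[int, int]:
    """Return (offset_seconds, hits) for the busiest bucket.
    Assumes at least one outage time is supplied."""
    bucket, hits = phase_histogram(outage_times, period, slot).most_common(1)[0]
    return bucket * slot, hits


# Synthetic data: six outages on a 600 s timer (offset 37 s) plus five at other times.
periodic = [37 + 600 * k for k in range(6)]
other = [123, 845, 1500, 2222, 3001]
print(strongest_phase(periodic + other, period=600))  # (35, 6)
```

A single bucket with far more hits than the rest points at a timer; trying a few candidate periods (300, 600, 3600) is cheap.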

I've seen other outages that happened 300 seconds after some other event, such as a machine connecting to the network. A lot of protocols do things in 300-second (5-minute) intervals. The most common is ARP: a router expires ARP entries every 300 seconds. Some vendors extend the timeout whenever they receive a packet from the host; others expire the entry and send another ARP request.
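Checking an outage log against a suspected timer works the same way: for each outage, measure how long it has been since the most recent triggering event and see whether that delay sits near a known value such as 300 seconds. A minimal Python sketch; the event and outage lists, the 300-second timer, and the 2-second tolerance are illustrative assumptions.

```python
import bisect
from typing import List, Optional, Sequence


def delays_after_events(event_times: Sequence[float],
                        outage_times: Sequence[float]) -> List[Optional[float]]:
    """For each outage, seconds since the latest event at or before it
    (None if no event precedes it)."""
    events = sorted(event_times)
    delays = []
    for t in outage_times:
        i = bisect.bisect_right(events, t)   # number of events at or before t
        delays.append(t - events[i - 1] if i else None)
    return delays


def matches_timer(delays, timer=300.0, tolerance=2.0):
    """Flag delays within `tolerance` seconds of `timer`."""
    return [d is not None and abs(d - timer) <= tolerance for d in delays]


events = [100, 1000]            # e.g. times a machine joined the network
outages = [50, 401, 1150, 1299]
d = delays_after_events(events, outages)
print(d)                        # [None, 301, 150, 299]
print(matches_timer(d))         # [False, True, False, True]
```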

What other timeouts have you found to be clues of particular bugs? Please post in the comments!

Posted by Tom Limoncelli in Technical Tips

CSS Positioning

I admit it. I use tables for positioning in HTML. It's easy and it works.

However, I just read this excellent tutorial on CSS positioning and I finally understand what the heck all that positioning stuff means.

http://www.barelyfitz.com/screencast/html-training/css/positioning/

I promise not to use tables any more.

I highly recommend this tutorial.

Posted by Tom Limoncelli in Technical Tips

Date: Thursday, October 7th, 2010
Time:
7:00 PM - 7:20 PM: Social Time
7:20 PM - 7:30 PM: LOPSA-NJ Business and Announcements
7:30 PM - 9:00 PM: Main Presentation

Lawrence Headquarters Branch of the Mercer County Library, 2751 US Highway 1 Lawrenceville, NJ, 08648-4132

You've probably heard about, or been exposed to, many different compliance regulations these days, such as SAS70, PCI, HIPAA, SOX, etc. What do they all mean, and why have they been put into place? I will answer those questions and also cover some lessons learned during the implementation and ongoing support of environments that need to be compliant.

Scott Walters has fiddled with computers since the TI-99/4A. In 1995 he helped found an internet company, then did some consulting for Mack Trucks and Volvo AB, and along the way developed a smidgen of Open Source software, larrd, that was used with Big Brother. He is currently the Director of Client Services at INetU Managed Hosting in Allentown, Pennsylvania.

"LOPSA-NJ is an organization for system administrators in New Jersey formed to facilitate information exchange pertaining to the field of system administration. LOPSA-NJ is not affiliated with a particular hardware or software vendor or company. Everyone is invited!"

Posted by Tom Limoncelli

LOPSA has started a mentoring program. Think of it as a "match-maker" service that helps new sysadmins find a mentor.

Most sysadmins do not have a mentor. That's sad. If you recall, the reason Christine and I were inspired to write "The Practice of System and Network Administration" was to share all the knowledge we had been so lucky to gain from our mentors. We wanted to create a "mentor in a book", so to speak, for those who were not as lucky as we were. The result is TPOSANA.

However, books only go so far.

I'm excited by the LOPSA Mentor program because it directly helps sysadmins, especially the solo sysadmins who are most in need.

Being a mentor is one of the most rewarding experiences you can imagine. From 1992-1996 I mentored someone by email and I have to admit... I learned as much from her as she did from me. We still stay in contact.

The LOPSA program matches a protege with a mentor. They agree on a specific project (for example, setting up a replicated web server), and when they are done they write up a little note about their experience. It's that simple. What happens after the project is up to them.

A mentor doesn't give proteges the answers. A mentor helps guide the protege to find the resources they need to work out the answer themselves. It is that kind of knowledge that makes a successful mentor/protege relationship.

Please sign up to be a mentor today!

For more information about the program visit http://lopsa.org/mentor today.

(The full LOPSA press release follows...)

Posted by Tom Limoncelli

How do you keep your network documentation up to date?

(more after the jump)

Being a long-time "vi" user I find that I am constantly surprised by the little (and not-so-little) enhancements vim has added. One of them is the "inner" concept.

Any vi user knows that "c" starts a change and the next keystroke determines what will be changed. "cw" changes from where the cursor is to the end of the word. "c$" changes from where the cursor is to the end of the line. Think of a cursor movement command, type it after "c", and you can be pretty sure that you will change from where the cursor is to... wherever you've directed.

"d" works the same way. "dw" deletes word. "d$" deletes to the end of the line. "d^" deletes to the beginning of the line ("^"? "$"? gosh, whoever invented this stuff must have known a lot about regular expressions).

VIM adds the concept of "inner text". Text is structured. We put things in quotes, in parentheses, between mustaches (that's "{" and "}"), and so on. The text between a pair of those delimiters is the "inner text".

So suppose we have:

<span style="clean">Interesting quote here.</span>

but we want to change the style from "clean" to "unruly". Move the cursor anywhere between the quotes and type ci then a quote (read that as "change inner quote"). VIM will seek out the opening and closing quotes that surround the cursor, and the next stuff you type will replace the text between them.

It works for all three kinds of quotes (single, double, and backtick), and it works for all the various brackets: (, [, {, and <. You can type the opening or the closing bracket; they both do the same thing.

Therefore you can move the cursor to the word "style" in the above example and type "ci<" to change everything within that tag.

I find this particularly useful when editing Python code. I'm often using ci' to change a single-quoted string.
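To make the "inner text" idea concrete, here is a minimal Python sketch of how an editor could locate the inner text around the cursor on a single line. It is a simplification, not Vim's actual algorithm: quotes are paired with the nearest quote on each side, brackets are matched with a nesting counter, and multi-line objects are ignored. The function name is illustrative.

```python
from typing import Optional, Tuple

PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
OPENER_FOR = {close: open_ for open_, close in PAIRS.items()}


def inner_span(line: str, cursor: int, delim: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) such that line[start:end] is the inner text around
    `cursor` for `delim`, or None if there is no enclosing pair. `delim` may be
    a quote character or an opening or closing bracket."""
    opener = OPENER_FOR.get(delim, delim)   # a closing bracket means the same pair
    closer = PAIRS.get(opener, opener)      # quotes close with themselves

    if opener == closer:
        # Quotes: nearest quote at or left of the cursor, then the next one after it.
        left = line.rfind(opener, 0, cursor + 1)
        if left == -1:
            return None
        right = line.find(opener, left + 1)
        return None if right == -1 else (left + 1, right)

    # Brackets: walk left to the nearest unmatched opener.
    depth, start = 0, None
    for i in range(cursor, -1, -1):
        if line[i] == closer and i != cursor:
            depth += 1
        elif line[i] == opener:
            if depth == 0:
                start = i
                break
            depth -= 1
    if start is None:
        return None
    # Walk right from that opener to its matching closer.
    depth = 0
    for j in range(start + 1, len(line)):
        if line[j] == opener:
            depth += 1
        elif line[j] == closer:
            if depth == 0:
                return start + 1, j
            depth -= 1
    return None


line = '<span style="clean">Interesting quote here.</span>'
s, e = inner_span(line, line.index("clean"), '"')
print(line[s:e])   # clean              (what ci" replaces)
s, e = inner_span(line, line.index("style"), "<")
print(line[s:e])   # span style="clean" (what ci< replaces)
```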

If there is an "inner", you'd expect there is an "outer" too, right? (How many of you tried typing co" to see if it worked?) Well, there is and there isn't.

In VIM the counterpart of "inner" is "a" (as in "a word" or "a quoted string"). It is a bit special: it includes not just the inner text but also the opening and closing elements, plus sometimes the space or two that follow. Given this text:

  • The quick <span class="foo">brown</span> fox.

If the cursor is in the <span class="foo"> tag, "ca<" will replace the entire tag, from the < all the way to the >. The whitespace after the object is also replaced for text-related things like change a word (caw) and change a sentence (cas).

Not having to move the cursor to the beginning of an element to change the entire thing is a great time saver. It is these little enhancements that make using VIM so much more pleasant than using VI.

Give it a try!

More information about this is in the "Text Objects" section of Michael Jakl's excellent VIM tutorial.

--Tom

P.S. My second favorite thing about VIM? gVIM (The graphical version of VIM) preserves TABs when you use the windowing system to cut and paste.

Posted by Tom Limoncelli in Technical Tips

Since I'm teaching "Time Management: Team Efficiency" at Usenix LISA (San Jose, CA, November 7-12, 2010) I thought it might be a good idea to point out that there is a discount for organizations sending 5 or more employees. This and other discounts are listed on the conference web site.

People often ask me, "But how do I get my co-workers to also do [whatever]?" One way is to have them all experience the same training.

(Please remember that my tutorials are on Monday of the conference. This is rather early in the week, so if you normally skip the first few days you'll need to arrive Sunday night if you want to catch all my tutorials.)

Posted by Tom Limoncelli in Conferences

Remember when you were a little kid and had a clubhouse? Did you let someone in only if they knew "the secret knock"? Lately people have talked about various ways to do the same thing with ssh. The technique, called "port knocking", permits SSH only if someone has recently touched a particular sequence of ports. For example, someone has to ping the machine, then telnet to port 1234, and then for the next 2 minutes they can ssh in.
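The state a port-knocking daemon has to keep is small: how far each source address has progressed through the knock sequence, and when it last completed it. Below is a minimal Python sketch of just that bookkeeping, with no packet capture or firewall integration. The class name, the first port (7000), and the 120-second window are hypothetical, and the "ping" step from the example is replaced by a knock on a port for simplicity.

```python
import time
from typing import Callable, Dict, Sequence


class KnockTracker:
    """Track per-IP progress through a knock sequence (hypothetical sketch)."""

    def __init__(self, sequence: Sequence[int], window: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.sequence = list(sequence)
        self.window = window                  # seconds that access stays open
        self.clock = clock
        self.progress: Dict[str, int] = {}    # ip -> correct knocks so far
        self.granted: Dict[str, float] = {}   # ip -> time the sequence completed

    def knock(self, ip: str, port: int) -> None:
        step = self.progress.get(ip, 0)
        if port == self.sequence[step]:
            step += 1
        else:
            # A wrong port resets progress (it may still be a fresh first knock).
            step = 1 if port == self.sequence[0] else 0
        if step == len(self.sequence):
            self.granted[ip] = self.clock()
            step = 0
        self.progress[ip] = step

    def allowed(self, ip: str) -> bool:
        t = self.granted.get(ip)
        return t is not None and self.clock() - t <= self.window


now = [0.0]   # fake clock so the example is deterministic
tracker = KnockTracker([7000, 1234], window=120, clock=lambda: now[0])
tracker.knock("198.51.100.7", 7000)
tracker.knock("198.51.100.7", 1234)
print(tracker.allowed("198.51.100.7"))   # True
now[0] = 121.0
print(tracker.allowed("198.51.100.7"))   # False: the window has closed
```

As the video above shows, the hard part is everything this sketch leaves out, such as replayed or sniffed knocks.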

This can be difficult to implement securely, as this video demonstrates: http://www.youtube.com/watch?v=9IrCgCKrv8U

IBM's Developerworks recently posted an article about tightening SSH security. The topic also came up on the mailing list for the New Jersey LOPSA chapter.

I had an idea that I haven't seen published before.

I have a Unix (FreeBSD 8.0) system that is live on the open internet, and it is so locked down that I don't permit passwords. To SSH to the machine you have to have pre-arranged SSH keys for "passwordless" connections. However, it does not run a firewall, because it is literally running with no ports open (except ssh). There is nothing to firewall.

Problem: What if I am stuck and need to log in remotely with a password?

Most of the port-knocking techniques I've seen leverage the firewall running on the system. I didn't want to enable a firewall, so I came up with this.

Try #1: A CGI script to grant access.

Connecting to a particular URL starts a second sshd on port 9999 with a special configuration file that permits passwords:

/etc/ssh/sshd_config:

PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin no
UsePAM no

/etc/ssh/sshd_config-port9999:

Port 9999
AllowAgentForwarding no
AllowTcpForwarding no
GatewayPorts no
LoginGraceTime 30
MaxAuthTries 3
X11Forwarding no
PermitTunnel no
PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin yes
UsePAM yes

Translation: If someone is going to get special access on port 9999, they can't use it to set up tunnels or gateways. It is just for quick access: enough to fix your SSH keys.

The CGI script essentially runs:

/usr/sbin/sshd -d -f /etc/ssh/sshd_config-port9999

Which permits a single login session on port 9999.
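A minimal sketch of such a wrapper in Python, assuming the web server runs it as a CGI program with enough privilege to start sshd (the post doesn't say how that is arranged; in practice sshd needs root). The script itself and its messages are hypothetical; the sshd path and the configuration file are the ones from this post. It only acts when the GATEWAY_INTERFACE variable that CGI defines is present, so importing or running it by hand does nothing.

```python
#!/usr/bin/env python3
"""Hypothetical CGI wrapper: start a one-shot sshd that accepts passwords."""
import os
import subprocess
import sys
from typing import List

SSHD = "/usr/sbin/sshd"
CONFIG = "/etc/ssh/sshd_config-port9999"


def sshd_command(config: str = CONFIG) -> List[str]:
    # -d: debug mode; sshd stays in the foreground, doesn't fork,
    #     and exits after handling a single connection.
    # -f: read this configuration file instead of /etc/ssh/sshd_config.
    return [SSHD, "-d", "-f", config]


def launch(config: str = CONFIG) -> subprocess.Popen:
    # Detach from the web request so the page can finish while sshd
    # waits for the one login. Debug output is discarded.
    return subprocess.Popen(sshd_command(config),
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL,
                            start_new_session=True)


def main() -> None:
    launch()
    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write("A single-use sshd is now listening on port 9999.\n")


# Only act when invoked by a web server (CGI sets GATEWAY_INTERFACE).
if __name__ == "__main__" and os.environ.get("GATEWAY_INTERFACE"):
    main()
```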

Try #2:

FreeBSD defaults to an inetd that uses Tcpwrappers.

So, try #2 was similar to #1 but appends info to /etc/hosts.allow so that the person has to come from the same IP address as the recent web connection. The problem with that is sometimes people connect to the web via proxies, and adding the proxy to the hosts.allow list isn't going to help.

Try #3:

We all know that you can't run two daemons on the same port number, right? Wrong.

You can have multiple daemons listening on the same port number if they are listening on different interfaces. If two conflict, the connection goes to the "most specific" listening daemon.

What does that mean? It means you can have sshd with one configuration listening on *.22 (any interface, port 22) and another listening on 10.10.10.10.22 (port 22 of the interface configured for address 10.10.10.10). But I only have one interface, you say? I disagree. You have 127.0.0.1 plus your primary IP address, plus any IPv6 addresses. Heck, even if you really only had one IP address, "*" and a specific address can both be listening on port 22 at the same time.

That's what the "*" on "netstat -l" means. "Any interface."
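The "most specific listener wins" rule can be modeled in a few lines. This Python sketch models only the selection logic described above for FreeBSD; it does not open sockets, and other operating systems may refuse to let a specific-address listener and a wildcard listener share a port at all. The listener names are illustrative; the addresses are from the Try #3 configuration later in this post.

```python
from ipaddress import ip_address
from typing import Iterable, Optional, Tuple

Listener = Tuple[str, str, int]   # (name, listen_address, port)


def pick_listener(listeners: Iterable[Listener], dst_addr: str,
                  dst_port: int) -> Optional[str]:
    """Return the listener that would get a connection to dst_addr:dst_port:
    an exact address match beats a wildcard (0.0.0.0 or ::) on the same port."""
    dst = ip_address(dst_addr)
    wildcard = None
    for name, addr, port in listeners:
        if port != dst_port:
            continue
        a = ip_address(addr)
        if a == dst:
            return name                     # most specific: exact address
        if a.is_unspecified and a.version == dst.version:
            wildcard = name                 # fallback: "*" for this address family
    return wildcard


listeners = [
    ("keys-only", "0.0.0.0", 22),
    ("keys-only", "::", 22),
    ("passwords", "64.23.178.12", 22),
]
print(pick_listener(listeners, "64.23.178.12", 22))  # passwords
print(pick_listener(listeners, "127.0.0.1", 22))     # keys-only
```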

So, back to our port knocking configuration.

The normal port 22 sshd runs with a configuration that disables all passwords (only permits SSH keys).

/etc/ssh/sshd_config:

Port 22
ListenAddress 0.0.0.0
ListenAddress ::
PAMAuthenticationViaKBDInt no
PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin no
UsePAM no

And the CGI script starts an sshd with this configuration:

/etc/ssh/sshd_config-permit-passwords:

Port 22
ListenAddress 64.23.178.12
ListenAddress fe80::5154:ff:fe25:1234
PAMAuthenticationViaKBDInt no
PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin no
UsePAM yes

The wrapper simply runs:

/usr/sbin/sshd -d -f /etc/ssh/sshd_config-permit-passwords

That's all there is to it!

Posted by Tom Limoncelli in Technical Tips

System administrators go by many different job titles. Network Admin, Network Engineer, Sysadmin, Computer Technician, etc.

When a campaign requests money from you and asks your title, what do you write?

I propose we all write the same exact thing: "Computer System Administrator"

It matters. It really, really matters. Campaigns do data-mining on the various job titles people enter. A 10% increase in donations from "doctors" is meaningful. Twice as many donations from [insert job title] as from [insert other job title] is meaningful. When we split our donations between different job titles, we dilute our political influence.

I know that in the past I've written everything from "System Administrator" to "author" to "Software Engineer".

This political season, and from now on, I will always record my title as "Computer System Administrator". It isn't a perfect description of what I do; some would say I spend more time writing code than doing real system administration. But I feel that to the people working on political campaigns, who aren't the most technical in the world, the phrase "Computer System Administrator" is clear and concise.

If we all consistently use the same phrase it will have an impact.

In the US the election season is starting to heat up. Yesterday was the last primary. Between now and election day you will undoubtedly receive many MANY many emails asking for money. Whether you agree with me politically or are one of the bad, bad people who disagree (I'm kidding! Really!), let's all agree to do this.

I'm Tom Limoncelli, a computer system administrator, and I vote!

Posted by Tom Limoncelli in Professionalism

The tutorial I'm teaching at Usenix LISA (San Jose, CA, November 7-12, 2010) Monday afternoon is totally new material: "Advanced Time Management: Team Efficiency"

When I say "totally new", I mean it.

  1. New material. This is material I've never put in books, articles, blog posts or given talks about at other conferences.
  2. New topic. For the last 3-4 years I've been repeating tutorials that I've taught before, with one exception that was 40% a rehash of my older stuff.
  3. New format. Thanks to the pervasiveness of wifi and laptops at LISA, this tutorial will have in-class exercises. It will be a mix of lectures, demos, and in-class exercises.

This is a LISA-2010 exclusive. I'm reducing my conference attendance for a few years, so this may be the only chance you have to attend.

The class is on Monday afternoon. Monday morning I'm teaching an upgraded version of my "Time Management for System Administrators" class, which is a good precursor to the afternoon class.

LISA runs from Sunday to Friday but a lot of people skip the first day or two. If you are making plans to come to LISA (and September is a good month to start talking it up with your manager), make sure your travel arrangements include being at the conference Monday.

Read the complete tutorial description.

The sooner you register, the more discounts are available!

Posted by Tom Limoncelli in Conferences

Thomas Marino

GOP congressional candidate Thomas Marino misreported income.

When not doing that, he defends swindlers.

Posted by Tom Limoncelli in Politics

Google Chrome supports multiple profiles. The feature is just hidden until it is ready for prime time. It is really easy to set up on the Linux and Windows versions of Chrome. On the Mac it takes some manual work.

I'm sure eventually the Mac version will have a nice GUI to set this up for you. In the meanwhile, I've written a script that does it:

chrome-with-profile-1.0.tar.gz
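This is not the contents of chrome-with-profile, but a minimal Python sketch of one common approach, assuming Chrome's --user-data-dir flag (which points Chrome at a separate profile directory) and the macOS open command (-n starts a new instance of the application, --args passes the remaining arguments to it). The base directory for profiles is a hypothetical choice.

```python
import os
import subprocess
import sys
from typing import List

# Hypothetical location for per-profile data directories.
PROFILE_ROOT = "~/Library/Application Support/ChromeProfiles"


def chrome_profile_command(profile: str, root: str = PROFILE_ROOT) -> List[str]:
    data_dir = os.path.join(os.path.expanduser(root), profile)
    # -n: start a new instance even if Chrome is already running.
    # --args: pass the rest of the arguments through to Chrome.
    return ["open", "-n", "-a", "Google Chrome", "--args",
            "--user-data-dir=" + data_dir]


# Usage: python3 chrome_profile.py work
if __name__ == "__main__" and len(sys.argv) == 2:
    subprocess.run(chrome_profile_command(sys.argv[1]), check=True)
```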

Tom

Jon Runyan


Jon Runyan is the Republican candidate for NJ-3

Jon Runyan is mediocre, a bully, and one of the dirtiest players. Every year NFL players vote on the meanest, nastiest and dirtiest players in the league, and Runyan ranked near the top of that list in every single season that he played.

Posted by Tom Limoncelli in Politics

A bunch of people noticed that my article about what system administration is like at Google appeared briefly earlier this week and then the URL went dead soon after.

Before any crazy rumors spring up...

I clicked "save" when I meant to click "schedule for a future date". Sadly it got into the RSS feed before I could do anything about it. Web apps have no CTRL-C!

I tend to schedule my major posts for 10am US/Eastern on Monday and Wednesday and sometimes Friday. Tiny snippets and comments get fit in between.

For example... this article is timed to appear today about 15 minutes after 10am.

(Note: I had to change the URL to force it to enter the RSS feed. Sorry about the broken links!)

P.S. I just found a setting so my blog software will default my posts to "Scheduled" instead of "Published". That should prevent future problems.

Posted by Tom Limoncelli in Site Announcements

People often write to me for advice about submitting a resume to Google. They always want to know what system administration is like here. I tend to write something original each time, but I always say the same thing. Therefore, I'm going to post it here and refer people to this post in the future.


What is system administration like at Google?

System administration at Google is quite different than at most companies. Since we do all our computing on massive clusters we don't touch much hardware. It is set up a rack at a time for us in data centers around the world. We don't spend a lot of time setting up typical services like LDAP, Kerberos, and load balancers because we have those things set up already, automated, and scaled.

So what /do/ we do? We keep Google's various web sites running, or keep the systems that they rely on running. We have small teams for each product or chunk of infrastructure (DNS, storage, etc.). Members usually have 1 week of "on call" (pager) duty out of 6, and the other 5 weeks are spent making improvements that make the on-call portion more optimized, automated, and trouble-free. (Some groups are 1 out of 8, or 3.5 days out of every 12, or whatever... some have zillions of pages per day, others 1 or 2 per week.)

When you aren't on duty, you are optimizing the system for better operational efficiency. Suppose the last time you were on call you noticed a certain problem was happening a lot. You might take on a project to monitor all aspects of it, find the problem, and either fix it or work with the developers to fix it. We have large, complicated systems that require deep Unix knowledge to maintain properly.

A senior person takes on projects that have even larger impact. So, maybe you notice that not only does your product have that particular problem, but other projects do too. So you invent a new system that fixes the problem globally. For example, rolling out software to thousands of machines is difficult. After each upgrade the machine must be tested to make sure the software is working, and begin rolling back the change if there are failures. We haven't done that kind of thing manually in years, but maybe the current system can't tell the difference between a bug that means you should roll-back all systems, or a hardware failure that means one particular machine is down. You might refactor the system so that the failure test is a plug-in module and now everyone can write their own plug-ins.
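The plug-in refactoring described above can be sketched briefly. In this minimal Python sketch (all names are hypothetical; it is not Google's system), each check inspects one freshly upgraded machine and returns a verdict: a SOFTWARE_FAULT from any check rolls the release back everywhere, while a MACHINE_FAULT only sets that one host aside. Adding a new kind of test means adding a function to the list, without touching the rollout loop.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Verdict(Enum):
    OK = "ok"
    MACHINE_FAULT = "machine"     # this host is unhealthy; skip it
    SOFTWARE_FAULT = "software"   # the new release is bad; undo it everywhere


Check = Callable[[str], Verdict]


def rollout(machines: Sequence[str],
            upgrade: Callable[[str], None],
            rollback: Callable[[str], None],
            checks: Sequence[Check]) -> Tuple[str, List[str], List[str]]:
    """Upgrade hosts one at a time, running every plug-in check after each.
    Returns (status, upgraded_hosts, skipped_hosts)."""
    upgraded: List[str] = []
    skipped: List[str] = []
    for host in machines:
        upgrade(host)
        verdicts = {check(host) for check in checks}
        if Verdict.SOFTWARE_FAULT in verdicts:
            # Undo the release on every upgraded host, newest first.
            for h in reversed(upgraded + [host]):
                rollback(h)
            return "rolled_back", [], skipped
        if Verdict.MACHINE_FAULT in verdicts:
            # Only this host looks broken: put it back and carry on.
            rollback(host)
            skipped.append(host)
            continue
        upgraded.append(host)
    return "done", upgraded, skipped


# Example plug-ins (hypothetical): a hardware check and a smoke test.
def disk_check(host: str) -> Verdict:
    return Verdict.MACHINE_FAULT if host == "web3" else Verdict.OK


def smoke_test(host: str) -> Verdict:
    return Verdict.OK
```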

This is why we call ourselves "Site Reliability Engineers" instead of Sysadmins. As SREs, we are constantly optimizing the system. Some of us are more focused on operational aspects (sysadmin skills) and others are focused on writing tools (software engineering focused). However, we all do some kind of software development, scripting, and so on. At our scale it isn't a question of whether or not to automate, but how to automate.

We also have more traditional system administrators in our CorpEng group. They are focused on internal systems (the printing system, deploying the newest Mac OS release, etc.). This is also the group that employs our internal helpdesk. Supporting 20k employees is a big job. Most companies don't have 10,000+ Linux desktops and 15,000+ Mac laptops, plus countless other systems.

A few more great things about working here:

  • The relationship with the developers is collaborative.
  • You can stay technical while going up the career ladder without having to move into management.
  • My co-workers are so smart, I usually feel pretty stupid.
  • The challenges are so huge, I think we're 2-10 years ahead of the industry. Working here is like having a crystal ball that lets you see the future of our industry.

Posted by Tom Limoncelli

LOPSA members save $45 when they register to attend Usenix LISA in San Jose, CA, November 7-12, 2010. I'm a member of LOPSA and I hope you join too. To find out about this discount you have to be a LOPSA member, and to become a LOPSA member you need to set up a free account first.

Step 1: Register for a free account on the LOPSA website: http://lopsa.org/user/register

Step 2: Become a LOPSA member: http://lopsa.org/joinup

Step 3: Paid members will see the discount code on http://lopsa.org/MemberDiscounts (Non-members see different text. Sneaky, eh?)

Posted by Tom Limoncelli in Community, Conferences

FIRST I book my hotel room, THEN I forward the announcement to all my friends and encourage them to register.

Yeah, I really like staying at the primary hotel.

Posted by Tom Limoncelli

On behalf of the LISA 2010 organizing committee, I'd like to invite you to join us for a career-building adventure in San Jose, CA, at the 24th Large Installation System Administration Conference, November 7-12, 2010:

http://www.usenix.org/lisa10/proga

For over 20 years, the Large Installation System Administration Conference (LISA) has been the must-attend system administration conference. It offers an unparalleled opportunity to meet and mingle with the leaders of the system administration industry.

Take advantage of 6 days of training that offer face-to-face time with expert instructors, including:

  • David N. Blank-Edelman on Over the Edge System Administration
  • Patrick Ben Koetter and Ralf Hildebrandt on Dovecot and Postfix Administration
  • Tom Limoncelli on Time Management for Sysadmins and for Teams

We're again offering series of classes focusing on some of the most important topics you'll encounter including:

  • The Virtualization Series, offering both new and repeat classes that provide the latest virtualization information by instructors such as John Arrasjid and Richard McDougall
  • New! The Linux Security and Administration Series, featuring in-depth Linux training by experts including Ted Ts'o and Rik Farrow
  • New! The Super Sysadmin Series, showcasing techniques for time and project management, raising your visibility, and other key skills to take your career to the next level by expert instructors including Tom Limoncelli and Strata Rose Chalup.

The full training program can be found at http://www.usenix.org/events/lisa10/training/

In addition to the training, 3 days of technical sessions include top-notch refereed papers, Practice and Experience Reports, informative invited talks, expert Guru Is In sessions, and a poster session.

http://www.usenix.org/events/lisa10/tech/

Over a dozen invited talks feature our most impressive slate of speakers to date. They include:

  • Keynote Address: "The LHC Computing Challenge: Preparation, Reality and Future Outlook," by Tony Cass, CERN
  • Closing Session: "Look! Up in the Sky! It's a Bird! It's a Plane! It's a Sysadmin!," by David N. Blank-Edelman, Northeastern University CCIS
  • "10,000,000,000 Files Available Anywhere: NFS at Dreamworks," by Sean Kamath and Mike Cutler, PDI/Dreamworks
  • "Operations at Twitter: Scaling Beyond 100 Million Users," by John Adams, Twitter

LISA is the leading forum for presenting new research in system administration. This year's top-tier research showcases work covering key topics such as configuration tools and firewall analysis.

The Practice and Experience Reports provide real-world examples from a variety of topics, including implementing IPv6, configuration management for Mac OS X, and more.

Bring your questions to the experts in the Guru Is In sessions to unravel your greatest technical mysteries.

Explore the latest commercial innovations at the Vendor Exhibition.

Benefit from opportunities for peer interaction (a.k.a. the "Hallway Track").

Plus: workshops, posters, BoFs, and more

Discounts are available!

Help us promote!

For complete program information and to register, see http://www.usenix.org/lisa10/proga

Early registration discounts are now available. Register by Monday, October 18, and save up to $300!

We're pleased to bring LISA to San Jose, CA, and we look forward to seeing you there.


LISA '10: 24th Large Installation System Administration Conference
http://www.usenix.org/lisa10/proga
November 7-12, 2010, San Jose, CA
Early Bird Registration Deadline: October 18, 2010

Sponsored by USENIX in cooperation with LOPSA and SNIA

Posted by Tom Limoncelli

Registration for the 2010 Ohio LinuxFest has been extended through September 8th, and the registration contest has also been extended until the 1,000th registration has been reached.

One lucky registrant will win an upgrade to the Supporter Pass, or a Professional Pass registration for Ohio LinuxFest 2011 worth $350, at the choice of the winner. Full details are available at http://ohiolinux.org/who-will-be-number-1000.html

Sign up today and have a chance to win!

Posted by Tom Limoncelli

xed 2.0.2 released!

xed is a perl script that locks a file, runs $EDITOR on the file, then unlocks it.

It also checks to see if the file is kept under RCS control. If not, it offers to make it so. RCS is a system that retains a history of a file. It is the predecessor to Git, Subversion, CVS and such. It doesn't store the changes in a central repository; it comes from a long-gone era before servers and networks. It simply stores the changes in a subdirectory called "RCS" in the same directory as the file. (If it can't find that directory, it puts the information in the same directory as the file, in a file named the same as the original with ",v" at the end.)
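The RCS file-placement rule above is easy to express in code. This is not xed (which is a Perl script, and whose locking method isn't described here); it is a minimal Python sketch of the same two ideas, assuming a Unix system (fcntl) and a hypothetical "<file>.lock" side file: find where RCS would keep a file's history, and run $EDITOR while holding an exclusive lock. The lock goes on a side file rather than the file itself because some editors save by writing a new file and renaming it over the old one.

```python
import fcntl
import os
import shlex
import subprocess


def rcs_archive_path(path: str) -> str:
    """Where RCS keeps history for `path`: RCS/<name>,v if an RCS
    subdirectory exists next to the file, otherwise <name>,v beside it."""
    directory, name = os.path.split(os.path.abspath(path))
    rcs_dir = os.path.join(directory, "RCS")
    if os.path.isdir(rcs_dir):
        return os.path.join(rcs_dir, name + ",v")
    return os.path.join(directory, name + ",v")


def is_under_rcs(path: str) -> bool:
    return os.path.exists(rcs_archive_path(path))


def edit_locked(path: str) -> None:
    """Run $EDITOR on `path` while holding an exclusive lock on a side file.
    Raises BlockingIOError if someone else is already editing it."""
    editor = shlex.split(os.environ.get("EDITOR", "vi"))
    with open(path + ".lock", "w") as lock:      # hypothetical lock-file name
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        subprocess.run(editor + [path], check=True)
    # Closing the lock file releases the lock.
```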

[More about this little-known tool after the jump.]

Posted by Tom Limoncelli in Technical Tips

Well it is the first of the month and it seems like I have internet access still. That's good news.

Let's see what happens when my DHCP lease expires. That's the real test.

I don't want to push my luck, but it looks like good news so far!

Posted by Tom Limoncelli

 