October 2012 Archives

The photos look like "IBM meets Willy Wonka's Chocolate Factory".

For the first time, the company has invited cameras inside its top secret facility in North Carolina. Our tour guide is Google's senior vice president, Urs Hoelzle, who's in charge of building and running all of Google's data centers. 'Today we have 55,200 servers on this floor. Each of these servers is basically like a PC, except somewhat more powerful.'

The Wired article by Steven Levy:

http://www.wired.com/wiredenterprise/2012/10/ff-inside-google-data-center/

The Google announcement:

http://googleblog.blogspot.nl/2012/10/googles-data-centers-inside-look.html

Walk through it using StreetView:

Video Tour:

Posted by Tom Limoncelli in Google

IPv6 Flashcards

IPv6 is an entirely new protocol. It isn't IPv4 with larger addresses. It is new enough that you'll feel like you are starting over on a new planet; one that invented the internet using protocols that remind you of IPv4 but are... different.

I find flashcards are a useful way to learn new terminology. I found these online:

Enjoy!

Tom Limoncelli

Posted by Tom Limoncelli in IPv6

US Army 1953 training film on mechanical computers. Gears! Cams! Great animations!

http://www.youtube.com/watch?v=s1i-dnAH9Y4

Posted by Tom Limoncelli

[A Google SRE is a Site Reliability Engineer. That is, one of the sysadmins who manage the servers that provide Google's internal and external services, i.e. not the ones who do office IT like helpdesk and maintaining the printers. This is now what some people call "devops" but it preceded the term by a few years.]

Someone recently asked me if Google's SRE position has changed over the years. The answer is 'yes and no'.

Yes, the job has changed because there is more diversity in the kind of work that SREs do. Google has more products and therefore more SRE teams. Each team is unique but we all work under the same mission, executive management, and best practices. All teams strive to use the same best practices for release engineering, operational improvements, debugging, monitoring, and so on. Yet, since each SRE team is responsible for a different product with different needs, you'll find each one can have unique priorities. I do like the fact that there is so much sharing of tools; something one team invents usually helps all teams. My team might find X is a priority while others don't: we make a tool that makes X better and share it; soon everyone is using it.

On the other hand, no, the job hasn't changed because the skill-set we look for in new recruits hasn't changed: understanding of the internals of networking, Unix, storage, and security, with the ability to automate what you do.

Another thing that hasn't changed is that SREs generally don't work at the physical layer but we must understand the physical layer: the product(s) we manage are run from datacenters around the world and we don't get to visit them personally. We don't spend time cabling up new machines, configuring network ports, or fighting with a vendor over which replacement part needs to be shipped. We have awesome datacenter technicians who take care of all that (note: since we build our own machines even the way we handle repairs is different). The project I'm on has machines that are in countries I've never been to. News reporters tend to not understand this. I'm in the NYC office and I think it is adorable to read articles written by misguided reporters who assume their Gmail messages are kept at the corner of 14th and 8th Ave.

On the subject of what we look for when recruiting new SREs: we look for experience with scale (number of machines, quantity of disk, number of queries per second). Other companies don't have the scale we have, so we can't expect candidates to have experience with our kind of scale; instead we look for people who have the potential to step up to our scale. We write our own software stack (web server, RPC, etc.) so we can't look for people who have experience with those tools; instead we look for people who are technical enough to be able to learn, use, and debug them.

At our scale we can't do very much manually. A "1 in a million" task that would be done manually at most companies has to be automated at Google because it probably happens 100 times a day. Therefore, SREs spend half their time writing code to eliminate what they do the other half of their day. When they "automate themselves out of a job" it is cause for celebration and they get to pick a new project. There are always more projects.
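To put rough numbers on that claim: if a "1 in a million" event shows up about 100 times a day, the system must be handling on the order of 100 million operations a day. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and the inputs are just the two figures quoted in the paragraph above, not actual Google traffic numbers.

```python
# Back-of-the-envelope arithmetic for the "1 in a million" example.
# Both constants are the illustrative figures from the text, not real
# measurements; the function names are hypothetical.

ODDS = 1_000_000           # the event happens once per this many operations
OCCURRENCES_PER_DAY = 100  # how often the text says it shows up each day


def implied_operations_per_day(odds: int, occurrences_per_day: int) -> int:
    """Operations per day needed for a 1-in-`odds` event to occur
    `occurrences_per_day` times on average."""
    return odds * occurrences_per_day


def expected_occurrences(operations_per_day: int, odds: int) -> float:
    """Average number of times a 1-in-`odds` event occurs per day."""
    return operations_per_day / odds


if __name__ == "__main__":
    ops = implied_operations_per_day(ODDS, OCCURRENCES_PER_DAY)
    print(f"Implied operations per day: {ops:,}")
    print(f"Expected rare events per day: {expected_occurrences(ops, ODDS):.0f}")
```

At a site doing a few thousand operations a day, the same event would turn up roughly once a year, which is why handling it by hand is reasonable there and not here.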

If you are interested in learning about what kind of internal systems Google has, I highly recommend reading some of the "classic" papers that Google has published over the years. The most useful are:

Those last few papers are recent enough that most people aren't aware of them. In fact, I, myself, only recently read the Dremel paper.

While the papers focus on how the systems work, remember that there are SREs behind each one of them keeping them running. To run something well you must understand its internals.

You might also want to read them just because understanding these concepts is a useful education itself.

Posted by Tom Limoncelli in Google

American Scientist has an article that (finally!) explains homomorphic encryption in simple enough terms that even I understand.

Homomorphic encryption permits me to send you encrypted data that you can manipulate without ever knowing the contents. You send it back to me, I decrypt it, and see the result. Imagine if a web-based word processor could store your document, edit your document, but never know what your document says. Yes, it sounds crazy but it is theoretically possible. In the last 4 years that theory has been getting closer and closer to reality.
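To get a feel for "compute on data you cannot read", here is a toy Python sketch. It is not the fully homomorphic scheme the article covers. It uses unpadded "textbook" RSA, which happens to be homomorphic for multiplication only: multiplying two ciphertexts gives a ciphertext of the product. The primes are tiny and chosen purely for illustration, the function names are made up, and nothing here is secure.

```python
# Toy demonstration of a (partially) homomorphic property.
# Unpadded textbook RSA: E(a) * E(b) mod N == E(a * b mod N).
# Tiny illustrative key; do NOT use anything like this for real data.

P, Q = 61, 53              # toy primes, assumed for the example
N = P * Q                  # public modulus (3233)
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent, coprime with PHI
D = pow(E, -1, PHI)        # private exponent (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt with the public key (key owner or anyone else)."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt with the private key (key owner only)."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """What the untrusted party does: it multiplies two ciphertexts
    without ever seeing the plaintexts or the private key."""
    return (c1 * c2) % N


if __name__ == "__main__":
    a, b = 7, 6
    c = multiply_encrypted(encrypt(a), encrypt(b))  # done "in the cloud"
    print(decrypt(c))  # prints 42, the product of a and b
```

A fully homomorphic scheme supports both addition and multiplication on ciphertexts, which is what makes arbitrary computation possible in principle.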

I think sysadmins should read this article to get an idea of what crypto might be like in the future.

Alice and Bob in Cipherspace: A new form of encryption allows you to compute with data you cannot read

Posted by Tom Limoncelli in Random thoughts or ideas

Flights are filling up. Book soon. And book your hotel too.

One thing I learned from traveling is that it is easier to make a reservation early and cancel/change it than to end up close to the date and find there are no hotel rooms or flights left. This is especially important for hotels.

https://www.usenix.org/lisa

Posted by Tom Limoncelli in Conferences
