The April nycdevops Meetup is Thursday, April 18. Doors open at 6:30pm!

https://www.meetup.com/nycdevops/events/260294692/

NOTE: The meetings are now on THURSDAY.

  • Title: How to build a tamper-evident CI/CD system
  • Speaker: Trishank Karthik Kuppusamy, Datadog, Inc

TALK DESCRIPTION: CI/CD is critical to any DevOps operation today, but when attackers compromise it, they get to distribute malicious software to millions of unsuspecting users. We present how Datadog used TUF and in-toto to develop, to the best of our knowledge, the industry's first end-to-end verified pipeline that automatically builds integrations for the Datadog agent. That is, even if this pipeline is compromised, users should not be able to install malware. We will show a demonstration of our pipeline in production being used to protect users of the Datadog agent, and describe how you can use TUF + in-toto to secure your own pipeline.

SPEAKER BIO: Trishank Karthik Kuppusamy is a security engineer at Datadog, Inc. Previously, he led the research and development of The Update Framework (TUF) and Uptane at the NYU Tandon School of Engineering. He is also a member of the IEEE-ISTO Uptane standardization alliance, and an Editor of in-toto Enhancements.

Space is limited. Please RSVP soon!

Posted by Tom Limoncelli in NYCDevOps Meetup

See http://dod.nyc for details. DevOpsDays-NYC is Jan 24/25, 2019. Don't miss it!

Posted by Tom Limoncelli in DevOpsDays

Was the root cause of the O2 outage really an expired certificate?

Why wasn't the "root cause" any of these?

  • Certificate expiration not monitored
  • Certificate renewal process so complex that everyone hopes someone else fixes it
  • Certificate renewal is so rare, we aren't good at doing it
  • Deploying new certificates manual and error-prone
  • Vendor did not document all periodic maintenance requirements
  • Soon-to-expire certs not logged
  • Logging for each component an island unto itself

The reason, dear reader, is that there is no such thing as a single "root cause". There are only contributing factors.

When will the industry learn?
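Two of the factors listed above, unmonitored expiration and unlogged soon-to-expire certs, are cheap to address. Here is a minimal sketch in Python using only the standard library. The 30-day threshold, the function names, and the log-line format are my own assumptions for illustration, not anything from the O2 incident or a particular monitoring product.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # hypothetical threshold; match it to your renewal lead time


def days_until_expiry(not_after, now=None):
    """Days from `now` until the certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port and return the peer certificate's notAfter string.

    With the default context, an already-expired certificate fails the
    handshake and raises ssl.SSLCertVerificationError. Treat that as an alert.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host, not_after, now=None):
    """Return one log line describing the certificate's status."""
    days = days_until_expiry(not_after, now)
    if days < 0:
        return f"CRITICAL {host}: certificate expired {-days:.0f} days ago"
    if days < WARN_DAYS:
        return f"WARNING {host}: certificate expires in {days:.0f} days"
    return f"OK {host}: certificate expires in {days:.0f} days"
```

Run something like `check(host, fetch_not_after(host))` from cron for every endpoint you care about and send the output to the same place as your other logs. It won't fix a complex renewal process, but it removes two contributing factors from the list.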

Posted by Tom Limoncelli

Disclaimer: I haven't worked at Google for 5+ years so this kind of story is probably outdated. I mean, how could Google not have fixed this problem in the last 10 years?

In 2008 I was on a business trip to Seattle and I had dinner with an old college friend who now worked at Microsoft. I noticed that she had an iPhone. This was when Microsoft was heavily pushing their own phone product, and Android hadn't started shipping.

I thought it was odd that a Microsoftie would be using an iPhone and pointed it out.

"Oh, it's the opposite. We are encouraged to use the competition's products. The better we understand their products, the better we can compete with them."

I thought that was a very sound strategy.

When I got back to the office, I happened to have a meeting with one of the feature designers for Google Docs. I was meeting to suggest some improvements.

The designer was interested in one feature I was suggesting. He asked my opinion of how the UX flow should work. I responded, "Well, have you seen how Microsoft Word does it?"

"Oh no, I try not to look at competing products."

"Why not?", I asked.

"Oh, I don't want to be influenced by their design decisions."

Sigh.

Even as an ex-Googler, I use a lot of Google products, and I often see a feature with a user experience that can only be described as embarrassingly broken. I use this phrase only when competing products get it right.

I wonder where that feature designer is today.

When was the last time you gave your competitor's product a test run? Used it for a week or two? Does your employer encourage this or discourage this? If you are a manager, do you encourage your employees to do this? Does your corporate culture encourage or discourage this?

Posted by Tom Limoncelli

Cheers to my coworker Taryn for her blog post about how she did an extremely complex series of 30 Microsoft SQL Server upgrades.

If you've seen the film "Apollo 13", there's a scene where they have to get something right in the simulator before they can do it in space. That's basically what she had to do.

Read the post here: How we upgraded Stack Overflow to SQL Server 2017

Here are some takeaways:

  • Set up a lab environment to test complex changes.
  • Communicate with your users.
  • Write a detailed playbook.
  • Don't do it alone.
  • Ask for help from all over.
  • Keep a lab notebook.
  • Record it for posterity!

I'm super proud to have people like Taryn on our SRE team at Stack Overflow!

(Would you like to work with awesome people like Taryn? We have many open positions, including a west-coast (US/Pacific or compatible) Cloud/Azure SRE, an Internal IT Support Engineer (remote or NYC), and a Junior Technology Concierge Help Desk (London).)

Posted by Tom Limoncelli in Stack Exchange, Inc.

Things you might not have known about Google Authenticator:

Copy and paste

If you press and hold the 6-digit number, the app puts it in your copy and paste buffer.

Re-order the list

If you click the pencil to go into edit mode, you can change the order of the items.

I find this particularly important because I now have 12 different systems authenticating with this app, and only 4 fit on the screen of my tiny iPhone SE.

I've pushed the ones that I use the most to the top of the list. The Google-related services that generally authenticate via a notification asking "Is this you trying to log in?" are now all shifted to the end of the list, since I rarely need them.

As a result, I am able to authenticate in about half the time.

Posted by Tom Limoncelli in MiscSecurity

My team at Stack Overflow is looking to hire SREs with Windows experience, particularly administration of Microsoft SQL Server.

If you are a system administrator looking to move into more of an SRE position, this is an ideal opportunity.

Here's the job listing:

https://stackoverflow.com/jobs/190514/

NOTE: While we are a remote-first team with team members all over the world, this position will have occasional datacenter work requirements, which means 1-hour travel time to our Jersey City, NJ datacenter is a requirement.

Posted by Tom Limoncelli

https://www.alldaydevops.com/

All Day DevOps is a global event held on the internet on 17-Oct-2018: 24 hours of talks, over 100 speakers, all streamed online.

Registration is free!

I will be presenting my talk Stealing The Best Ideas From DevOps: Applying DevOps Outside Of SDLC

More info is at: https://www.alldaydevops.com/

Posted by Tom Limoncelli in Speaking

LISA this year is in Nashville, TN, Oct 29-31, 2018. The full schedule is up! Registration is open!

Three things you should know:

  1. This year Usenix LISA will be 3 days long, instead of the usual 7. This makes it easier to attend, and more focused. I think this is a really good direction for LISA.

  2. The schedule is awesome. I got super excited while reading the schedule. All the talks seem to be much more focused, with a greater emphasis on cutting-edge topics and things I want to learn about but haven't had time to study.

  3. I have discount codes. The first five people that email me will get a 5 percent discount code. Send email to tal at whatexit dot org with the subject "DISCOUNT LISA". These are a special thank you to the readers of my blog.

  4. I'm speaking on Tuesday. (Bonus item.) I'll be giving a new talk about reforming your operations team. Hope to see you in the audience!

Register soon! https://www.usenix.org/conference/lisa18

Posted by Tom Limoncelli

I'll be the speaker at this month's NYC DevOps meetup. My topic is about reforming the operations side of DevOps in a new talk called "My Operations Reform Checklist".

  • Topic: My Operations Reform Checklist
  • Speaker: Tom Limoncelli, SRE Manager, Stack Overflow, Inc
  • When: Tuesday, September 18, 2018, 6:30 PM
  • Where: Stack Overflow HQ, 110 William St, NYC, NY

For complete details and to RSVP:

 
  • Don't Miss Out - Register Today