Every Thanksgiving, one turkey, typically named Tom, gets a pardon from the President of the United States. That's a pretty stupid tradition, but politics is full of public-approval-over-substance scenarios that play out even dumber than this one. What strikes me, though, is that the turkey is being forgiven for something he never did. A pardon is essentially a waiver of the burden of guilt and of the sentence that comes with it.
Think about every post mortem or crisis room you have been in. How often is the fault behind the issue acknowledged and forgiven so completely? Better yet, how often is the problem not a direct result of anyone in the crisis room? The answer is most likely never, unless you are running blameless post mortems.
Now here is the thing: blameless post mortems are AWESOME in theory. No finger pointing, no screaming, just a calm discussion of what happened, how it was diagnosed, and how it was solved. Who wouldn't want to be part of that after a crisis? In practice, though, this requires a level of maturity from everyone in the room, and a room full of mature IT people is like a porcupine having a good hair day: not nonexistent, just rare.
What if, though, we all take a deep breath amid the stress in the room and realize that we are all passionate professionals? What if we can set aside our individualism for the sake of the team?
Bear with me as I have a back-in-my-day moment…
Back when I was in the operations business, sitting in the data center, every siloed team had friends on other teams. We laughed over lunch together, and we had inside jokes about late-night patching, outages, or pranks we pulled. When it came to troubleshooting a serious issue, we banded together and fixed it as a team no matter what the root cause ended up being. But at the after-action review we were all so proud that we would try to deflect root cause away from our own team. It was not the right way to deal with it.
I have been fortunate enough to see blameless post mortem meetings since leaving full-time operations. Wow, what a difference. The meetings are swift and on target, focused on determining root cause and sharing information and results. The efficiency is striking because no one feels they have to explain their actions away; instead, they just explain how things occurred and how they were fixed. Crisis meetings in these groups are equally impressive, with teams coming together and everyone helping with the troubleshooting process.
If you are interested, I highly recommend checking out the folks over at Etsy, who seem to do this better than anyone and are open enough to write about it. Here are some links:
https://codeascraft.com/2012/05/22/blameless-postmortems/
https://www.pagerduty.com/blog/blameless-post-mortems-strategies-for-success/
https://www.etsy.com/teams/7716/announcements/discuss/10641726/
What are your experiences with this?
Oh, and Happy Thanksgiving, everyone!!

Just like video killed the radio star, I wonder if Google hasn't killed IT troubleshooting skills. Once upon a time we followed a common set of principles to determine the root cause of a problem and resolve it. Today, though, so many admins jump straight to Google with a list of symptoms rather than establishing root cause. Not that Google is bad; in fact, it's a great tool, but I am not sure it's the first place to look.
We have all seen enough medical dramas to get where I am going. But it's important to keep track of issues, because when you hit the next step it's easy to go down a rabbit hole fast. So jot down the issues you are seeing and when they seem to have started. Web service unreachable? OK, is it an individual site or page? Is it localized or widespread? Is it affecting multiple browsers? All good things to take note of.
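If you'd rather keep that running list somewhere other than a notepad, a tiny structured log works too. The sketch below is purely illustrative: `Symptom`, `SymptomLog`, `note()`, and `timeline()` are hypothetical names, not any existing tool. It records each symptom with when it seems to have started and its scope, then lists them oldest first, since the earliest symptom can be a useful place to start digging.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Symptom:
    """One observed issue and when it seems to have started (hypothetical type)."""
    description: str
    first_seen: datetime
    scope: str = "unknown"  # e.g. "single page", "site-wide", "multiple browsers"


@dataclass
class SymptomLog:
    """A running list of symptoms, kept so the investigation stays on track."""
    entries: list = field(default_factory=list)

    def note(self, description, first_seen, scope="unknown"):
        # Record what was observed, when it started, and how widespread it is.
        self.entries.append(Symptom(description, first_seen, scope))

    def timeline(self):
        # Oldest first: the earliest symptom is often a good place to start.
        return sorted(self.entries, key=lambda s: s.first_seen)


if __name__ == "__main__":
    # Hypothetical example entries.
    log = SymptomLog()
    log.note("Web service unreachable", datetime(2016, 11, 21, 9, 40), "site-wide")
    log.note("Login page slow", datetime(2016, 11, 21, 9, 15), "single page")
    for s in log.timeline():
        print(s.first_seen.isoformat(), s.scope, s.description)
```

Sorting by first-seen time is the only "analysis" here on purpose; the point is simply to have the breadcrumbs written down before the rabbit hole opens up.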
Now that we know what issue we are looking for, and have some breadcrumbs pointing to a few possible causes, we need to go all detective on it. Based on logs and symptoms, we can develop a list of hardware, software, network components, firewall rules, patches, or services that could be causing our issue. If logs aren't giving you anything, diagnostic tools like sysmon, filemon, netmon, or grep can help, and baseline analytics tools like Tripwire, vRealize Operations, or SolarWinds may turn up the answer.
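Turning logs into that candidate list can be as simple as a grep plus a tally. Here's a minimal sketch in Python, assuming a hypothetical log format of `<date> <time> <LEVEL> [<component>] <message>`; your logs will almost certainly look different, so treat the regex and the sample lines as placeholders. It counts ERROR and WARN entries per component and returns the noisiest ones first.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> <LEVEL> [<component>] <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<msg>.*)$"
)


def suspect_components(lines, levels=("ERROR", "WARN")):
    """Count ERROR/WARN entries per component, most frequent first.

    Roughly `grep -E 'ERROR|WARN'` followed by a tally: it narrows the
    list of things to investigate, it does not prove root cause.
    """
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            hits[m.group("component")] += 1
    return hits.most_common()


# Hypothetical sample lines for illustration only.
sample = [
    "2016-11-21 09:15:02 WARN [firewall] rule 42 dropped 310 packets",
    "2016-11-21 09:40:12 ERROR [web01] upstream timed out",
    "2016-11-21 09:40:15 ERROR [web01] upstream timed out",
    "2016-11-21 09:41:00 INFO [patch-agent] KB install complete",
]

if __name__ == "__main__":
    print(suspect_components(sample))  # [('web01', 2), ('firewall', 1)]
```

Lines that don't match the assumed format are simply skipped, which keeps the sketch short but also means a format mismatch shows up as an empty result rather than an error.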
cycle of convincing everyone that VDI is a thing. Even so, wide-scale VDI isn't the norm. Organizations still struggle to move projects forward because of the cost of running a separate VDI architecture that sits apart from primary datacenter workloads. That separation stems from traditional design requirements driven by the IOPS limitations of most storage solutions. All-flash arrays (AFAs) have helped solve the IOPS problem and are great for running super fast virtual desktops. But despite solving one problem, AFAs haven't changed the separation-of-architecture discussion.
Pimp'n may not be easy, but deciding to build out EUC on SolidFire sure seems like a no-brainer.