Setting Up to Fail by Vaidehi Joshi
This talk was about the many ways distributed systems can fail. She loves learning and teaching others as a way to better understand concepts, so she created BaseDS to teach distributed systems. Previously she created BaseCS to help others understand the basics of computer science.
Even basic web development, with a single server and a single client, is a distributed system, which means the vast majority of developers are working on distributed systems these days.
Failure modes = types of failures a system can experience
There are lots of types, but the talk focused on two: timing and value response.
#1 Timing Failure
The expected time interval (e.g., 5 to 5000 ms) is the amount of time a node in the system should take to respond.
A node that responds too slowly or too quickly, outside that interval, can be a problem.
Omission failure: the node never responds.
Its subtypes are “send omission” and “receive omission”.
#2 Value Response Failure
The response from the node is incorrect.
Another kind: state transition failure.
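To make these categories concrete, here is a minimal Python sketch that classifies a single observed response. It assumes the 5 to 5000 ms interval from the example above and a hypothetical `Response` record where `elapsed_ms=None` means no reply arrived before the caller gave up; `classify` and `FailureMode` are illustrative names, not something from the talk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    NONE = "none"
    TOO_FAST = "timing: too fast"
    TOO_SLOW = "timing: too slow"
    OMISSION = "timing: omission"
    VALUE = "value response"


# Hypothetical expected interval, taken from the talk's example: 5 to 5000 ms.
MIN_MS = 5
MAX_MS = 5000


@dataclass
class Response:
    # None means the node never responded before we stopped waiting.
    elapsed_ms: Optional[float]
    value: Optional[object] = None


def classify(resp: Response, expected_value: object) -> FailureMode:
    """Classify one observed response (hypothetical helper, for illustration)."""
    if resp.elapsed_ms is None:
        return FailureMode.OMISSION
    if resp.elapsed_ms < MIN_MS:
        return FailureMode.TOO_FAST
    if resp.elapsed_ms > MAX_MS:
        return FailureMode.TOO_SLOW
    # The response arrived on time, so now check whether its value is correct.
    if resp.value != expected_value:
        return FailureMode.VALUE
    return FailureMode.NONE
```

Timing is checked before the value, so a late response with a wrong value is reported as a timing failure. Also note that in practice a caller cannot tell an omission apart from a response that is merely very late; it only knows it stopped waiting.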
Takeaway: Think about the failure modes your system can have and how the system should respond to each.
The Tortoise and the Hare Write Software by Erica Gomez
The software industry is obsessed with moving fast. Too fast, in many cases.
Erica has worked on airplane systems, where there is intentional friction and development moves slowly out of necessity. But in many other industries, software engineers focus too much on moving fast, at the cost of bugs and major problems that our users have to endure.
Random fact: a plane has ~100,000 sensors, which can predict future failures so parts can be replaced before there is a problem.
The software industry has seen many incredible advances and improved tools over the decades. E.g., AWS lets teams move super fast.
Decades ago, software engineers used punch cards, which they’d have to schedule to be run or sometimes mail off to be processed and sent back with the result. The process could take weeks or months. This forced software engineers to slow down and be careful.
Easy rollback means we can be reckless.
This impacts engineers, not just end users: getting paged at night, dealing with broken systems.
Very little incentive for us to slow down.
Look for ways to add intentional friction and strike a balance between moving too slow and too fast. For example, introducing automated deployment tests slows down deploys but helps us catch bugs.
Crunch time for engineering teams (moving too fast) has been shown to increase the defect rate. Resting, getting away from the computer, and going on vacation help lower it.
Calculate the risk of moving too fast. What is the impact of defects?
Takeaway: I have had similar thoughts that engineering teams often move too fast, which results in lots of problems. It’d be interesting to see some stats on this. But ultimately, the decision to move fast or slow is not up to engineering teams; it depends on goals set by the business. Engineers can sometimes push back, but often that isn’t feasible.
Files by Dan Luu
This talk was all about how broken file APIs, file systems, and hard drives are. There are tons of ways for them to fail.
People think reading/writing files is very simple but it is way more complex than they realize.
Reading/writing files is a very brittle process. Even low-level APIs like pwrite have their problems. Engineers treat these APIs as atomic when they aren’t, and they can fail in unexpected ways. Writing a file takes multiple steps, any of which can fail.
These problems are most obvious to big companies like Facebook, Google, and Amazon since they have millions of servers and hard drives. But remember that there are billions of consumer machines experiencing the same issues.
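As an illustration of how many steps hide behind “write a file”, here is a minimal Python sketch of one common pattern for replacing a file's contents: write a temp file, fsync it, rename it over the target, then fsync the directory. This is a widely used technique rather than something prescribed in the talk, it assumes a POSIX system (Linux/macOS), and `write_file_durably` is a hypothetical name. In keeping with the talk's point, it narrows the window for corruption but cannot make the underlying file system or disk trustworthy.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` (hypothetical helper, POSIX only)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: temp file in the same directory, so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            # Step 2: os.write may write fewer bytes than requested, so loop.
            view = memoryview(data)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # Step 3: ask the OS to flush the file contents to stable storage.
            os.fsync(fd)
        finally:
            os.close(fd)
        # Step 4: atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if any step before the rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Step 5: fsync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

`tempfile.mkstemp` creates the file with owner-only permissions, so a fuller version might copy the original file's mode before the rename.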
Disks and SSDs have much higher error rates than is advertised.
Modularity by Kamal Marhubi
The usual advice is often misapplied:
- Inject dependencies
- Reduce coupling
Much of this advice has a long history that has been forgotten, so programmers don’t have the context. Also, terms like “coupling” aren’t well defined or understood.
For example, the context and motivation behind the paper “Go To Statement Considered Harmful” by Edsger Dijkstra are often not understood by engineers today. The primary motivation was to keep the call stack easy to follow: Go To would disrupt the call stack and make problems difficult to debug.
Rather than regurgitating “best practices” like “keep code DRY”, make the goal writing code that is easy to understand and modify:
- Reduce connections
- Use features like interfaces so code can have multiple implementations
- Use language features like data hiding
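A minimal Python sketch of the last two points, assuming a hypothetical key-value store: callers depend only on a small `KeyValueStore` interface (a `typing.Protocol`), two different classes implement it, and each keeps its internal state private. Python hides data only by convention (the leading underscore), so this shows the intent rather than enforcement. All names here are illustrative.

```python
from typing import Dict, List, Optional, Protocol


class KeyValueStore(Protocol):
    """The only two operations callers are allowed to depend on."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    def __init__(self) -> None:
        # Private by convention; Python does not enforce the hiding.
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class LoggingStore:
    """A second implementation that wraps any store and records written keys."""

    def __init__(self, inner: KeyValueStore) -> None:
        self._inner = inner
        self._written_keys: List[str] = []

    def get(self, key: str) -> Optional[str]:
        return self._inner.get(key)

    def put(self, key: str, value: str) -> None:
        self._written_keys.append(key)
        self._inner.put(key, value)

    def written_keys(self) -> List[str]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._written_keys)


def set_display_name(store: KeyValueStore, user_id: str, name: str) -> None:
    # Depends only on the interface, so either implementation can be passed in.
    store.put(f"user:{user_id}:name", name)
```

Because `set_display_name` touches only the interface, swapping one store for another requires no change to it.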
When a change is difficult, note it and add a TODO to improve it. Consider creating categories of TODOs so they are easy to find and sort.
Go beyond style in code reviews.
Ask for clarification in PRs, and get the answers turned into code comments.
Look for hidden assumptions, e.g. in a function like getThings(). What if there are multiple items? Is there a special order? What if null is returned instead of an array?
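One way to surface those hidden assumptions is to state them in the function's signature and docstring. Here is a minimal Python sketch; `get_things`, `Thing`, and `first_thing_name` are hypothetical names, and the chosen contract (always a list, possibly empty, sorted by id) is just one reasonable answer to the questions above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Thing:
    id: int
    name: str


def get_things(source: Iterable[Thing], prefix: str) -> List[Thing]:
    """Return all things whose name starts with `prefix`.

    Contract made explicit (hypothetical, for illustration):
    - Always returns a list, never None; "no matches" is an empty list.
    - May return zero, one, or many items.
    - Order is guaranteed: ascending by id.
    """
    return sorted((t for t in source if t.name.startswith(prefix)), key=lambda t: t.id)


def first_thing_name(things: List[Thing]) -> Optional[str]:
    # Handle the empty case explicitly instead of assuming things[0] exists.
    return things[0].name if things else None
```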
What We Can Learn From Software History by Hillel Wayne
Engineers today often don’t know the history behind decisions made decades ago, or the context the engineers who made them had. Over time, new reasons are invented that don’t align with the historical ones.
For example, take the common interview question of implementing a linked list. How many engineers have to implement a linked list these days? A quick poll of the audience showed it isn’t common. Interviewers give reasons like seeing how the person thinks on their feet, but that doesn’t match up with the history of the question.
The speaker did some research into interview questions from decades ago. That question grew in popularity when C programming was extremely common. If you programmed in C, you’d build linked lists all the time, so it would be an easy task. Back then, the interviewer asking the question simply wanted the interviewee to demonstrate a little knowledge of C.
But demonstrating basic knowledge of C isn’t relevant to most programming jobs today, so that reason isn’t usually given for asking the linked list question.
Deconstruct was one of the best conferences I’ve attended. The talk quality was high, there were no sponsored talks (and no sponsors at all), and the talks were a good length and covered a good variety of topics. Days started and ended at reasonable times, and the breaks were a good length. I left very impressed and with some new ideas to consider, which is the best way to end a conference. I’d recommend everyone attend next year.