• More Effective Standups

    Many engineers dislike standups, often seeing them as a waste of time they would rather eliminate. That is usually due to how the standups are run. Typically, everyone goes around answering the following questions:

    1. What did I do yesterday?
    2. What am I doing today?
    3. What am I blocked on?

    The problem is that these questions often feel like a daily status report and don’t give engineers any real value. As they see it, if they are working together on something, they are already talking and know what the others are doing, so hearing it again in standup isn’t useful. And if they aren’t working together on something, the information doesn’t help them.

    Thankfully, we have a fantastic tool at our disposal for seeing what is being worked on, by whom, and what state that work is in: Kanban boards. A well-designed Kanban board should make it clear what phase each piece of work is in without anyone having to go ask the engineer.

    The real value of standups is that they give the team a chance to collaborate and work through problems, not to get a status update.

    So instead of asking the standard questions, my team has switched to asking these questions:

    1. What work is blocked? Notice the emphasis is on the work and not on the person. If anything is blocked, this is an opportunity to escalate it and chat about it as a team.
    2. What work is at risk of becoming blocked? Are there other priorities that are taking us away from completing something? Was there an unknown dependency?
    3. Is there work being done that isn’t on the board? Was there unplanned work that needed to be done immediately that we didn’t account for? Let’s capture it by creating a ticket and getting it on the board.
    4. Are there any other important updates, call-outs, or risks people want to mention? Is there anything else that didn’t come up in standup that people want to mention?

    These questions are from the book Making Work Visible by Dominica DeGrandis.

    The great thing about these questions is that they keep the team focused on solving problems rather than getting status updates. We dive straight into problems in standup instead of waiting through everyone’s updates. And if things are going well, standup can be super short and everyone can get back to work.

    Experiment with your team’s standups. Is the current format really helping the team solve problems or has it become a routine that has lost its value? Teams are always changing, so what worked last year might no longer be working.

  • Deconstruct 2019 - Day Two Recap

    Setting Up to Fail by Vaidehi Joshi

    This talk was about the many ways distributed systems can fail. Vaidehi loves learning and teaching others as a way to better understand concepts, so she created BaseDS to teach distributed systems. Previously, she created BaseCS to help others understand the basics of computer science.

    Even basic web development with a single server and a single client is a distributed system, which means the vast majority of developers are working on distributed systems these days.

    Failure modes = types of failures a system can experience

    There are lots of types, but the talk focused on two: timing failures and value response failures.

    #1 Timing

    The expected time interval (e.g. 5-5000 ms) is the amount of time a node in the system should take to respond.

    A node responding too slowly or too quickly can be a problem.

    Omission failure: the node never responds.

    Subtypes are “send omission” and “receive omission”.
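    To make the timing categories concrete, here is a minimal Python sketch that classifies a single response against the 5-5000 ms interval used above. The function name and bounds are hypothetical. A real system can only detect an omission after waiting out some timeout, which this sketch represents as `None`.

```python
from typing import Optional

# Hypothetical bounds matching the 5-5000 ms example interval.
MIN_EXPECTED_MS = 5.0
MAX_EXPECTED_MS = 5000.0


def classify_timing(response_ms: Optional[float],
                    min_ms: float = MIN_EXPECTED_MS,
                    max_ms: float = MAX_EXPECTED_MS) -> str:
    """Classify one node response by when it arrived.

    `response_ms` is None when no response arrived before the timeout.
    """
    if response_ms is None:
        return "omission"   # the node never responded
    if response_ms < min_ms:
        return "too_fast"   # arrived earlier than the interval allows
    if response_ms > max_ms:
        return "too_slow"   # arrived later than the interval allows
    return "ok"


if __name__ == "__main__":
    for sample in (None, 2.0, 120.0, 9000.0):
        print(sample, classify_timing(sample))
```

    Treating "too fast" as a failure can look odd. It covers cases like a node answering from stale state before doing the work it was asked to do.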

    #2 Value Response Failure

    Response from node is incorrect.

    State transition failure
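    Value failures are harder to spot than timing failures because the response arrives on time but is wrong. One narrow case, a state transition failure, can be caught by checking each state change against a list of legal transitions. The sketch below assumes a hypothetical node lifecycle loosely modeled on Raft's follower/candidate/leader roles; it was not part of the talk.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical lifecycle: which state changes a node may legally make.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "follower": {"candidate"},
    "candidate": {"follower", "leader"},
    "leader": {"follower"},
}


def find_state_transition_failures(states: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Return (index, from_state, to_state) for each illegal transition."""
    failures = []
    for i in range(1, len(states)):
        prev, curr = states[i - 1], states[i]
        if prev == curr:
            continue  # staying in the same state is not a transition
        if curr not in ALLOWED_TRANSITIONS.get(prev, set()):
            failures.append((i, prev, curr))
    return failures


if __name__ == "__main__":
    # A follower jumping straight to leader skips the election.
    print(find_state_transition_failures(["follower", "leader"]))
```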

    Take away: Think about the types of failure modes your system will have and how the system should respond.


    The Tortoise and the Hare Write Software by Erica Gomez

    The software industry is obsessed with moving fast. Too fast, in many cases.

    Erica has worked on airplane systems, where there is intentional friction and development moves slowly out of necessity. But in many other industries, software engineers focus too much on moving fast, at the cost of bugs and major problems that users have to endure.

    Random fact: airplanes can have ~100,000 sensors, which can predict future failures so parts can be replaced before there is a problem.

    The software industry has seen many incredible advancements and improved tools over the decades. For example, AWS lets teams move super fast.

    Decades ago software engineers used punch cards which they’d have to schedule to be run or sometimes mail off to be processed and sent back with the result. The process could take weeks or months. This forced software engineers to slow down and be careful.

    Easy rollback means we can be reckless.

    This affects engineers, not just end users: getting paged at night, dealing with broken systems.

    Very little incentive for us to slow down.

    Look for ways to add intentional friction and strike a balance between moving too slowly and too fast. For example, introducing automated deployment tests slows down deploys but helps us catch bugs.

    Crunch time for engineering teams (moving too fast) has been shown to increase the defect rate. Resting, getting away from the computer, and going on vacation help lower it.

    Calculate the risk of moving too fast. What is the impact of defects?

    Take away: I have had similar thoughts that engineering teams often move too fast, which results in lots of problems. It’d be interesting to see some statistics supporting the talk. But ultimately, the decision to move fast or slow isn’t up to engineering teams; it depends on the goals set by the business. Teams can sometimes push back, but often that isn’t feasible.


    Files by Dan Luu

    This talk was all about how broken file APIs, file systems, and hard drives are. There are tons of ways for them to fail.

    People think reading and writing files is very simple, but it is far more complex than they realize.

    Reading and writing files is a very brittle process. Even low-level APIs like pwrite have their problems. Engineers treat these APIs as atomic when they aren’t, and they can fail in unexpected ways. Writing a file takes multiple steps, any of which can fail.
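    A common defensive pattern for replacing a file treats each step as a separate point of failure:

    1. Write to a temporary file.
    2. Loop until every byte is written, because a single `os.write` may be partial.
    3. `fsync` the file.
    4. Atomically rename it over the target.
    5. `fsync` the directory.

    This minimal Python sketch assumes a POSIX system and is not from the talk. In keeping with the talk, even this pattern depends on the file system and disk actually honoring `fsync`.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data`, treating every step as fallible.

    Note: mkstemp creates the temp file with 0o600 permissions, so the
    replaced file ends up with those permissions too.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: temp file in the same directory, so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        try:
            view = memoryview(data)
            while view:  # Step 2: os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # Step 3: ask the OS to flush contents to the device
        finally:
            os.close(fd)
        os.replace(tmp_path, path)  # Step 4: atomic rename on POSIX
    except BaseException:
        # Anything failed before the rename: remove the temp file, keep the original.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Step 5: fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```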

    These problems are most obvious to big companies like Facebook, Google, and Amazon, since they have millions of servers and hard drives. But remember there are billions of consumer machines experiencing the same issues.

    Disks and SSDs have much higher error rates than is advertised.

    Modularity by Kamal Marhubi

    The usual advice is often misapplied:

    • DRY
    • Inject dependencies
    • Reduce coupling

    Much of this advice has a long history that has been forgotten, so programmers don’t have the context. Also, terms like “coupling” aren’t well defined or understood.

    For example, the context and motivation behind the paper “Go To Statement Considered Harmful” by Edsger Dijkstra are often not understood by engineers today. The primary motivation was to keep the call stack easy to follow. Go To would disrupt the call stack and make problems difficult to debug.

    Rather than just regurgitating “best practices” like “keep code DRY”, aim for code that is easy to understand and modify.

    How?

    • Reduce connections
    • Use interfaces so that something can have multiple implementations
    • Use language features like data hiding
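    Here is a minimal Python sketch of those three ideas, using a hypothetical `ThingStore` interface. Callers depend only on the interface, there are two interchangeable implementations, and internal state stays behind methods. Python only hides data by convention (the leading underscore), so this shows the shape of the idea rather than enforcement.

```python
from typing import List, Optional, Protocol


class ThingStore(Protocol):
    """The only surface callers are allowed to depend on."""

    def get_things(self) -> List[str]: ...


class InMemoryThingStore:
    def __init__(self, things: List[str]) -> None:
        self._things = list(things)  # underscore: internal by convention

    def get_things(self) -> List[str]:
        return list(self._things)  # a copy, so callers can't mutate our state


class CachingThingStore:
    """Wraps any ThingStore; callers can't tell it apart from the original."""

    def __init__(self, inner: ThingStore) -> None:
        self._inner = inner
        self._cache: Optional[List[str]] = None

    def get_things(self) -> List[str]:
        if self._cache is None:
            self._cache = self._inner.get_things()
        return list(self._cache)


def count_things(store: ThingStore) -> int:
    # Depends on the interface only, not on either implementation.
    return len(store.get_things())
```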

    When a change is difficult, note it and add a TODO to improve it. Consider creating categories of TODOs so it is easy to tell what kind of improvement each one needs.
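    One way to make TODO categories searchable is a fixed comment format plus a small script that tallies them. The `TODO(category): text` format below is a hypothetical convention, not something prescribed in the talk.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical convention: "TODO(category): description"
TODO_PATTERN = re.compile(r"TODO\((?P<category>[\w-]+)\):\s*(?P<text>.*)")


def count_todo_categories(lines: Iterable[str]) -> Counter:
    """Tally categorized TODOs; uncategorized TODOs are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = TODO_PATTERN.search(line)
        if match:
            counts[match.group("category")] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python count_todos.py < some_file.py
    for category, n in count_todo_categories(sys.stdin).most_common():
        print(f"{category}: {n}")
```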

    Go beyond style in code reviews.

    Ask for clarifications in PRs, and get the answers turned into code comments.

    Look for hidden assumptions like getThings()[0]. What if there are multiple items? Is there a special order? What if null is returned instead of an array?
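    Here is a sketch of how those questions might be answered explicitly in Python rather than hidden behind `[0]`. The function name is hypothetical. Treating more than one item as an error is an assumption for illustration; depending on the caller, picking a documented ordering could be just as valid.

```python
from typing import List, Optional


def first_thing(things: Optional[List[str]]) -> Optional[str]:
    """Make the assumptions behind `get_things()[0]` explicit."""
    if not things:
        # Handles both None and an empty list instead of crashing.
        return None
    if len(things) > 1:
        # No guaranteed order, so refuse to silently pick one (assumption).
        raise ValueError(f"expected at most one thing, got {len(things)}")
    return things[0]
```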

    What We Can Learn From Software History by Hillel Wayne

    Engineers today often don’t know the history or context behind decisions that engineers made decades ago. Over time, new reasons are invented that don’t align with the historical ones.

    For example, take the common interview question of implementing a linked list. How many engineers have to implement a linked list these days? A quick poll of the audience showed it isn’t common. Interviewers will give reasons like seeing how the person thinks on their feet, but that doesn’t match up with the history of the question.

    The speaker researched interview questions from decades ago. The linked list question grew in popularity when C programming was extremely common. If you programmed in C, you’d build linked lists all the time, so it would be an easy task. Back then, an interviewer asking the question simply wanted the candidate to demonstrate a little knowledge of C.

    But demonstrating a basic knowledge of C isn’t applicable to most programming jobs so that reason isn’t usually given for asking the linked list question.


    Final Thoughts

    Deconstruct was one of the best conferences I’ve attended. The talk quality was high, there were no sponsored talks (and no sponsors at all), and the talks were all a good length with a good variety of topics. Days started and ended at reasonable times, and the breaks were a good length. I left very impressed and with some new ideas to consider, which is the best way to end a conference. I’d recommend everyone attend next year.

  • Deconstruct 2019 - Day One Recap

    I’ve had the pleasure of attending the Deconstruct conference in Seattle. The first day has just wrapped up, and this is a recap of the talks.


    2 Factor, 4 Humans by Karla Burnett

    Karla spoke on the various forms of two-factor authentication and the problems each one has.

    Account takeovers are a very common problem. People reuse passwords across services, so when one service is compromised, accounts on other services can also be easily compromised. Password managers are difficult for non-technical people to use.

    2FA has significant usability problems. It is too complex for many people to use successfully, and it can deter people from using a service.

    There are four types of 2FA:

    #1 Time Based

    Examples: RSA SecurID, Google Authenticator

    Secret + Current Time = six digit code
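    The “secret + current time” recipe is standardized as TOTP (RFC 6238), which is what Google Authenticator implements; RSA SecurID uses its own algorithm. Below is a minimal Python sketch using only the standard library. The current time is divided into 30-second steps, that step counter is HMAC'd with the shared secret, and the result is truncated to six digits.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble picks 4 bytes
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP with counter = time // step."""
    if unix_time is None:
        unix_time = time.time()
    return hotp(secret, int(unix_time // step), digits)


if __name__ == "__main__":
    # Demo secret only; real secrets are random and shared at enrollment.
    print(totp(b"12345678901234567890"))
```

    Because the server and the app only share the secret and agree on the time step, no network round trip is needed to produce a code. The trade-off is that a code is valid for the whole 30-second window, so it can still be phished in real time.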

    #2 SMS

    Google introduced this method because, at the time, not everyone had a smartphone.

    It is vulnerable to attackers transferring the victim’s phone number to a new device (SIM swapping). This is what caused the large iCloud hack a few years ago that targeted celebrities.

    #3 Mobile App

    Example: Duo

    Users can still be phished. A user might log in to a site that looks real but is a “fake” site controlled by attackers. The user enters their password, which the attacker forwards to the real service, triggering a push notification that the user approves without a second thought.

    #4 Security Key

    Example: Yubikey

    Secret + Domain + Nonce is used to authenticate. Including the domain means a “fake” site like the one in #3 can’t intercept and reuse the user’s credentials.
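    To show why binding the domain defeats the relay attack from #3, here is a deliberately simplified Python sketch. Real security keys (FIDO U2F/WebAuthn) use a per-site public/private key pair, and the browser supplies the origin. This sketch substitutes a shared-secret HMAC purely to keep it short, and all names are hypothetical.

```python
import hashlib
import hmac
import secrets


def key_response(device_secret: bytes, domain: str, nonce: bytes) -> bytes:
    """What the (simplified) key returns. `domain` is supplied by the browser,
    so a phishing page cannot claim to be the real site."""
    message = domain.encode("utf-8") + b"|" + nonce
    return hmac.new(device_secret, message, hashlib.sha256).digest()


def server_verify(device_secret: bytes, expected_domain: str,
                  nonce: bytes, response: bytes) -> bool:
    expected = key_response(device_secret, expected_domain, nonce)
    return hmac.compare_digest(expected, response)  # constant-time compare


if __name__ == "__main__":
    secret = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)  # fresh per login, so replays fail too
    legit = key_response(secret, "example.com", nonce)
    relayed = key_response(secret, "examp1e-login.com", nonce)  # phishing page
    print(server_verify(secret, "example.com", nonce, legit))
    print(server_verify(secret, "example.com", nonce, relayed))
```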

    Expensive. Not feasible for the mass population.

    Very U.S. centric. Security keys can be very difficult to acquire in other countries.

    All 2FA methods have problems. They hurt usability and frustrate users.

    Think about usability when adding 2FA and deciding which options to include.

    Do threat modeling for your service and users. Consider what your service does and the risk involved. Is there PII, banking information, etc?

    For many services, SMS 2FA sufficiently reduces risk without introducing too much friction for users.


    Multiplayer Game Networking by Ayla Myes

    Ayla gave an incredible talk on her journey to build a multiplayer game and understand the networking behind it.


    She walked us through three techniques she tried in building a multiplayer game and finally succeeded by applying all three.

    The slides she created were particularly fun and creative.

    The techniques she tried, using a shooting game as the example:

    #1 Clients send the location of bullets they fire.

    Clients can interpret the locations differently. Client #1 thinks they hit something, but client #2 thinks they dodged.

    #2 Server runs a copy of the game and clients just send inputs (e.g. user presses “jump” button)

    Latency is the biggest problem. The user presses the jump button, which is sent to the server, and the server responds with the result. But networks are slow, and it could be 500 milliseconds before the response arrives. This results in a delay for the user: jump button pressed….wait….then the character jumps on screen.

    #3 Client can predict future state

    Client and server both run copies of the game.

    The server has to confirm the clients’ decisions and can overrule them.

    The server also has to decide what happens when two clients predict different results. Very complicated! It has to rewind the clients’ actions to figure out the correct outcome.
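    Technique #3 combines prediction with server authority. The minimal Python sketch below uses hypothetical names and a one-dimensional world to show client-side prediction with reconciliation. The client applies inputs immediately and tags them with sequence numbers. When the server's authoritative state arrives, the client adopts it and replays any inputs the server hasn't processed yet. The sketch does not cover the harder rewind between two clients.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def apply_input(x: int, move: int) -> int:
    """Deterministic game rule shared by client and server (hypothetical)."""
    return x + move


@dataclass
class PredictingClient:
    x: int = 0
    next_seq: int = 0
    pending: List[Tuple[int, int]] = field(default_factory=list)  # (seq, move)

    def press(self, move: int) -> Tuple[int, int]:
        """Predict locally right away, and return the message to send."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, move))
        self.x = apply_input(self.x, move)
        return seq, move

    def on_server_state(self, server_x: int, last_seq: int) -> None:
        """Adopt the authoritative state, then replay unconfirmed inputs."""
        self.pending = [(s, m) for s, m in self.pending if s > last_seq]
        self.x = server_x
        for _, move in self.pending:
            self.x = apply_input(self.x, move)


@dataclass
class AuthoritativeServer:
    x: int = 0
    last_seq: int = -1

    def receive(self, seq: int, move: int, allowed: bool = True) -> Tuple[int, int]:
        """Apply an input unless the server overrules it (e.g. a wall)."""
        if allowed:
            self.x = apply_input(self.x, move)
        self.last_seq = seq
        return self.x, self.last_seq


if __name__ == "__main__":
    client, server = PredictingClient(), AuthoritativeServer()
    msgs = [client.press(+1) for _ in range(3)]      # client predicts x == 3
    server.receive(*msgs[0])
    state = server.receive(*msgs[1], allowed=False)  # server overrules input 1
    client.on_server_state(*state)  # server's x == 1, plus replayed input 2
    print(client.x)  # 2
```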

    PICO-8 is a “fantasy” console where you can build your own games.


    Jepsen 11: Once More Unto The Breach by Kyle Kingsbury

    Jepsen is a library used to verify the claims vendors make about their databases, queues, etc. Kyle walked us through three databases and the significant problems found in each one.

    FaunaDB, YugabyteDB, and TiDB.

    Jepsen works by performing many operations against a DB, such as inserting random values and reading them back out, while introducing faults like a simulated power failure on a node. Jepsen has been used to find bugs like simulated bank accounts returning incorrect balances.
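    Jepsen itself is written in Clojure and runs against real clusters while injecting faults. The Python sketch below only mimics the shape of its bank-style check: every read should see balances that sum to the same total, because transfers move money without creating or destroying it. The in-memory "database" and its non-atomic bug are hypothetical stand-ins, and no faults are injected.

```python
import random
from typing import Dict, List


def check_bank_reads(reads: List[Dict[str, int]], expected_total: int) -> List[int]:
    """Return the indices of reads whose balances don't sum to the total."""
    return [i for i, snapshot in enumerate(reads)
            if sum(snapshot.values()) != expected_total]


def run_workload(seed: int, buggy: bool) -> List[Dict[str, int]]:
    """Random transfers against a toy in-memory 'database', recording reads.

    With buggy=True, a read can observe a half-applied transfer, mimicking
    a store whose transfers aren't actually atomic.
    """
    rng = random.Random(seed)
    accounts = {"a": 50, "b": 50, "c": 50}
    reads: List[Dict[str, int]] = []
    for _ in range(100):
        src, dst = rng.sample(sorted(accounts), 2)
        amount = rng.randint(1, 10)
        if accounts[src] < amount:
            continue  # no overdrafts
        accounts[src] -= amount
        if buggy:
            reads.append(dict(accounts))  # read lands mid-transfer
        accounts[dst] += amount
        reads.append(dict(accounts))
    return reads


if __name__ == "__main__":
    for buggy in (False, True):
        bad = check_bank_reads(run_workload(seed=1, buggy=buggy), expected_total=150)
        print(f"buggy={buggy}: {len(bad)} invalid reads")
```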

    Vendor claims have often been found to be wrong.

    Recommendations:

    • Read the DB docs very carefully!
    • Docs can be wrong so verify the claims.
    • Consider failure modes – e.g. what happens if power is lost
    • No perfect DB

    Identifying Mushrooms Like a Prolog by Josh Cox

    Josh’s hobby is mushroom hunting. He also likes to learn new programming languages and finds them all unique. He used Prolog to help him identify mushrooms.

    There are many Prolog variants, but he is using SWI-Prolog.

    Prolog is a logic programming language, very different from most languages like JavaScript.

    It has a concept of facts and rules.

    I’ve never used Prolog, but it appears to be somewhere between a database and a rules engine.

    After you define the facts and rules you can query Prolog and it’ll work out the logic of how to respond.

    Josh encoded multiple facts about mushrooms. For example, a mushroom might be represented by the facts cap shape, spore color, gill type, etc.
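    As a rough Python analogue (not how Prolog's resolution engine actually works), facts can be stored as tuples and a rule expressed as a condition over them. A query then returns every species that satisfies the rule. The species and traits below are made up and must not be used to identify real mushrooms.

```python
from typing import Dict, List, Set, Tuple

# Facts: (species, attribute, value). Entirely hypothetical data.
FACTS: Set[Tuple[str, str, str]] = {
    ("species_a", "cap_shape", "convex"),
    ("species_a", "spore_color", "white"),
    ("species_a", "gill_type", "free"),
    ("species_b", "cap_shape", "flat"),
    ("species_b", "spore_color", "brown"),
    ("species_b", "gill_type", "attached"),
}


def has(species: str, attribute: str, value: str) -> bool:
    return (species, attribute, value) in FACTS


def matches(species: str, observed: Dict[str, str]) -> bool:
    """Rule: a species matches if every observed trait is a known fact about it."""
    return all(has(species, attr, val) for attr, val in observed.items())


def identify(observed: Dict[str, str]) -> List[str]:
    """Query: which species satisfy the rule for these observations?"""
    species = sorted({s for s, _, _ in FACTS})
    return [s for s in species if matches(s, observed)]


if __name__ == "__main__":
    print(identify({"spore_color": "brown", "gill_type": "attached"}))
```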


    Please Inline This Abstraction by Dan Abramov

    This was a very practical talk about the dangers of abstractions. Code duplication is often the better approach because it is easier to maintain. Eventually an abstraction may be better, but we need to weigh the pros and cons and make sure we don’t create a bigger problem down the road.

    Problems with abstractions:

    Fixing a bug in an abstraction means you need to verify the fix doesn’t introduce bugs in any of the abstraction’s consumers.

    We go from spaghetti code to “lasagna code”, meaning you end up with many layers in your code that are hard to follow and understand.

    Complex abstractions have inertia. As they grow more complex and get used more, it takes more and more time and effort to unwind and remove them.

    Recommendations

    If you have an abstraction, don’t just unit test it; test where the value is. E.g. if you have an abstraction used in many React components, test its use in each React component. This will help you identify when changes in the abstraction introduce bugs in components.

    Basically, this boils down to doing integration testing rather than just pure unit testing.

    Delay adding abstractions until really necessary.
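    Here is a minimal Python sketch of “test where the value is”, with plain functions standing in for React components. One shared helper (`format_price`, hypothetical) is used by two consumers, and the tests exercise the consumers' output. A change to the helper that breaks either consumer is therefore caught.

```python
from typing import List


def format_price(cents: int) -> str:
    """Shared abstraction (assumes non-negative cents)."""
    return f"${cents // 100}.{cents % 100:02d}"


def cart_total_label(item_cents: List[int]) -> str:
    # Consumer #1: stands in for a cart component.
    return "Total: " + format_price(sum(item_cents))


def product_badge(name: str, cents: int) -> str:
    # Consumer #2: stands in for a product component.
    return f"{name} ({format_price(cents)})"


def test_consumers() -> None:
    # Tests target what users see, not just the helper in isolation.
    assert cart_total_label([500, 250]) == "Total: $7.50"
    assert product_badge("Mug", 1200) == "Mug ($12.00)"


if __name__ == "__main__":
    test_consumers()
    print("consumer tests passed")
```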


    A Personal Computer for Children of All Cultures by Ramsey Nasser

    To learn programming, you must already be proficient in English. This excludes many people from other cultures.

    Ramsey created a programming language in Arabic to explore the idea of programming languages in other languages. It is limited in that it cannot consume libraries written in English.

    Unicode lets languages use any character, including those from non-English languages, so you can give functions, variables, etc. names in your native language.

    But this doesn’t solve the problem of language keywords like if, for, function, etc.

    Also, the entire computer stack is written in English. Even at the lowest levels in an OS you’ll find English.

    Names have cultural and political implications. For example, there are many cities named Alexandria because Alexander the Great named cities after himself as he conquered territory.

    One possible solution is to decouple the programming constructs from the names we give them. Take git for example: it assigns a hash to each change. A programming language could follow that pattern. It could also provide a dictionary, separate from the program, that provides names for the constructs. That dictionary could contain names in many different languages that the software developer could choose from.

    This is preferable to translating from English into another language, because with translation English would always remain the source of truth.
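    Here is a small Python sketch of the hash-plus-dictionary idea. Each construct is identified by a hash of its definition, much like a git object ID, and separate per-language dictionaries supply names. The identifiers, names, and fallback behavior are illustrative assumptions. Falling back to the raw ID rather than to English keeps any one language from being the source of truth.

```python
import hashlib
from typing import Dict


def construct_id(definition: str) -> str:
    """Identify a construct by a hash of its definition, like a git object ID."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()[:12]


# The program itself would reference only IDs like this one.
ADD = construct_id("(a, b) -> a + b")

# Separate, swappable dictionaries map IDs to human-readable names.
NAMES: Dict[str, Dict[str, str]] = {
    "en": {ADD: "add"},
    "ar": {ADD: "جمع"},
}


def display_name(cid: str, lang: str) -> str:
    """Show a construct in the reader's language, falling back to the raw ID."""
    return NAMES.get(lang, {}).get(cid, cid)


if __name__ == "__main__":
    for lang in ("en", "ar", "fr"):
        print(lang, display_name(ADD, lang))
```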

    The problem is that doing something like this is impossible for existing languages and requires entirely new languages that likely are incompatible with existing languages and tools.

    But being more culturally sensitive is likely not enough to increase adoption of this hypothetical language. However, if that language could provide other benefits, it might be enough to encourage adoption.


    Clock Skew and You by Allison Kaptur

    Clock skew is very common in distributed systems. Most software developers are working on distributed systems without realizing it. For example, a web app with a database is a distributed system.

    Clocks are not good at staying accurate. Things like temperature, leap seconds, and more cause problems.

    Clocks can be synced, but networks cause problems because they introduce delay. There are algorithms that calculate the network delay and take it into account.
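    One such calculation is the one NTP uses. With four timestamps from a single request/response exchange, the client can estimate both its clock offset from the server and the round-trip network delay. The Python sketch below assumes the network delay is the same in both directions; when it isn't, the offset estimate is off by half the asymmetry.

```python
from typing import Tuple


def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client sends request   (client clock)
    t1: server receives it     (server clock)
    t2: server sends reply     (server clock)
    t3: client receives reply  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # amount to add to the client clock
    delay = (t3 - t0) - (t2 - t1)          # round trip minus server processing
    return offset, delay


if __name__ == "__main__":
    # Client clock 5 s behind; 0.1 s each way; 0.05 s server processing.
    print(ntp_offset_and_delay(100.0, 105.1, 105.15, 100.25))  # approx (5.0, 0.2)
```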

    True Chimer is a time server that is accurate.

    False Ticker is a time server that is inaccurate.

    Client times cannot be trusted. Always have a reasonable fallback if the client time is wrong. For example, suppose a UI shows a relative time message like “3 hours ago” based on the client’s clock. What happens if the client’s clock is five hours behind? The UI should not show “2 hours from now”. Maybe that means just omitting the message.
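    Here is a minimal Python sketch of that fallback. It computes the age of an event and returns `None` (meaning “omit the label”) when the event appears to be in the future. The thresholds and wording are assumptions. Ideally `now_ts` comes from a trusted server clock rather than the client's.

```python
from typing import Optional


def relative_time_label(event_ts: float, now_ts: float) -> Optional[str]:
    """Both timestamps are in seconds. Returns None when the label would be nonsense."""
    age = now_ts - event_ts
    if age < 0:
        return None  # skewed clock: the event looks like it is in the future
    if age < 60:
        return "just now"
    if age < 3600:
        value, unit = int(age // 60), "minute"
    elif age < 86400:
        value, unit = int(age // 3600), "hour"
    else:
        value, unit = int(age // 86400), "day"
    return f"{value} {unit}{'' if value == 1 else 's'} ago"


if __name__ == "__main__":
    event = 0.0
    print(relative_time_label(event, 3 * 3600))             # correct clock
    print(relative_time_label(event, 3 * 3600 - 5 * 3600))  # client 5 h behind
```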

    Pick one server as the timekeeper. The database is usually a great option.


    Voice Driven Development by Emily Shea

    Emily has really bad RSI and decided to try programming by voice.

    This was really impressive. She drove the presentation entirely by voice. E.g. “next slide”, “start demo”

    Writes mostly Perl. 🤮

    She tried dozens of possible solutions for her RSI.

    She uses a combination of Dragon Dictation and Talon. Talon is made by a software developer with RSI. Tailor-made for programming.

    Problems with voice driven programming:

    Tools with poor accessibility.

    Voice strain. Some people have had to get voice coaches to ensure they are speaking properly so they don’t injure themselves talking 8 hours a day.

    Open offices. Working from home is a great alternative if the company supports it, and acoustic sound booths are another option. The problems go both ways: you disturb your teammates, and their talking affects the computer’s ability to understand you.

    You could also use a stenomask if you want to look like Darth Vader.

    Emily has configured Talon to help her be quick with her programming. Very configurable for different languages and use cases.