Technical Interviews Reject the Wrong Engineers

F fagnerbrack.com ↗

▲ 57 points • 109 comments • by fagnerbrack • 3d ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is a mix of AI-generated and human-written content.

75 %

AI likelihood · overall

Mixed

25% human-written 75% AI-generated

SEGMENTS · HUMAN 1 of 5

SEGMENTS · AI 4 of 5

WORD COUNT 1,705

PEAK AI % 99% · §1

Analyzed

Jun 5

backend: pangram/v3.3

Segments scanned

5 windows

avg 341 words each

Article text · 1,705 words · 5 segments analyzed

§1 AI · 99%

20 years of observation, 50 years of research, and a framework for measuring the interview instead of the candidate

May 11, 2026 · 12 min read

Figure: The Skill Spectrum, a representation of the Dreyfus Model of Skill Acquisition applied to an expert candidate versus an Advanced Beginner interviewer

Most companies treat hiring like a filter. Put candidates through enough rounds, ask enough questions, and the good ones will survive. The problem is that the filter is broken. It selects for the wrong things, rejects people it can’t evaluate, and costs more when it fails than most teams realize.

I’ve spent over 15 years observing and researching technical interviews. I’ve watched brilliant engineers get rejected because they didn’t solve a problem the way the interviewer expected. I’ve watched mediocre engineers get hired because they had practiced the right LeetCode patterns. And I’ve watched companies repeat this cycle while citing a “cost of a bad hire” statistic that, as it turns out, no one can trace to an actual source.

This post is about what the research says, where the common tools break down, and a framework I built to measure interview quality itself.

The cost of getting it wrong is… wrong?

You’ve probably seen the claim that the U.S. Department of Labor estimates a bad hire costs 30% of first-year earnings. I went looking for the original source. There isn’t one. No DOL publication, no report title, no URL. Every article cites the last in an infinite loop. The same is true of the “80% of turnover is due to bad hiring decisions” figure attributed to Harvard Business Review. No specific HBR article contains that number.

This is not surprising. It’s the same effect as the Learning Pyramid: a bunch of trusted sites repeating the same thing they heard somewhere else, with no reference to the original empirical sources.

The real research is less dramatic but more useful.

The Center for American Progress reviewed 30 case studies in 2012 and found the median replacement cost across all positions is about 21% of annual salary [1]. For workers earning under $75K, that number holds. For senior roles, it climbs to 213% [1].

But replacement cost is the wrong frame for engineering teams.

§2 AI · 96%

The more useful finding comes from Housman and Minor’s 2015 Harvard Business School study of 50,000 workers across 11 firms [2]. They found that avoiding a toxic worker generates roughly twice the return of hiring a star performer. A toxic worker costs about $12,489 in direct replacement. A top-1% performer adds about $5,303 in value [2]. And toxic behavior spreads. When a toxic worker joins a team, peers become more likely to behave the same way.

The biggest hiring risk is not missing a great candidate. It is letting a destructive one through.

The reason this matters: most interview processes are designed to find talent. Few are designed to detect toxicity, and the two goals require different signals.

Whiteboard interviews test whether a candidate can perform under observation while solving a problem they’d normally Google. A 2020 study by Behroozi et al. found that candidates given traditional whiteboard interviews with an observer performed at half the level of those who solved the same problems privately [3]. All women in the public condition failed. All women in the private condition passed [3].

This is not a talent filter. It is an anxiety filter.

Pair programming interviews are better, but they carry their own distortion. Pair programming was designed as a collaborative practice for producing code, not for evaluating a stranger’s skill under time pressure. When I pair with a colleague on my team, we share context, vocabulary, and trust. An interview has none of those. The candidate is performing while being watched by someone who holds power over their career. Calling that pair programming is like calling a job interview a conversation.

The deeper issue is tacit knowledge. Most of what a skilled engineer knows is not something they can articulate on demand. They recognize patterns. They sense when a design will cause problems in six months. They make tradeoffs that feel obvious to them but are invisible to someone at a different skill level. Standard interviews are built to test explicit knowledge: can you explain this algorithm, can you describe this pattern, can you walk through your reasoning.

The candidates who perform best are those who are good at talking about code, which is a different skill from writing it.

Some companies try to add science to the process with personality assessments. The two most popular choices in tech hiring are the Myers-Briggs Type Indicator and “growth mindset” screening.

§3 Human · 24%

The research on both is clear.

The MBTI’s own publisher says using it for hiring is unethical [4]. That’s not a critic talking. That’s The Myers-Briggs Company, in writing, through their Senior Director of US Professional Services. The Myers & Briggs Foundation’s ethical guidelines state it directly.

The reason is simple: the test doesn’t measure what it claims to measure. Pittenger (2005) found that 35% of people get a different four-letter type when retaking the test after five weeks [5]. The National Academy of Sciences reviewed 20+ MBTI studies and concluded there is not enough evidence to justify its use [6]. Its predictive validity for job performance is about r = .10–.20, roughly the same as flipping a coin with a slight thumb on the scale.

Growth mindset fares no better. The largest meta-analysis (Sisk et al., 2018, covering 365,915 participants) found the correlation between mindset and achievement is r = .10, explaining about 1% of variance [7]. When Macnamara and Burgoyne (2023) restricted the analysis to the six highest-quality studies, the effect dropped to d = 0.02 [8]. Not small. Negligible. And researchers with financial ties to mindset interventions reported significantly larger effects than independent researchers did [8].

There is no peer-reviewed evidence that growth mindset predicts job performance in any workplace setting. Asking about it in interviews measures nothing useful.

Using Myers-Briggs for hiring is unethical; using Carol Dweck’s Growth Mindset measures nothing useful.

What actually works? The Big Five.

The Big Five personality model (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) has over three decades of meta-analytic support. Barrick and Mount (1991) established that conscientiousness predicts job performance across occupations [9]. Wilmot and Ones (2019) confirmed this across 1.1 million participants [10].

For software engineering, the picture has a useful nuance. Conscientiousness is a weaker predictor in high-complexity work [10].
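The correlation figures quoted in this section become easier to compare once they are converted to variance explained, which is the square of r. Here is a minimal Python sketch of that arithmetic. The function name `variance_explained` is a hypothetical helper chosen for illustration, and the inputs are only the r values cited above (.10 for growth mindset, .10 to .20 for the MBTI).

```python
def variance_explained(r: float) -> float:
    """Return the share of variance explained by a correlation r (r squared)."""
    return r ** 2


# r values quoted in the text above; the labels are for illustration only.
quoted_correlations = [
    ("Growth mindset and achievement [7]", 0.10),
    ("MBTI and job performance, low end", 0.10),
    ("MBTI and job performance, high end", 0.20),
]

for label, r in quoted_correlations:
    share = variance_explained(r)
    print(f"{label}: r = {r:.2f} explains {share:.1%} of variance")
```

Squaring is why an r of .10 corresponds to about 1% of variance, and why even the top of the MBTI range, r = .20, leaves about 96% of the variation in job performance unexplained.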

§4 AI · 98%

Gnambs (2015) found that openness to experience and conscientiousness both predicted programming aptitude, and that openness has become more important over time as software work requires more creativity [11]. Introversion also correlated with programming skill [11].

There’s an honest caveat here. Even the best personality measures explain 4–6% of performance variance. They’re susceptible to faking. And extremely high conscientiousness can be counterproductive, producing rigid, perfectionistic engineers [12]. The Big Five should be one input in a multi-method process, not a standalone gate.

Seniority levels don’t work. Skill-specific assessment does.

Most companies assign a single seniority level to an engineer: junior, mid, senior, staff. This flattens a complicated reality. A senior engineer might be expert-level at API design and novice-level at frontend performance optimization. Labeling them “senior” tells you nothing about which problems they can solve.

The Dreyfus model of skill acquisition, published in 1980 by Stuart and Hubert Dreyfus, describes five stages: Novice, Advanced Beginner, Competent, Proficient, and Expert [13]. The stages differ not in how much someone knows, but in how they think.

At the Competent stage, typically reached after 2–3 years, practitioners break problems into components, apply rules, and build solutions step by step [13]. They value explicit reasoning. They believe skill is demonstrated by showing your work.

At the Expert stage, practitioners perceive situations as wholes and respond intuitively [13]. As the Dreyfus brothers described it, the expert doesn’t calculate or solve problems in the traditional sense. They draw on vast repertoires of pattern recognition, estimated at 100,000+ distinguishable situations for chess grandmasters. Neuroscience supports this: Amidzic et al. (2001) found experts and amateurs use different brain regions for the same tasks [14].

This creates a specific failure mode in interviews that I’ve seen play out dozens of times. A competent-level interviewer asks “walk me through your reasoning.” A genuine expert gives a sparse answer, not because they lack depth, but because their cognition doesn’t work through explicit rule-following. The interviewer marks this as shallow thinking. It is the opposite.
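The difference between a single seniority label and a per-skill assessment can be sketched as a small data structure. The Python below is an illustration only and not the framework this article describes. `SkillProfile`, `can_evaluate`, the skill names, and the rule that an interviewer must be at or above the candidate's stage are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class DreyfusStage(IntEnum):
    """The five Dreyfus stages named in the text, ordered low to high."""
    NOVICE = 1
    ADVANCED_BEGINNER = 2
    COMPETENT = 3
    PROFICIENT = 4
    EXPERT = 5


@dataclass
class SkillProfile:
    """Hypothetical per-skill profile: one stage per skill, not one global label."""
    name: str
    skills: dict[str, DreyfusStage] = field(default_factory=dict)

    def stage_for(self, skill: str) -> DreyfusStage:
        # Assumption: a skill that was never assessed is treated as NOVICE.
        return self.skills.get(skill, DreyfusStage.NOVICE)


def can_evaluate(interviewer: SkillProfile, candidate: SkillProfile, skill: str) -> bool:
    """Illustrative rule (an assumption): an interviewer can judge a skill
    only when they are at or above the candidate's stage in that skill."""
    return interviewer.stage_for(skill) >= candidate.stage_for(skill)


# The "senior" engineer from the text: expert in one skill, novice in another.
engineer = SkillProfile(
    name="senior engineer",
    skills={
        "api_design": DreyfusStage.EXPERT,
        "frontend_performance": DreyfusStage.NOVICE,
    },
)
interviewer = SkillProfile(
    name="interviewer",
    skills={
        "api_design": DreyfusStage.COMPETENT,
        "frontend_performance": DreyfusStage.COMPETENT,
    },
)

for skill in ("api_design", "frontend_performance"):
    print(skill, engineer.stage_for(skill).name, can_evaluate(interviewer, engineer, skill))
```

Under these assumptions, the competent-level interviewer is flagged as unable to evaluate the engineer's API design but able to evaluate frontend performance. A single "senior" label hides that distinction.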

§5 AI · 98%

Andy Hunt captured this in Pragmatic Thinking and Learning: experts can be amazingly intuitive but completely inarticulate about how they arrived at a conclusion [15]. They genuinely don’t know. It just felt right. A competent interviewer hears “it felt right” and writes “no clear reasoning process” in their feedback.

The interview didn’t fail the candidate. The interviewer’s model of what skill looks like failed.

The Dreyfus mismatch is one half of the problem. Ego is the other.

Similarity bias is well-documented in hiring research. Rand and Wexley (1975) showed that biographical similarity between interviewer and candidate produces higher ratings regardless of qualifications [16]. Rivera (2012) found that more than half of hiring professionals at elite firms ranked “cultural fit” as the most important criterion, and that interviewers used themselves as models for the ideal candidate [17].

In technical interviews, this plays out as: “they didn’t solve it the way I would.” The interviewer has a preferred approach. The candidate uses a different one. The candidate’s approach might be better, but the interviewer can’t evaluate what they don’t recognize.

This connects directly to the Dreyfus model. A competent-level interviewer evaluates through rules. When a proficient or expert candidate bypasses those rules with pattern-matched intuition, the interviewer doesn’t see mastery. They see someone who skipped steps. And because the interviewer can’t distinguish “skipped steps due to incompetence” from “skipped steps due to operating at a higher cognitive level,” they default to the interpretation that protects their ego.

The Dunning-Kruger effect makes this worse [18]. Participants in the bottom quartile of skill estimated their performance at the 62nd percentile. The incompetent lack both skill and the ability to recognize their incompetence. An interviewer who is mediocre at system design may genuinely lack the ability to distinguish a good answer from a great one, while being fully confident in their evaluation.

Confirmation bias locks the whole cycle in place. Research shows 60% of interviewers make their decision within 15 minutes [19].