Database Indexing Strategies for Developers
The Query That Changed Everything
My relationship with database indexes shifted the day a junior engineer asked me why their query was slow. I confidently said it needed an index, added one, and the query got slower. Not because the index was wrong — it was exactly the right index for the query — but because the table had nine other indexes that the optimizer was now weighing against each other, and the statistics it used to choose had gone stale. That day taught me that indexing is not a checklist; it is a model of how your database reasons about your data, and the model changes as the data does.
Most developers treat indexes as a tax paid on writes to speed up reads. That framing is not wrong, but it is incomplete in ways that matter. Indexes are a form of pre-computation. Each one encodes a bet about which queries will be asked often enough to justify the computation cost at write time. The wrong bet is not merely wasted storage; it is actively harmful, because every additional index gives the optimizer more candidates to evaluate and more opportunities to choose poorly.
Reading an EXPLAIN Plan Without Fear
The EXPLAIN plan is the diagnostic tool that separates developers who tune databases from those who guess. Reading one is a skill, and like any skill, it rewards deliberate practice. The plan tells you what the database decided to do, not what you wanted it to do, and the gap between the two is where the performance lives.
- Seq Scan — the database is reading every row. Sometimes this is correct (small tables, bulk operations); often it means a missing or unused index.
- Index Scan — the database is using an index to find rows. Good, but check whether it then fetches the row data (a heap fetch), which can be expensive.
- Index Only Scan — the database found everything it needed in the index without touching the table. The holy grail; achievable when all queried columns are in the index.
- Nested Loop vs Hash Join — tells you how the database is joining tables. Hash joins are typically faster for large inputs; nested loops win when one side is small.
The Compound Index Insight
The single most impactful indexing concept for developers is the compound index, and specifically the column ordering within it. A compound index on (last_name, first_name) can serve queries that filter on last_name alone, or on both columns, but not queries that filter on first_name alone. The index is usable left to right, and this property — called the leftmost prefix rule — determines which queries benefit and which do not.
An index you do not use is not neutral; it is a liability. It costs write performance, storage, and optimizer consideration at every query. The best index is the one you remove, not the one you add.
When Indexes Are the Wrong Answer
Indexing discipline also means knowing when an index will not help. If a query filters on a column with low cardinality — a boolean flag, a status with three values — the index is nearly useless because it does not partition the data meaningfully. The optimizer will correctly ignore it and do a sequential scan, and adding the index just imposes write cost for zero read benefit.
The deeper lesson is that performance is a property of the whole system, not of any single component. A slow query might need an index, or it might need a different schema, or it might need caching, or it might need a different user experience that does not ask the question at all. The developer who reaches for an index reflexively is like the carpenter who reaches for a hammer regardless of the fastener. Indexes are one tool in a larger toolkit, and the skilled practitioner knows which tool the job actually requires. Curiosity about the underlying model — about how the database reasons — is what separates adequate performance work from excellent performance work.
Khaldoun Senjab
A software developer, CS researcher, and academic at the University of Sharjah with over 20 years of experience spanning software engineering, cloud computing, and artificial intelligence. Passionate about building systems that bridge the gap between academic research and real-world impact.
Related Posts
Building Distributed Systems: Lessons From Production
Hard-won insights from running distributed systems at scale — the failure modes, the trade-offs, and the patterns that actually work.
Machine Learning Model Deployment Patterns
A practical guide to deploying ML models in production — from batch scoring to real-time inference, with infrastructure that scales.
Adaptive Resource Orchestration in Cloud-Native Systems
How machine learning can drive dynamic resource allocation in cloud environments, reducing costs while maintaining performance SLOs.