A query that takes 5 seconds can often be reduced to 50 milliseconds with the right index. Yet database indexing remains one of the most misunderstood concepts in web development.
This guide explains how indexes work, when to use them, and common mistakes to avoid—with practical examples for PostgreSQL and MySQL.
What is a Database Index?
A database index is like a book's index. Instead of reading every page to find "PostgreSQL", you flip to the back, find "PostgreSQL: pages 45, 122, 203", and go directly there.
Without an index, the database performs a sequential scan—reading every row in the table to find matches. With an index, it performs an index scan—looking up the index to find exactly which rows to read.
Visual Example
# Without index (sequential scan)
Table: 1,000,000 rows
Query: SELECT * FROM users WHERE email = 'john@example.com'
Operation: Read all 1,000,000 rows, check each email
Time: ~2-5 seconds
# With index on email column
Table: 1,000,000 rows
Query: SELECT * FROM users WHERE email = 'john@example.com'
Operation: Look up 'john@example.com' in index → row #45,231 → read that row
Time: ~5-50 milliseconds
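You can reproduce this difference yourself in a few minutes. Here's a rough PostgreSQL sketch (the users_demo table and its contents are made up purely for illustration):
-- Build a throwaway table with 1,000,000 rows
CREATE TABLE users_demo (
  id    SERIAL PRIMARY KEY,
  email TEXT
);
INSERT INTO users_demo (email)
SELECT 'user' || n || '@example.com'
FROM generate_series(1, 1000000) AS n;
-- Before the index: EXPLAIN ANALYZE shows a Seq Scan
EXPLAIN ANALYZE SELECT * FROM users_demo WHERE email = 'user500000@example.com';
-- After the index: EXPLAIN ANALYZE shows an Index Scan
CREATE INDEX idx_users_demo_email ON users_demo(email);
ANALYZE users_demo;
EXPLAIN ANALYZE SELECT * FROM users_demo WHERE email = 'user500000@example.com';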
How B-Tree Indexes Work
The most common index type is a B-tree (balanced tree). Here's how it works:
# B-Tree structure for 'email' column (simplified)
           [M]
         /     \
     [G]          [S]
    /   \        /   \
[A-F]  [H-L]  [N-R]  [T-Z]
  ↓      ↓      ↓      ↓
 rows   rows   rows   rows
# Searching for 'john@example.com':
1. Start at root: 'j' < 'm' → go left
2. At [G]: 'j' > 'g' → go right
3. At [H-L]: 'j' is here → find exact entry
4. Entry points to row #45,231
5. Read row #45,231
# Instead of 1,000,000 comparisons, we did ~4
B-trees have O(log n) lookup time. For a table with 1 million rows, that's about 20 comparisons maximum instead of 1 million.
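If you want to see how shallow these trees really are, PostgreSQL's pgstattuple extension can report the depth of an existing B-tree index. A sketch (idx_users_email is the index created in the next section):
-- Requires the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tree_level, leaf_pages, avg_leaf_density
FROM pgstatindex('idx_users_email');
-- A tree_level of 2-3 is typical even for tables with millions of rows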
Creating Indexes
Single Column Index
-- PostgreSQL & MySQL
CREATE INDEX idx_users_email ON users(email);
-- Check if index is used
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
Unique Index
-- Enforces uniqueness AND speeds up lookups
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- In most cases, use a unique constraint instead (creates index automatically)
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
Composite Index (Multiple Columns)
-- Index on multiple columns
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- This index helps these queries:
SELECT * FROM orders WHERE user_id = 123; -- ✅ Uses index
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01'; -- ✅ Uses index
SELECT * FROM orders WHERE created_at > '2024-01-01'; -- ❌ Cannot use index!
-- Column order matters! The index is like a phone book sorted by (last_name, first_name)
-- You can't look up by first_name alone efficiently
Partial Index (PostgreSQL)
-- Index only rows matching a condition
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Smaller index, faster for queries on pending orders
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01';
Expression Index
-- PostgreSQL: Index on expression result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- MySQL 8.0.13+ equivalent (functional index)
CREATE INDEX idx_users_lower_email ON users((LOWER(email)));
When to Add Indexes
Always Index
- Primary keys: Automatic in most databases
- Foreign keys: Critical for JOIN performance
- Columns in WHERE clauses: Frequently filtered columns
- Columns in JOIN conditions: Both sides of JOINs
- Columns in ORDER BY: If you need sorted results
Consider Indexing
- Columns with high selectivity: Many unique values like email or username (see the selectivity check after these lists)
- Columns used in range queries: dates, prices, quantities
- Columns in GROUP BY: Can speed up aggregations
Avoid Indexing
- Low selectivity columns: status (5 values), gender (3 values), is_active (2 values)
- Frequently updated columns: Index maintenance costs
- Small tables: Sequential scan might be faster than index lookup
- Columns rarely used in queries: Wasted space
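A quick way to estimate selectivity before creating an index is to compare distinct values against total rows. A sketch (swap in your own table and column):
-- Close to 1.0 = highly selective (good candidate), close to 0 = poor candidate
-- PostgreSQL
SELECT COUNT(DISTINCT email)::numeric / COUNT(*) AS selectivity FROM users;
-- MySQL (division already returns a decimal)
SELECT COUNT(DISTINCT email) / COUNT(*) AS selectivity FROM users;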
Understanding EXPLAIN
Use EXPLAIN to see if your indexes are being used:
PostgreSQL EXPLAIN
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
-- Good output (using index):
Index Scan using idx_users_email on users
Index Cond: (email = 'john@example.com'::text)
Actual time: 0.025..0.026 rows=1 loops=1
-- Bad output (not using index):
Seq Scan on users
Filter: (email = 'john@example.com'::text)
Actual time: 2345.123..2345.456 rows=1 loops=1
MySQL EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
-- Good: key column shows index name
+----+------+-----------------+---------+-------+------+-----------------------+
| id | type | key             | key_len | ref   | rows | Extra                 |
+----+------+-----------------+---------+-------+------+-----------------------+
|  1 | ref  | idx_users_email | 767     | const |    1 | Using index condition |
+----+------+-----------------+---------+-------+------+-----------------------+
-- Bad: key is NULL, type is ALL
+----+------+------+------+---------+-------------+
| id | type | key  | ref  | rows    | Extra       |
+----+------+------+------+---------+-------------+
|  1 | ALL  | NULL | NULL | 1000000 | Using where |
+----+------+------+------+---------+-------------+
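On MySQL 8.0.18 and later you can also run EXPLAIN ANALYZE, which executes the query and reports measured timings, much like PostgreSQL's version:
-- MySQL 8.0.18+
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';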
Composite Index Strategy
The order of columns in a composite index is crucial:
The Left-Prefix Rule
-- Index on (a, b, c)
CREATE INDEX idx_abc ON table(a, b, c);
-- Can use index:
WHERE a = 1 -- ✅ Uses (a)
WHERE a = 1 AND b = 2 -- ✅ Uses (a, b)
WHERE a = 1 AND b = 2 AND c = 3 -- ✅ Uses (a, b, c)
WHERE a = 1 AND c = 3 -- ✅ Uses (a), filters c
-- Cannot use index:
WHERE b = 2 -- ❌ Skipped 'a'
WHERE c = 3 -- ❌ Skipped 'a' and 'b'
WHERE b = 2 AND c = 3 -- ❌ Skipped 'a'
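You can verify the left-prefix rule against your own schema with EXPLAIN. A sketch using the idx_orders_user_date index created earlier on (user_id, created_at):
-- Expect an index lookup on idx_orders_user_date
EXPLAIN SELECT * FROM orders WHERE user_id = 123;
-- Expect a sequential/full table scan: the leading column is missing
EXPLAIN SELECT * FROM orders WHERE created_at > '2024-01-01';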
Ordering Strategy
-- Put columns in this order:
1. Equality conditions first (WHERE status = 'active')
2. Range conditions last (WHERE created_at > '2024-01-01')
3. Consider selectivity (more selective = earlier)
-- Example: Finding active users created this year
-- Option 1: (status, created_at)
CREATE INDEX idx_users_status_date ON users(status, created_at);
-- Option 2: (created_at, status) - WORSE
-- Range on created_at breaks index usage for status
-- Best query plan:
SELECT * FROM users
WHERE status = 'active' AND created_at > '2024-01-01';
-- Uses full index (status, created_at)
Covering Indexes
A covering index includes all columns needed by a query, avoiding table lookups entirely:
-- Query: Find user emails and names
SELECT email, name FROM users WHERE status = 'active';
-- Regular index (requires table lookup)
CREATE INDEX idx_users_status ON users(status);
-- Process: Find row IDs in index → fetch rows from table → return email, name
-- Covering index (no table lookup needed)
CREATE INDEX idx_users_status_covering ON users(status) INCLUDE (email, name);
-- INCLUDE is PostgreSQL 11+ only
-- MySQL alternative: include columns in index itself
CREATE INDEX idx_users_status_email_name ON users(status, email, name);
-- Process: All data is in the index → return directly
-- Result: 2-5x faster for this specific query
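You can confirm a covering index is actually being used in the plan: PostgreSQL reports an Index Only Scan, and MySQL shows Using index in the Extra column. A sketch against the indexes above:
-- PostgreSQL: look for "Index Only Scan using idx_users_status_covering"
-- (run VACUUM users first so the visibility map is up to date)
EXPLAIN ANALYZE SELECT email, name FROM users WHERE status = 'active';
-- MySQL: look for "Using index" in the Extra column
EXPLAIN SELECT email, name FROM users WHERE status = 'active';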
Index Types Beyond B-Tree
Hash Index
-- PostgreSQL
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Only for equality comparisons (=)
-- Can be faster than a B-tree for exact matches, especially on long values
-- Cannot do: <, >, BETWEEN, LIKE
GIN Index (PostgreSQL)
-- For array, JSONB, full-text search
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
CREATE INDEX idx_users_data ON users USING gin(metadata jsonb_path_ops);
-- Enables efficient queries like:
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];
SELECT * FROM users WHERE metadata @> '{"premium": true}';
Full-Text Index
-- PostgreSQL
CREATE INDEX idx_posts_search ON posts USING gin(to_tsvector('english', title || ' ' || body));
-- MySQL
CREATE FULLTEXT INDEX idx_posts_fulltext ON posts(title, body);
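These indexes are only used when the query matches their syntax. A sketch of the corresponding lookups:
-- PostgreSQL: the expression must match the indexed one
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'indexing');
-- MySQL: MATCH ... AGAINST uses the FULLTEXT index
SELECT * FROM posts
WHERE MATCH(title, body) AGAINST('indexing');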
Common Indexing Mistakes
Mistake 1: Indexing Low-Selectivity Columns
-- Bad: Only 3 possible values
CREATE INDEX idx_users_status ON users(status);
-- status: 'active', 'inactive', 'pending'
-- Each value matches roughly a third of the table on average
-- The planner will often prefer a sequential scan anyway
-- Better: Partial index if you only query one status
CREATE INDEX idx_users_active ON users(id) WHERE status = 'active';
Mistake 2: Not Indexing Foreign Keys
-- Table structure
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id), -- Foreign key
product_id INTEGER REFERENCES products(id)
);
-- CRITICAL: Always index foreign keys!
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);
-- Without these, JOINs and cascading deletes are extremely slow
Mistake 3: Over-Indexing
-- Too many indexes:
CREATE INDEX idx_1 ON users(email); -- Redundant: it's a left prefix of idx_2 and idx_3
CREATE INDEX idx_2 ON users(email, name);
CREATE INDEX idx_3 ON users(email, created_at);
CREATE INDEX idx_4 ON users(name, email); -- Only useful if queries actually filter by name first
-- Problems:
-- - Every INSERT/UPDATE/DELETE must update all indexes
-- - Wasted disk space
-- - Query planner confusion
-- Solution: Create indexes based on actual query patterns
-- Use EXPLAIN on your slow queries, then create targeted indexes
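Before adding yet another index, review what already exists; redundant left prefixes are easy to spot this way. A sketch:
-- PostgreSQL: list index definitions per table
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
-- MySQL
SHOW INDEX FROM users;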
Mistake 4: Functions Preventing Index Usage
-- Index exists
CREATE INDEX idx_users_email ON users(email);
-- These queries CANNOT use the index:
SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; -- ❌
SELECT * FROM users WHERE email LIKE '%@example.com'; -- ❌ (leading wildcard)
SELECT * FROM users WHERE YEAR(created_at) = 2024; -- ❌
-- Solutions:
-- 1. Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- 2. Store computed values
ALTER TABLE users ADD COLUMN email_domain VARCHAR(255);
CREATE INDEX idx_users_email_domain ON users(email_domain);
-- 3. Rewrite query
SELECT * FROM users WHERE email LIKE 'john%'; -- ✅ (no leading wildcard)
-- Note: in PostgreSQL with a non-C locale, prefix LIKE needs an index created with text_pattern_ops
Mistake 5: Wrong Column Order in Composite Index
-- Query pattern
SELECT * FROM orders
WHERE user_id = 123
AND created_at BETWEEN '2024-01-01' AND '2024-12-31';
-- Bad index order
CREATE INDEX idx_orders_date_user ON orders(created_at, user_id);
-- Range condition on created_at breaks user_id usage
-- Good index order
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Equality on user_id first, then range on created_at
Real-World Example: E-Commerce Query Optimization
Let's optimize a typical e-commerce query:
-- Original slow query (5+ seconds)
SELECT
o.id, o.total, o.created_at,
u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed'
AND o.created_at >= '2024-01-01'
AND o.total > 100
ORDER BY o.created_at DESC
LIMIT 50;
-- Step 1: Analyze with EXPLAIN
EXPLAIN ANALYZE [query];
-- Shows: Seq Scan on orders, 2.3 million rows examined
-- Step 2: Create composite index
-- Equality first (status), then range (created_at)
-- INCLUDE makes it a covering index (PostgreSQL 11+ syntax; on MySQL, append the columns to the index key instead)
CREATE INDEX idx_orders_status_date ON orders(status, created_at DESC)
INCLUDE (user_id, total);
-- Step 3: Ensure foreign key is indexed
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Step 4: Re-run EXPLAIN
EXPLAIN ANALYZE [query];
-- Shows: Index Scan using idx_orders_status_date
-- Time: 5000ms → 15ms (333x faster!)
Monitoring Index Usage
PostgreSQL: Find Unused Indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
idx_scan AS times_used
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
AND NOT indisunique
ORDER BY pg_relation_size(i.indexrelid) DESC;
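Once you're sure an index really is unused (check replicas too, because these statistics are per server), you can drop it without blocking writes. The index name here is just an example:
-- PostgreSQL
DROP INDEX CONCURRENTLY IF EXISTS idx_users_unused;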
PostgreSQL: Find Missing Indexes
SELECT
schemaname || '.' || relname AS table,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_rows_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
AND seq_tup_read / seq_scan > 1000 -- Reading >1000 rows per scan
ORDER BY seq_tup_read DESC;
MySQL: Analyze Index Usage
-- Check index statistics
SELECT
TABLE_NAME,
INDEX_NAME,
CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY TABLE_NAME, INDEX_NAME;
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries > 1 second
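If the sys schema is installed (standard on MySQL 5.7+ with performance_schema enabled), there is also a ready-made view for unused indexes:
SELECT * FROM sys.schema_unused_indexes;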
Index Maintenance
PostgreSQL: Reindex
-- Reindex single index
REINDEX INDEX idx_users_email;
-- Reindex entire table
REINDEX TABLE users;
-- Reindex concurrently (PostgreSQL 12+, avoids blocking writes)
REINDEX INDEX CONCURRENTLY idx_users_email;
MySQL: Optimize
-- Rebuild table and indexes
OPTIMIZE TABLE users;
-- Analyze table (update statistics)
ANALYZE TABLE users;
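A simple way to decide whether maintenance is worth it is to track index sizes over time; a bloated index keeps growing even when row counts don't. A PostgreSQL sketch:
SELECT
  pg_size_pretty(pg_relation_size('idx_users_email'))  AS index_size,
  pg_size_pretty(pg_total_relation_size('users'))      AS table_with_indexes;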
Key Takeaways
- Always index foreign keys - JOINs will thank you
- Use EXPLAIN - Don't guess, measure
- Column order matters - Equality first, ranges last
- Don't over-index - Each index has write costs
- Monitor usage - Remove unused indexes
- Consider covering indexes - For read-heavy queries
Need a Fast Database?
All the indexing in the world won't help if your database server is slow. DanubeData managed databases run on NVMe storage for maximum I/O performance.
👉 Create a Managed Database - Starting at €19.99/mo
DanubeData Database Features:
- NVMe storage for fast index scans
- Automatic daily backups
- PostgreSQL 15-17, MySQL 8.0-9.1
- German data center (low latency for EU)
Need help optimizing your database? Contact our team for performance tuning assistance.