SQL (Structured Query Language) is the standard language for managing and querying relational databases. These questions cover core syntax, joins, aggregation, indexing, transactions, window functions, query optimization, and advanced concepts frequently tested in data engineer, backend, and analyst interviews.
WHERE filters rows before aggregation and cannot reference aggregate functions. HAVING filters groups after a GROUP BY aggregation, and it can reference aggregate functions like COUNT() or SUM().
INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table and matching rows from the right (NULLs for non-matches). RIGHT JOIN is the mirror of LEFT JOIN. FULL OUTER JOIN returns all rows from both tables with NULLs where there is no match. CROSS JOIN returns the Cartesian product of both tables.
DELETE removes specific rows and is logged and transactional; it can be rolled back. TRUNCATE removes all rows quickly without row-by-row logging and cannot be rolled back in most databases. DROP removes the entire table structure and its data permanently.
A PRIMARY KEY uniquely identifies each row in a table, does not allow NULLs, and only one can exist per table. A UNIQUE constraint also enforces uniqueness but allows one NULL value (in most databases) and multiple UNIQUE constraints can exist on a single table.
UNION combines the results of two queries and removes duplicate rows. UNION ALL combines results including duplicates and is faster because it skips the deduplication step.
Aggregate functions perform a calculation on a set of rows and return a single value. Common examples include COUNT(), SUM(), AVG(), MIN(), and MAX(). They are typically used with GROUP BY to compute metrics per group.
A subquery is a query nested inside another query, used in SELECT, FROM, or WHERE clauses. You use it when you need to filter results based on an aggregated value or derived dataset, for example finding employees whose salary is above the average salary.
A non-correlated subquery executes once independently of the outer query and its result is used by the outer query. A correlated subquery references a column from the outer query, causing it to re-execute for every row of the outer query, which can be less performant.
An index is a data structure (typically a B-tree) built on one or more columns that allows the database engine to locate rows without scanning the entire table. It dramatically speeds up SELECT queries with WHERE, JOIN, or ORDER BY on indexed columns but adds overhead to INSERT, UPDATE, and DELETE operations.
Window functions perform calculations across a set of table rows related to the current row without collapsing them into a single result like GROUP BY. For example, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) assigns a sequential rank to employees within each department by salary.
ROW_NUMBER() assigns a unique sequential integer to every row with no gaps or ties. RANK() assigns the same rank to tied rows but skips subsequent rank numbers (e.g., 1,1,3). DENSE_RANK() assigns the same rank to tied rows but does not skip rank numbers (e.g., 1,1,2).
A CTE is a temporary named result set defined with the WITH keyword that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs improve readability, allow recursive queries, and can be referenced multiple times within the same query unlike subqueries.
Normalization organizes a database to reduce redundancy and improve integrity. 1NF requires atomic column values and no repeating groups. 2NF requires 1NF plus all non-key attributes must fully depend on the entire primary key (eliminates partial dependency). 3NF requires 2NF plus no transitive dependencies — non-key attributes must depend only on the primary key.
An INNER JOIN returns only rows where there is a match in both tables, so the result set can be smaller than either input. A LEFT JOIN always returns all rows from the left table, padding with NULLs for columns from the right table where no match exists, so the result set is at least as large as the left table.
The logical order is: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT/OFFSET. This is why column aliases defined in SELECT cannot be used in WHERE (WHERE is evaluated first), but can be used in ORDER BY.
A deadlock occurs when two or more transactions permanently block each other by each holding a lock the other needs. Prevention strategies include accessing tables in a consistent order across transactions, keeping transactions short, using appropriate isolation levels, and using lock hints or retry logic.
The four standard levels are READ UNCOMMITTED (allows dirty reads), READ COMMITTED (prevents dirty reads), REPEATABLE READ (also prevents non-repeatable reads), and SERIALIZABLE (also prevents phantom reads). Higher isolation levels provide more consistency but reduce concurrency and can increase contention.
Query optimization improves performance by reducing I/O and CPU usage. Common techniques include adding appropriate indexes, rewriting correlated subqueries as JOINs, avoiding SELECT *, using covering indexes, analyzing execution plans (EXPLAIN/EXPLAIN ANALYZE), partitioning large tables, and updating table statistics so the query planner makes better decisions.
A recursive CTE references itself using a UNION ALL between an anchor member (base case) and a recursive member. It is used to query hierarchical or tree-structured data such as organizational charts, bill of materials, or category trees without knowing the depth of the hierarchy in advance.
A clustered index determines the physical storage order of rows in a table; there can only be one per table and in SQL Server the primary key is clustered by default. A non-clustered index is a separate structure with pointers back to the table rows, and multiple non-clustered indexes can exist on a single table.
© RM Full Stack & AI Engineer · All interview questions · Roadmaps · Open the app