RMRM Full Stack & AI Engineer · All questions · Roadmaps
Databases · interview questions

SQL Interview Questions

SQL (Structured Query Language) is the standard language for managing and querying relational databases. These questions cover core syntax, joins, aggregation, indexing, transactions, window functions, query optimization, and advanced concepts frequently tested in data engineer, backend, and analyst interviews.

1. What is the difference between WHERE and HAVING?

beginner

WHERE filters rows before aggregation and cannot reference aggregate functions. HAVING filters groups after a GROUP BY aggregation, and it can reference aggregate functions like COUNT() or SUM().

2. What are the different types of JOINs in SQL?

beginner

INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table and matching rows from the right (NULLs for non-matches). RIGHT JOIN is the mirror of LEFT JOIN. FULL OUTER JOIN returns all rows from both tables with NULLs where there is no match. CROSS JOIN returns the Cartesian product of both tables.

3. What is the difference between DELETE, TRUNCATE, and DROP?

beginner

DELETE removes specific rows and is logged and transactional; it can be rolled back. TRUNCATE removes all rows quickly without row-by-row logging and cannot be rolled back in most databases. DROP removes the entire table structure and its data permanently.

4. What is a PRIMARY KEY and how does it differ from a UNIQUE constraint?

beginner

A PRIMARY KEY uniquely identifies each row in a table, does not allow NULLs, and only one can exist per table. A UNIQUE constraint also enforces uniqueness but allows one NULL value (in most databases) and multiple UNIQUE constraints can exist on a single table.

5. What is the difference between UNION and UNION ALL?

beginner

UNION combines the results of two queries and removes duplicate rows. UNION ALL combines results including duplicates and is faster because it skips the deduplication step.

6. What are aggregate functions? Give examples.

beginner

Aggregate functions perform a calculation on a set of rows and return a single value. Common examples include COUNT(), SUM(), AVG(), MIN(), and MAX(). They are typically used with GROUP BY to compute metrics per group.

7. What is a subquery and when would you use it?

intermediate

A subquery is a query nested inside another query, used in SELECT, FROM, or WHERE clauses. You use it when you need to filter results based on an aggregated value or derived dataset, for example finding employees whose salary is above the average salary.

8. What is the difference between a correlated and a non-correlated subquery?

intermediate

A non-correlated subquery executes once independently of the outer query and its result is used by the outer query. A correlated subquery references a column from the outer query, causing it to re-execute for every row of the outer query, which can be less performant.

9. What is an index and how does it improve query performance?

intermediate

An index is a data structure (typically a B-tree) built on one or more columns that allows the database engine to locate rows without scanning the entire table. It dramatically speeds up SELECT queries with WHERE, JOIN, or ORDER BY on indexed columns but adds overhead to INSERT, UPDATE, and DELETE operations.

10. What are window functions? Give an example.

intermediate

Window functions perform calculations across a set of table rows related to the current row without collapsing them into a single result like GROUP BY. For example, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) assigns a sequential rank to employees within each department by salary.

11. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

intermediate

ROW_NUMBER() assigns a unique sequential integer to every row with no gaps or ties. RANK() assigns the same rank to tied rows but skips subsequent rank numbers (e.g., 1,1,3). DENSE_RANK() assigns the same rank to tied rows but does not skip rank numbers (e.g., 1,1,2).

12. What is a CTE (Common Table Expression) and why is it useful?

intermediate

A CTE is a temporary named result set defined with the WITH keyword that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs improve readability, allow recursive queries, and can be referenced multiple times within the same query unlike subqueries.

13. What is database normalization and what are the first three normal forms?

intermediate

Normalization organizes a database to reduce redundancy and improve integrity. 1NF requires atomic column values and no repeating groups. 2NF requires 1NF plus all non-key attributes must fully depend on the entire primary key (eliminates partial dependency). 3NF requires 2NF plus no transitive dependencies — non-key attributes must depend only on the primary key.

14. What is the difference between an INNER JOIN and a LEFT JOIN in terms of result set size?

intermediate

An INNER JOIN returns only rows where there is a match in both tables, so the result set can be smaller than either input. A LEFT JOIN always returns all rows from the left table, padding with NULLs for columns from the right table where no match exists, so the result set is at least as large as the left table.

15. How does query execution order work in SQL?

intermediate

The logical order is: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT/OFFSET. This is why column aliases defined in SELECT cannot be used in WHERE (WHERE is evaluated first), but can be used in ORDER BY.

16. What is a deadlock in SQL and how can it be prevented?

advanced

A deadlock occurs when two or more transactions permanently block each other by each holding a lock the other needs. Prevention strategies include accessing tables in a consistent order across transactions, keeping transactions short, using appropriate isolation levels, and using lock hints or retry logic.

17. What are transaction isolation levels and what anomalies do they prevent?

advanced

The four standard levels are READ UNCOMMITTED (allows dirty reads), READ COMMITTED (prevents dirty reads), REPEATABLE READ (also prevents non-repeatable reads), and SERIALIZABLE (also prevents phantom reads). Higher isolation levels provide more consistency but reduce concurrency and can increase contention.

18. What is query optimization and what are common techniques?

advanced

Query optimization improves performance by reducing I/O and CPU usage. Common techniques include adding appropriate indexes, rewriting correlated subqueries as JOINs, avoiding SELECT *, using covering indexes, analyzing execution plans (EXPLAIN/EXPLAIN ANALYZE), partitioning large tables, and updating table statistics so the query planner makes better decisions.

19. What is a recursive CTE and when would you use it?

advanced

A recursive CTE references itself using a UNION ALL between an anchor member (base case) and a recursive member. It is used to query hierarchical or tree-structured data such as organizational charts, bill of materials, or category trees without knowing the depth of the hierarchy in advance.

20. What is the difference between a clustered and a non-clustered index?

advanced

A clustered index determines the physical storage order of rows in a table; there can only be one per table and in SQL Server the primary key is clustered by default. A non-clustered index is a separate structure with pointers back to the table rows, and multiple non-clustered indexes can exist on a single table.

Practice these out loud with an AI interviewer that grills you and grades your answers.
Open the app — free to start

© RM Full Stack & AI Engineer · All interview questions · Roadmaps · Open the app