Demystifying SQL Joins: A Comprehensive Guide — Codes With Pankaj
Structured Query Language (SQL) is a powerful tool for managing and extracting data from relational databases. One of its fundamental capabilities is joining tables to combine and retrieve data from multiple sources. In this blog post, we will delve into the world of SQL joins, exploring five common types: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and SELF JOIN.
INNER JOIN: Combining Data
SQL INNER JOIN is used to combine rows from two or more tables based on a related column between them. Only the rows with matching values in the specified columns will be included in the result set. Here’s an example of how to use an INNER JOIN:
Suppose we have two tables: customers
and orders
, and we want to retrieve a list of customers who have placed orders. The customers
table contains information about customers, and the orders
table contains information about orders, including a customer ID that links each order to a customer.
Here are the two tables:
customers table:
+----+-----------+-------------+
| id | name | email |
+----+-----------+-------------+
| 1 | John Doe | john@p4n.in |
| 2 | Jane Smith| jane@p4n.in |
| 3 | Bob Brown | bob@p4n.in |
+----+-----------+-------------+
orders table:
+-----+------------+--------+
| id | customer_id| amount |
+-----+------------+--------+
| 101 | 1 | 50.00 |
| 102 | 2 | 75.00 |
| 103 | 1 | 30.00 |
| 104 | 3 | 40.00 |
+-----+------------+--------+
Now, let’s perform an INNER JOIN to get a list of customers who have placed orders:
SELECT customers.name, orders.amount
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id;
In this SQL query:
- We specify the tables we want to join (
customers
andorders
). - We use
INNER JOIN
to indicate that we only want to include rows with matching customer IDs in both tables. - We specify the join condition using the
ON
clause, which matches rows where theid
column in thecustomers
table matches thecustomer_id
column in theorders
table.
The result of this query will be :
+-----------+--------+
| name | amount |
+-----------+--------+
| John Doe | 50.00 |
| Jane Smith| 75.00 |
| John Doe | 30.00 |
| Bob Brown | 40.00 |
+-----------+--------+
This result set shows the names of customers who have placed orders along with the corresponding order amounts. The INNER JOIN
ensures that only customers with matching orders are included in the result.
LEFT JOIN: Inclusive Extraction
SQL LEFT JOIN, also known as a LEFT OUTER JOIN, is used to retrieve all rows from the left table (the first table mentioned in the query) and the matched rows from the right table (the second table mentioned) based on a specified condition. If there are no matches in the right table, the result will still contain all the rows from the left table, with NULL values in the columns from the right table.
Here’s an example of how to use a LEFT JOIN:
Suppose we have two tables: employees
and departments
, and we want to retrieve a list of all employees and the department they belong to, if they are assigned to a department. The employees
table contains information about employees, and the departments
table contains information about departments, including a department ID that links employees to their respective departments.
Here are the two tables:
employees table:
+------+----------+-----------+
| emp_id | emp_name | dept_id |
+------+----------+-----------+
| 1 | John Doe | 101 |
| 2 | Jane Smith | 102 |
| 3 | Bob Brown | 101 |
| 4 | Alice Lee | NULL |
+------+----------+-----------+
departments table:
+--------+-------------+
| dept_id | dept_name |
+--------+-------------+
| 101 | HR |
| 102 | IT |
| 103 | Sales |
+--------+-------------+
Now, let’s perform a LEFT JOIN to get a list of all employees and their respective departments (if assigned to one):
SELECT employees.emp_name, departments.dept_name
FROM employees
LEFT JOIN departments ON employees.dept_id = departments.dept_id;
In this SQL query:
- We specify the tables we want to join (
employees
anddepartments
). - We use
LEFT JOIN
to indicate that we want all rows from theemployees
table and only the matching rows from thedepartments
table. - We specify the join condition using the
ON
clause, which matches rows where thedept_id
column in theemployees
table matches thedept_id
column in thedepartments
table.
The result of this query will be:
+----------+-------------+
| emp_name | dept_name |
+----------+-------------+
| John Doe | HR |
| Jane Smith| IT |
| Bob Brown | HR |
| Alice Lee | NULL |
+----------+-------------+
In the result set, you can see that all employees are included in the output, even if they are not assigned to a department (e.g., Alice Lee). When there is no match in the departments
table for an employee, the dept_name
column contains NULL.
RIGHT JOIN: The Mirror Image.
SQL RIGHT JOIN, also known as a RIGHT OUTER JOIN, is used to retrieve all rows from the right table (the second table mentioned in the query) and the matched rows from the left table (the first table mentioned) based on a specified condition. If there are no matches in the left table, the result will still contain all the rows from the right table, with NULL values in the columns from the left table.
Here’s an example of how to use a RIGHT JOIN:
Suppose we have two tables: orders
and customers
, and we want to retrieve a list of all orders and the customer information for customers who have placed those orders. The orders
table contains information about orders, and the customers
table contains information about customers, including a customer ID that links each order to a customer.
Here are the two tables:
orders table:
+-----+------------+--------+
| id | customer_id| amount |
+-----+------------+--------+
| 101 | 1 | 50.00 |
| 102 | 2 | 75.00 |
| 103 | 1 | 30.00 |
| 104 | 3 | 40.00 |
+-----+------------+--------+
customers table:
+----+-----------+------------+
| id | name | email |
+----+-----------+------------+
| 1 | John Doe | john@p4n.in|
| 2 | Jane Smith| jane@p4n.in|
| 3 | Bob Brown | bob@p4n.in |
+----+-----------+------------+
Now, let’s perform a RIGHT JOIN to get a list of all orders and the corresponding customer information (if available):
SELECT customers.name, orders.amount
FROM orders
RIGHT JOIN customers ON orders.customer_id = customers.id;
In this SQL query:
- We specify the tables we want to join (
orders
andcustomers
). - We use
RIGHT JOIN
to indicate that we want all rows from thecustomers
table and only the matching rows from theorders
table. - We specify the join condition using the
ON
clause, which matches rows where thecustomer_id
column in theorders
table matches theid
column in thecustomers
table.
The result of this query will be:
+-----------+--------+
| name | amount |
+-----------+--------+
| John Doe | 50.00 |
| Jane Smith| 75.00 |
| John Doe | 30.00 |
| Bob Brown | 40.00 |
+-----------+--------+
In the result set, you can see that all orders are included in the output, even if they do not have a matching customer (e.g., the customer information for order 104 is missing). When there is no match in the customers
table for an order, the name
column contains NULL.
FULL JOIN: Comprehensive Data Retrieval
FULL JOIN, also known as a FULL OUTER JOIN, combines the results of both a LEFT JOIN and a RIGHT JOIN. It returns all rows from both tables, matching rows where the specified condition is met, and filling in NULL values for columns where there is no match in either the left or the right table.
Here’s an example of how to use a FULL JOIN:
Suppose we have two tables: employees
and departments
, and we want to retrieve a list of all employees and their respective departments (if assigned to one), including employees who are not assigned to any department. The employees
table contains information about employees, and the departments
table contains information about departments, including a department ID that links employees to their respective departments.
Here are the two tables:
employees table:
+------+----------+-----------+
| emp_id | emp_name | dept_id |
+------+----------+-----------+
| 1 | John Doe | 101 |
| 2 | Jane Smith | 102 |
| 3 | Bob Brown | 101 |
| 4 | Alice Lee | NULL |
+------+----------+-----------+
departments table:
+--------+-------------+
| dept_id | dept_name |
+--------+-------------+
| 101 | HR |
| 102 | IT |
| 103 | Sales |
+--------+-------------+
Now, let’s perform a FULL JOIN to get a list of all employees and their respective departments (if available):
SELECT employees.emp_name, departments.dept_name
FROM employees
FULL JOIN departments ON employees.dept_id = departments.dept_id;
In this SQL query:
- We specify the tables we want to join (
employees
anddepartments
). - We use
FULL JOIN
to indicate that we want all rows from both theemployees
anddepartments
tables. - We specify the join condition using the
ON
clause, which matches rows where thedept_id
column in theemployees
table matches thedept_id
column in thedepartments
table.
The result of this query will be:
+------------+-------------+
| emp_name | dept_name |
+------------+-------------+
| John Doe | HR |
| Jane Smith | IT |
| Bob Brown | HR |
| Alice Lee | NULL |
| NULL | Sales |
+------------+-------------+
In the result set, you can see that all employees and departments are included in the output, and NULL values are filled in where there are no matches (e.g., Alice Lee is not assigned to any department, and there is a department with no employees in Sales).
SELF JOIN: Linking Data Within.
SQL SELF JOIN is a type of join where you join a table with itself. This can be useful when you have hierarchical data or when you want to compare rows within the same table. Here’s an example of how to use a SELF JOIN:
Suppose you have a table called employees
with the following structure:
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(50),
manager_id INT
);
In this table, each row represents an employee, and the manager_id
column refers to the employee's manager. The manager_id
is a foreign key that points to the emp_id
of another employee in the same table.
Here’s some sample data:
+-------+------------+------------+
| emp_id| emp_name | manager_id |
+-------+------------+------------+
| 1 | John Doe | NULL |
| 2 | Jane Smith | 1 |
| 3 | Bob Brown | 1 |
| 4 | Alice Lee | 2 |
| 5 | Carol White| 3 |
+-------+------------+------------+
In this example, John Doe is the top-level manager (manager_id is NULL), and employees report to their respective managers.
Now, let’s say we want to retrieve a list of employees along with the names of their managers. We can use a SELF JOIN to achieve this:
SELECT e1.emp_name AS employee_name, e2.emp_name AS manager_name
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.emp_id;
In this SQL query:
- We give aliases to the same table to distinguish between the employee and manager roles:
e1
represents the employee, ande2
represents the manager. - We perform a LEFT JOIN on the
employees
table, joininge1.manager_id
withe2.emp_id
. This establishes the relationship between employees and their managers. - We select the
emp_name
column frome1
asemployee_name
and theemp_name
column frome2
asmanager_name
.
The result of this query will be:
+---------------+---------------+
| employee_name | manager_name |
+---------------+---------------+
| John Doe | NULL |
| Jane Smith | John Doe |
| Bob Brown | John Doe |
| Alice Lee | Jane Smith |
| Carol White | Bob Brown |
+---------------+---------------+
In the result set, you can see the list of employees along with their respective managers. John Doe, being a top-level manager, has NULL as the manager’s name. Other employees have their managers’ names listed.
Conclusion
SQL joins are powerful tools for extracting and combining data from relational databases. By understanding the five common types of joins — INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and SELF JOIN — you can manipulate and query your data effectively. Whether you’re retrieving data from different tables or establishing relationships within the same table, SQL joins are essential in your data manipulation toolkit.