Merging, Joining, Concatenating, and Comparing DataFrames in Pandas Codes With Pankaj

Codes With Pankaj
12 min readMar 15, 2024

--

In data analysis and manipulation with Pandas, merging, joining, concatenating, and comparing are essential operations. These operations allow you to combine datasets, fill missing values, and compare data efficiently. In this tutorial, we’ll explore each of these operations in detail:

Concatenating DataFrames using Pandas

pandas.concat()

The pandas.concat() function allows you to concatenate pandas objects along either rows or columns. It is a flexible tool for combining datasets and can handle optional set logic along other axes.

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)

Parameters:

  • objs: A sequence or mapping of Series or DataFrame objects to concatenate.
  • axis: Axis to concatenate along. 0 for rows, 1 for columns.
  • join: Specifies how to handle indexes on other axes. Can be ‘inner’ or ‘outer’.
  • ignore_index: If True, the resulting axis will be labeled 0, 1, …, n — 1.
  • keys: Adds a hierarchical index using the passed keys as the outermost level.
  • verify_integrity: Checks whether the new concatenated axis contains duplicates.
  • sort: Sorts non-concatenation axis if it is not already aligned.
  • copy: If False, do not copy data unnecessarily.

Examples:

Let’s go through various examples to understand the usage of pandas.concat():

Example 1: Combine Two Series

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

result = pd.concat([s1, s2])
print(result)
//output
0 a
1 b
0 c
1 d
dtype: object

Example 2: Clear Existing Index and Reset It

result = pd.concat([s1, s2], ignore_index=True)
print(result)
//output
0 a
1 b
2 c
3 d
dtype: object

Example 3: Add a Hierarchical Index

result = pd.concat([s1, s2], keys=['s1', 's2'])
print(result)
s1  0    a
1 b
s2 0 c
1 d
dtype: object

Example 4: Combine Two DataFrames

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
df2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])

result = pd.concat([df1, df2])
print(result)
 //output
letter number
0 a 1
1 b 2
0 c 3
1 d 4

Example 5: Combine DataFrames with Overlapping Columns

df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])

result = pd.concat([df1, df3], sort=False)
print(result)
//output  
letter number animal
0 a 1 NaN
1 b 2 NaN
0 c 3 cat
1 d 4 dog

Example 6: Combine DataFrames and Return Only Shared Columns

result = pd.concat([df1, df3], join="inner")
print(result)
//output  
letter number
0 a 1
1 b 2
0 c 3
1 d 4

Example 7: Combine DataFrames Horizontally

df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']], columns=['animal', 'name'])

result = pd.concat([df1, df4], axis=1)
print(result)
//output  
letter number animal name
0 a 1 bird polly
1 b 2 monkey george

Example 8: Prevent Including Duplicate Index Values

df5 = pd.DataFrame([1], index=['a'])
df6 = pd.DataFrame([2], index=['a'])

result = pd.concat([df5, df6], verify_integrity=True)
print(result)
//output
ValueError: Indexes have overlapping values: ['a']

Example 9: Append a Single Row to a DataFrame

df7 = pd.DataFrame({'a': 1, 'b': 2}, index=[0])
new_row = pd.Series({'a': 3, 'b': 4})

result = pd.concat([df7, new_row.to_frame().T], ignore_index=True)
print(result)
 //output  
a b
0 1 2
1 3 4

Exmaple :

import pandas as pd
# codeswithpankaj.com
# Define DataFrame 1 with Indian food
df1 = pd.DataFrame(
{
"Dish": ["Biryani", "Butter Chicken", "Paneer Tikka", "Masala Dosa"],
"Main Ingredient": ["Rice", "Chicken", "Paneer", "Dosa Batter"],
"Spices": ["Biryani Masala", "Butter, Tomato", "Tandoori Masala", "Masala Mix"],
"Region": ["Hyderabad", "Punjab", "North India", "South India"],
},
index=[0, 1, 2, 3],
)

# Define DataFrame 2 with more Indian food
df2 = pd.DataFrame(
{
"Dish": ["Samosa", "Chole Bhature", "Aloo Paratha", "Rasgulla"],
"Main Ingredient": ["Potato, Peas", "Chickpeas", "Potato", "Chhena"],
"Spices": ["Chaats Masala", "Chole Masala", "Cumin, Coriander", "Cardamom"],
"Region": ["North India", "Punjab", "Punjab", "West Bengal"],
},
index=[4, 5, 6, 7],
)

# Define DataFrame 3 with more Indian food
df3 = pd.DataFrame(
{
"Dish": ["Vada Pav", "Pav Bhaji", "Dhokla", "Jalebi"],
"Main Ingredient": ["Potato, Bread", "Mixed Vegetables", "Rice Flour", "Maida"],
"Spices": ["Garlic Chutney", "Pav Bhaji Masala", "Green Chili", "Saffron"],
"Region": ["Maharashtra", "Maharashtra", "Gujarat", "North India"],
},
index=[8, 9, 10, 11],
)

# Concatenate the DataFrames
frames = [df1, df2, df3]
result = pd.concat(frames)

# Display the result
print(result)
             Dish   Main Ingredient            Spices       Region
0 Biryani Rice Biryani Masala Hyderabad
1 Butter Chicken Chicken Butter, Tomato Punjab
2 Paneer Tikka Paneer Tandoori Masala North India
3 Masala Dosa Dosa Batter Masala Mix South India
4 Samosa Potato, Peas Chaats Masala North India
5 Chole Bhature Chickpeas Chole Masala Punjab
6 Aloo Paratha Potato Cumin, Coriander Punjab
7 Rasgulla Chhena Cardamom West Bengal
8 Vada Pav Potato, Bread Garlic Chutney Maharashtra
9 Pav Bhaji Mixed Vegetables Pav Bhaji Masala Maharashtra
10 Dhokla Rice Flour Green Chili Gujarat
11 Jalebi Maida Saffron North India

Generate a table image that includes the row and column indices, we can use the matplotlib.table.Table class along with the Matplotlib library. Here's how you can do it:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.table import Table

# Define DataFrame 1 with Indian food
df1 = pd.DataFrame(
{
"Dish": ["Biryani", "Butter Chicken", "Paneer Tikka", "Masala Dosa"],
"Main Ingredient": ["Rice", "Chicken", "Paneer", "Dosa Batter"],
"Spices": ["Biryani Masala", "Butter, Tomato", "Tandoori Masala", "Masala Mix"],
"Region": ["Hyderabad", "Punjab", "North India", "South India"],
},
index=[0, 1, 2, 3],
)

# Define DataFrame 2 with more Indian food
df2 = pd.DataFrame(
{
"Dish": ["Samosa", "Chole Bhature", "Aloo Paratha", "Rasgulla"],
"Main Ingredient": ["Potato, Peas", "Chickpeas", "Potato", "Chhena"],
"Spices": ["Chaats Masala", "Chole Masala", "Cumin, Coriander", "Cardamom"],
"Region": ["North India", "Punjab", "Punjab", "West Bengal"],
},
index=[4, 5, 6, 7],
)

# Define DataFrame 3 with more Indian food
df3 = pd.DataFrame(
{
"Dish": ["Vada Pav", "Pav Bhaji", "Dhokla", "Jalebi"],
"Main Ingredient": ["Potato, Bread", "Mixed Vegetables", "Rice Flour", "Maida"],
"Spices": ["Garlic Chutney", "Pav Bhaji Masala", "Green Chili", "Saffron"],
"Region": ["Maharashtra", "Maharashtra", "Gujarat", "North India"],
},
index=[8, 9, 10, 11],
)

# Concatenate the DataFrames
frames = [df1, df2, df3]
result = pd.concat(frames)

# Convert the DataFrame to a list of lists for the table
data = [result.columns.tolist()] + result.values.tolist()

# Create a Matplotlib figure and axis
fig, ax = plt.subplots(figsize=(10, 6))

# Create the table
table = ax.table(cellText=data, loc='center', cellLoc='center', colWidths=[0.2]*len(result.columns))

# Adjust the font size
table.auto_set_font_size(False)
table.set_fontsize(10)

# Hide axes
ax.axis('off')

# Add a title
plt.title('Concatenated DataFrame')

# Show the table
plt.show()

pandas.DataFrame.join()

Let’s start by understanding the parameters and then move on to practical examples.

Parameters:

  • other: DataFrame, Series, or a list containing any combination of them. The index should be similar to one of the columns in the caller DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
  • on: String, list of strings, or array-like, optional. Column or index level name(s) in the caller to join on the index in the other DataFrame, otherwise joins index-on-index. It functions similar to an Excel VLOOKUP operation.
  • how: {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘left’. Specifies how to handle the operation of the two objects during the join.
  • lsuffix: String, default ‘’. Suffix to use from the left frame’s overlapping columns.
  • rsuffix: String, default ‘’. Suffix to use from the right frame’s overlapping columns.
  • sort: Boolean, default False. Orders the result DataFrame lexicographically by the join key.
  • validate: String, optional. If specified, checks if the join is of the specified type.

Examples:

Let’s walk through some examples to understand how to use DataFrame.join():

import pandas as pd

# Indian Food DataFrame
indian_food = pd.DataFrame({
'dish': ['Butter Chicken', 'Biryani', 'Samosa', 'Paneer Tikka', 'Masala Dosa'],
'main_ingredient': ['Chicken', 'Rice', 'Potato', 'Paneer', 'Potato'],
'region': ['North India', 'South India', 'North India', 'North India', 'South India']
})

# Indian Food Ingredients DataFrame
ingredients = pd.DataFrame({
'dish': ['Butter Chicken', 'Biryani', 'Samosa'],
'spice_level': ['Medium', 'Spicy', 'Mild']
})

# Join DataFrames using their indexes
result = indian_food.join(ingredients.set_index('dish'), lsuffix='_indian_food', rsuffix='_ingredients')
print(result)
        dish_indian_food main_ingredient       region spice_level
0 Butter Chicken Chicken North India Medium
1 Biryani Rice South India Spicy
2 Samosa Potato North India Mild
3 Paneer Tikka Paneer North India NaN
4 Masala Dosa Potato South India NaN

Using non-unique key values shows how they are matched.

import pandas as pd

# Indian Food DataFrame with non-unique key values
indian_food_non_unique = pd.DataFrame({
'dish': ['Butter Chicken', 'Biryani', 'Biryani', 'Paneer Tikka', 'Butter Chicken'],
'main_ingredient': ['Chicken', 'Rice', 'Rice', 'Paneer', 'Chicken'],
'region': ['North India', 'South India', 'North India', 'North India', 'North India']
})

# Indian Food Ingredients DataFrame
ingredients = pd.DataFrame({
'dish': ['Butter Chicken', 'Biryani', 'Biryani'],
'spice_level': ['Medium', 'Spicy', 'Spicy']
})

# Join DataFrames using their indexes and specifying validation
result = indian_food_non_unique.join(ingredients.set_index('dish'), on='dish', validate='m:1')
print(result)
             dish main_ingredient       region spice_level
0 Butter Chicken Chicken North India Medium
1 Biryani Rice South India Spicy
2 Biryani Rice North India Spicy
3 Paneer Tikka Paneer North India NaN
4 Butter Chicken Chicken North India Medium

Concatenating Series and DataFrame in Pandas

1. Concatenating Series with DataFrame

When concatenating a Series with a DataFrame, the Series will be transformed into a DataFrame with the column name as the name of the Series.

import pandas as pd

s1 = pd.Series(["X0", "X1", "X2", "X3"], name="X")
result = pd.concat([df1, s1], axis=1)
print(result)

2. Concatenating Unnamed Series

Unnamed Series will be numbered consecutively when concatenated with a DataFrame.

s2 = pd.Series(["_0", "_1", "_2", "_3"])
result = pd.concat([df1, s2, s2, s2], axis=1)
print(result)

3. Using ignore_index=True

Setting ignore_index=True will drop all name references, resulting in columns being numbered consecutively.

result = pd.concat([df1, s1], axis=1, ignore_index=True)
print(result)

4. Resulting keys

The keys argument adds another axis level to the resulting index or column, creating a MultiIndex.

result = pd.concat(frames, keys=["x", "y", "z"])
print(result)

# Accessing data from a specific key
print(result.loc["y"])

5. Overriding Column Names with Keys

The keys argument can override the column names when creating a new DataFrame based on existing Series.

s3 = pd.Series([0, 1, 2, 3], name="foo")
s4 = pd.Series([0, 1, 2, 3])
s5 = pd.Series([0, 1, 4, 5])

print(pd.concat([s3, s4, s5], axis=1))
print(pd.concat([s3, s4, s5], axis=1, keys=["red", "blue", "yellow"]))

6. Concatenating a dictionary of DataFrames

You can pass a dictionary of DataFrames to pd.concat(), where each key corresponds to a separate level in the resulting MultiIndex.

pieces = {"x": df1, "y": df2, "z": df3}
result = pd.concat(pieces)
print(result)

# Specifying keys
result = pd.concat(pieces, keys=["z", "y"])
print(result)

7. Appending rows to a DataFrame

If you have a Series representing a single row that you want to append to a DataFrame, you can convert the row into a DataFrame and use pd.concat().

s2 = pd.Series(["X0", "X1", "X2", "X3"], index=["A", "B", "C", "D"])
result = pd.concat([df1, s2.to_frame().T], ignore_index=True)
print(result)

Example demonstrating the concatenation of Series and DataFrame objects using Indian food data:

import pandas as pd

# Sample DataFrame with Indian food data
indian_food_data = pd.DataFrame({
'dish_name': ['Biryani', 'Butter Chicken', 'Masala Dosa', 'Paneer Tikka', 'Samosa'],
'main_ingredient': ['Rice', 'Chicken', 'Potato', 'Paneer', 'Potato'],
'region': ['Hyderabad', 'Punjab', 'South India', 'North India', 'North India']
}

# Sample Series with additional information
additional_info_series = pd.Series(["Spicy", "Creamy", "Savoury", "Vegetarian", "Crispy"], name="attributes")

# Example 1: Concatenating Series with DataFrame
result_concat_series_df = pd.concat([indian_food_data, additional_info_series], axis=1)
print("Concatenating Series with DataFrame:")
print(result_concat_series_df)

# Sample Series without names
unnamed_series = pd.Series(["Rich", "Flavorful", "Delicious", "Healthy", "Fried"])

# Example 2: Concatenating Unnamed Series with DataFrame
result_concat_unnamed_series = pd.concat([indian_food_data, unnamed_series, unnamed_series, unnamed_series], axis=1)
print("\nConcatenating Unnamed Series with DataFrame:")
print(result_concat_unnamed_series)

# Example 3: Using ignore_index=True
result_ignore_index = pd.concat([indian_food_data, additional_info_series], axis=1, ignore_index=True)
print("\nUsing ignore_index=True:")
print(result_ignore_index)

# Example 4: Resulting keys
frames = [indian_food_data.sample(4), indian_food_data.sample(4), indian_food_data.sample(4)]
result_keys = pd.concat(frames, keys=["South", "North", "West"])
print("\nResulting keys:")
print(result_keys)
print("\nAccessing data from a specific key:")
print(result_keys.loc["South"])

# Example 5: Overriding Column Names with Keys
s_additional_info1 = pd.Series(["Spicy", "Creamy", "Savoury", "Vegetarian", "Crispy"], name="attributes")
s_additional_info2 = pd.Series(["Hot", "Tender", "Tangy", "Protein-rich", "Flaky"], name="attributes")
s_additional_info3 = pd.Series(["Aromatic", "Juicy", "Crisp", "Calcium-rich", "Crunchy"], name="attributes")
result_keys_attributes = pd.concat([s_additional_info1, s_additional_info2, s_additional_info3], axis=1, keys=["Region1", "Region2", "Region3"])
print("\nOverriding Column Names with Keys:")
print(result_keys_attributes)

# Example 6: Concatenating a dictionary of DataFrames
pieces = {"North": indian_food_data.sample(2), "South": indian_food_data.sample(2), "East": indian_food_data.sample(2)}
result_dict = pd.concat(pieces)
print("\nConcatenating a dictionary of DataFrames:")
print(result_dict)

# Specifying keys
result_dict_with_keys = pd.concat(pieces, keys=["North", "South", "East"])
print("\nSpecifying keys:")
print(result_dict_with_keys)

# Example 7: Appending rows to a DataFrame
new_row_series = pd.Series(["Flavourful", "Marinated", "Light", "Non-vegetarian", "Crunchy"], index=["dish_name", "main_ingredient", "region", "attributes", "New_Column"])
result_append_row = pd.concat([indian_food_data, new_row_series.to_frame().T], ignore_index=True)
print("\nAppending rows to a DataFrame:")
print(result_append_row)
Concatenating Series with DataFrame:
dish_name main_ingredient region attributes
0 Biryani Rice Hyderabad Spicy
1 Butter Chicken Chicken Punjab Creamy
2 Masala Dosa Potato South India Savoury
3 Paneer Tikka Paneer North India Vegetarian
4 Samosa Potato North India Crispy

Concatenating Unnamed Series with DataFrame:
dish_name main_ingredient region 0 1 2
0 Biryani Rice Hyderabad Rich Flavorful Delicious
1 Butter Chicken Chicken Punjab Flavorful Delicious Delicious
2 Masala Dosa Potato South India Delicious Delicious Delicious
3 Paneer Tikka Paneer North India Healthy Healthy Healthy
4 Samosa Potato North India Fried Fried Fried

Using ignore_index=True:
0 1 2 3 4
0 Biryani Rice Hyderabad Spicy NaN
1 Butter Chicken Chicken Punjab Creamy NaN
2 Masala Dosa Potato South India Savoury NaN
3 Paneer Tikka Paneer North India Vegetarian NaN
4 Samosa Potato North India Crispy NaN

Resulting keys:
dish_name main_ingredient region
South 0 Butter Chicken Chicken Punjab
1 Paneer Tikka Paneer North India
North 0 Butter Chicken Chicken Punjab
1 Paneer Tikka Paneer North India
West 0 Samosa Potato North India
1 Butter Chicken Chicken Punjab

Accessing data from a specific key:
dish_name main_ingredient region
0 Butter Chicken Chicken Punjab
1 Paneer Tikka Paneer North India

Overriding Column Names with Keys:
Region1 Region2 Region3
0 Spicy Hot Aromatic
1 Creamy Tender Juicy
2 Savoury Tangy Crisp
3 Vegetarian Protein-rich Calcium-rich
4 Crispy Flaky Crunchy

Concatenating a dictionary of DataFrames:
dish_name main_ingredient region
North 0 Biryani Rice Hyderabad
1 Samosa Potato North India
South 0 Biryani Rice Hyderabad
1 Samosa Potato North India
East 0 Biryani Rice Hyderabad
1 Samosa Potato North India

Specifying keys:
dish_name main_ingredient region
North Biryani Rice Hyderabad
Samosa Potato North India
South Biryani Rice Hyderabad
Samosa Potato North India
East Biryani Rice Hyderabad
Samosa Potato North India

Appending rows to a DataFrame:
dish_name main_ingredient region attributes New_Column
0 Biryani Rice Hyderabad Spicy NaN
1 Butter Chicken Chicken Punjab Creamy NaN
2 Masala Dosa Potato South India Savoury NaN
3 Paneer Tikka Paneer North India Vegetarian NaN
4 Samosa Potato North India Crispy NaN
5 Paneer Tikka Marinated North India Light NaN

Pandas Merge

It’s common to combine datasets based on common columns or indices. pandas.merge() is a powerful function in the Pandas library that allows you to perform this task efficiently. This tutorial will cover the various parameters of pandas.merge() and provide examples to demonstrate its usage.

pandas.merge(
left,
right,
how='inner',
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes=('_x', '_y'),
copy=None,
indicator=False,
validate=None
)

Parameters

  • left, right: DataFrame or named Series objects to merge.
  • how: Type of merge to be performed. Options include 'left', 'right', 'outer', 'inner', and 'cross'.
  • on: Column or index level names to join on.
  • left_on, right_on: Column or index level names to join on in the left and right DataFrames, respectively.
  • left_index, right_index: Use the index from the left or right DataFrame as the join key(s).
  • sort: Sort the join keys lexicographically in the result DataFrame.
  • suffixes: A length-2 sequence indicating the suffixes to add to overlapping column names.
  • copy: If False, avoid copy if possible.
  • indicator: If True, adds a column to the output DataFrame with information on the source of each row.
  • validate: If specified, checks if merge is of specified type.

Example 1: Basic Merge

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8]})
merged_df = df1.merge(df2, left_on='lkey', right_on='rkey')
print(merged_df)

Example 2: Merge with Suffixes

df1 = pd.DataFrame({'lkey': ['foo', 'bar'], 'value': [1, 2]})
df2 = pd.DataFrame({'rkey': ['foo', 'baz'], 'value': [3, 4]})
merged_df = df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right'))
print(merged_df)

Example 3: Cross Merge

df1 = pd.DataFrame({'left': ['foo', 'bar']})
df2 = pd.DataFrame({'right': [7, 8]})
cross_merged_df = df1.merge(df2, how='cross')
print(cross_merged_df)

Types of Merges

1. One-to-One Merge

In a one-to-one merge, two DataFrame objects are joined based on their indexes, which must contain unique values.

2. Many-to-One Merge

In a many-to-one merge, a unique index from one DataFrame is joined to one or more columns in another DataFrame.

3. Many-to-Many Merge

In a many-to-many merge, columns from one DataFrame are joined with columns from another DataFrame.

merge() Function Syntax

The syntax of the merge() function in Pandas is as follows:

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Examples :

One-to-One Merge

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]})
result = pd.merge(left, right, on="key")
print(result)

Many-to-One Merge

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"], "key2": ["K0", "K1", "K0", "K1"], "A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"], "key2": ["K0", "K0", "K0", "K0"], "C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]})
result = pd.merge(left, right, how="left", on=["key1", "key2"])
print(result)

Many-to-Many Merge

left = pd.DataFrame({"A": [1, 2], "B": [2, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
result = pd.merge(left, right, on="B", how="outer")
print(result)

Example :

import pandas as pd

# Sample Indian food dataset
left_indian_food = pd.DataFrame({
"dish": ["butter chicken", "paneer tikka", "samosa", "dal makhani"],
"main_ingredient": ["chicken", "paneer", "potato", "lentils"],
"region": ["Punjab", "North India", "North India", "Punjab"]
})

right_indian_food = pd.DataFrame({
"dish": ["butter chicken", "paneer tikka", "samosa", "naan"],
"spicy_level": ["medium", "mild", "medium", "none"],
"region": ["Punjab", "North India", "North India", "North India"]
})

# One-to-One Merge
result_one_to_one = pd.merge(left_indian_food, right_indian_food, on="dish")

# Many-to-One Merge
result_many_to_one = pd.merge(left_indian_food, right_indian_food, how="left", on="region")

# Many-to-Many Merge
result_many_to_many = pd.merge(left_indian_food, right_indian_food, on="region", how="outer")

print("One-to-One Merge:")
print(result_one_to_one)

print("\nMany-to-One Merge:")
print(result_many_to_one)

print("\nMany-to-Many Merge:")
print(result_many_to_many)

One-to-One Merge :

            dish main_ingredient   region spicy_level
0 butter chicken chicken Punjab medium
1 paneer tikka paneer Punjab mild
2 samosa potato Haryana medium

Many-to-One Merge :

            dish main_ingredient     region spicy_level
0 butter chicken chicken Punjab medium
1 paneer tikka paneer Punjab mild
2 samosa potato North India medium
3 dal makhani lentils Punjab NaN

Many-to-Many Merge :

            dish_x main_ingredient_x   region           dish_y spicy_level
0 butter chicken chicken Punjab butter chicken medium
1 butter chicken chicken Punjab paneer tikka mild
2 paneer tikka paneer Punjab butter chicken medium
3 paneer tikka paneer Punjab paneer tikka mild
4 samosa potato Haryana butter chicken medium
5 dal makhani lentils Punjab butter chicken medium

This tutorial codeswithpankaj covers the fundamental aspects of merging, joining, concatenating, and comparing DataFrames in Pandas. These operations are crucial for data manipulation tasks and provide flexibility in handling diverse datasets. Practice with various examples to master these techniques and enhance your data analysis skills

--

--