Python Pandas — DataFrame

Codes With Pankaj
6 min read · Aug 23, 2023

A DataFrame is a two-dimensional data structure: data is arranged in a table of rows and columns.

Features of DataFrame

  • Columns can be of different types
  • Size is mutable: rows and columns can be added or removed
  • Labeled axes (rows and columns)
  • Arithmetic operations can be performed on rows and columns
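
To make these features concrete, here is a minimal sketch. The 'Height' and 'AgeNextYear' columns are made up for illustration and are not part of the examples used later in this article.

```python
import pandas as pd

# Columns of different types: strings, integers and floats
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Height': [1.65, 1.80, 1.75],  # hypothetical column for illustration
})
print(df.dtypes)

# Labeled axes: row labels (the index) and column labels
print(list(df.index))    # [0, 1, 2]
print(list(df.columns))  # ['Name', 'Age', 'Height']

# Arithmetic on a whole column, and size mutability: a new column is added
df['AgeNextYear'] = df['Age'] + 1
print(df.shape)  # (3, 4)
```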

How to Create a Pandas DataFrame

Creating a DataFrame is the starting point for most data analysis with Python’s Pandas library. Here are the most common ways to build one:

From a Dictionary: You can create a DataFrame from a dictionary where keys represent column names and values are lists or arrays containing data for each column.

import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)
print(df)

From a List of Dictionaries: You can also create a DataFrame from a list of dictionaries. Each dictionary corresponds to a row, and the keys in the dictionaries represent column names.

data = [
{'Name': 'Alice', 'Age': 25, 'Country': 'USA'},
{'Name': 'Bob', 'Age': 30, 'Country': 'Canada'},
{'Name': 'Charlie', 'Age': 22, 'Country': 'UK'}
]
df = pd.DataFrame(data)
print(df)

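From a NumPy Array: You can create a DataFrame from a two-dimensional NumPy array, where each column in the DataFrame corresponds to a column in the array. A NumPy array holds a single data type, so mixing names and numbers as below stores every value as a string; convert numeric columns back with .astype().

import numpy as np
import pandas as pd
data = np.array([
    ['Alice', 25, 'USA'],
    ['Bob', 30, 'Canada'],
    ['Charlie', 22, 'UK']
])
df = pd.DataFrame(data, columns=['Name', 'Age', 'Country'])
df['Age'] = df['Age'].astype(int) # the array stored the ages as strings
print(df)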

From an Existing DataFrame: You can create a new DataFrame from an existing one by selecting specific columns or rows.

# Assuming 'df' is an existing DataFrame
new_df = df[['Name', 'Age']] # Creating a new DataFrame with only 'Name' and 'Age' columns
print(new_df)

These are just a few examples of how you can create Pandas DataFrames. You can also read data from various file formats (CSV, Excel, SQL databases) using Pandas functions to create DataFrames. Remember to import the Pandas library at the beginning of your script using: import pandas as pd.
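
As a small sketch of reading from a file, the example below uses pd.read_csv. To keep it self-contained, the CSV content is an in-memory string (the csv_text variable is made up for this example); in practice you would pass a file path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk
csv_text = """Name,Age,Country
Alice,25,USA
Bob,30,Canada
Charlie,22,UK
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # 'Age' is parsed as an integer column
```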

How to Select an Index or Column from a Pandas DataFrame

Selecting Columns: You can select one or more columns from a DataFrame by putting the column names in square brackets. A single name returns a Series; a list of names returns a DataFrame.

import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}

df = pd.DataFrame(data)

# Select a single column
name_column = df['Name']
print(name_column)

# Select multiple columns
name_age_columns = df[['Name', 'Age']]
print(name_age_columns)

Selecting Rows by Index:

You can select rows from a DataFrame using the .loc or .iloc indexer. The .loc indexer is label-based, while the .iloc indexer is based on integer position. With the default index (0, 1, 2, ...), labels and positions happen to be the same.

# Using .loc to select rows by label
first_row = df.loc[0]
print(first_row)

# Using .iloc to select rows by integer index
second_row = df.iloc[1]
print(second_row)
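
The difference between the two indexers is easier to see when the index is not the default one. Here is a minimal sketch, assuming a DataFrame whose row labels are names (built just for this example):

```python
import pandas as pd

# The row labels are names instead of 0, 1, 2
df = pd.DataFrame(
    {'Age': [25, 30, 22], 'Country': ['USA', 'Canada', 'UK']},
    index=['Alice', 'Bob', 'Charlie'],
)

by_label = df.loc['Bob']   # the row whose index label is 'Bob'
by_position = df.iloc[1]   # the second row, whatever its label is

print(by_label.equals(by_position))  # True
```

With this index, df.loc[1] would raise a KeyError, because 1 is a position here and not a label.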

Selecting Rows and Columns Simultaneously:

You can use both .loc and .iloc to select specific rows and columns at the same time.

# Using .loc to select specific rows and columns by label
specific_row_col = df.loc[1, 'Name']
print(specific_row_col)

# Using .iloc to select specific rows and columns by integer index
specific_row_col_iloc = df.iloc[0, 1]
print(specific_row_col_iloc)
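
Both indexers also accept slices, with one difference worth remembering: a .loc slice includes its end label, while an .iloc slice excludes its end position. A short sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Country': ['USA', 'Canada', 'UK'],
})

# .loc slices by label and includes the end label: rows 0 and 1
loc_slice = df.loc[0:1, ['Name', 'Age']]

# .iloc slices by position and excludes the end position: row 0 only
iloc_slice = df.iloc[0:1, 0:2]

print(len(loc_slice), len(iloc_slice))  # 2 1
```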

Setting Index:

You can set a specific column as the index of the DataFrame using the .set_index() method.

df_with_index = df.set_index('Name')
print(df_with_index)

Resetting Index:

If you’ve set an index and want to revert to the default integer index, you can use the .reset_index() method.

df_reset = df_with_index.reset_index()
print(df_reset)

How to Add an Index, Row or Column to a Pandas DataFrame

import pandas as pd

# Creating a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)

# Adding an Index
df_with_index = df.set_index('Name')
print("DataFrame with Custom Index:")
print(df_with_index)

# Adding a Row
new_row = {'Name': 'David', 'Age': 28, 'Country': 'Australia'}
# DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it
df_with_new_row = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print("\nDataFrame with New Row:")
print(df_with_new_row)

# Adding a Column
df_with_new_column = df_with_new_row.copy() # Create a copy to avoid modifying the original
df_with_new_column['Gender'] = ['Female', 'Male', 'Male', 'Male']
print("\nDataFrame with New Column:")
print(df_with_new_column)

Output:

DataFrame with Custom Index:
         Age Country
Name
Alice     25     USA
Bob       30  Canada
Charlie   22      UK

DataFrame with New Row:
      Name  Age    Country
0    Alice   25        USA
1      Bob   30     Canada
2  Charlie   22         UK
3    David   28  Australia

DataFrame with New Column:
      Name  Age    Country  Gender
0    Alice   25        USA  Female
1      Bob   30     Canada    Male
2  Charlie   22         UK    Male
3    David   28  Australia    Male

Adding Rows to a DataFrame

import pandas as pd

# Creating a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Adding a New Row
new_row = {'Name': 'David', 'Age': 28, 'Country': 'Australia'}
# DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# Displaying the DataFrame after adding the new row
print("\nDataFrame after Adding New Row:")
print(df)
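
When you have several rows to add, it is usually simpler to collect them in one DataFrame and call pd.concat once instead of adding them one at a time. A minimal sketch (the shortened two-column data is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Build the new rows as their own DataFrame
new_rows = pd.DataFrame([
    {'Name': 'Charlie', 'Age': 22},
    {'Name': 'David', 'Age': 28},
])

# ignore_index=True renumbers the rows 0..n-1 in the result
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```

Note that pd.concat returns a new DataFrame; the original df is left unchanged.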

Adding a Column to your DataFrame

import pandas as pd

# Creating a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Adding a New Column
gender = ['Female', 'Male', 'Male']
df['Gender'] = gender

# Displaying the DataFrame after adding the new column
print("\nDataFrame after Adding New Column:")
print(df)
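
A new column can also be computed from existing ones. The sketch below uses .assign(), which returns a new DataFrame and leaves the original untouched; the 'AgeInMonths' column is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
})

# assign() returns a copy with the extra column; df itself is not modified
df_with_months = df.assign(AgeInMonths=df['Age'] * 12)
print(df_with_months)
```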

Resetting the Index of your DataFrame

import pandas as pd

# Creating a DataFrame with a custom index
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)
df.set_index('Name', inplace=True) # Setting 'Name' as the index

# Displaying the DataFrame with a custom index
print("DataFrame with Custom Index:")
print(df)

# Resetting the Index
df_reset = df.reset_index()

# Displaying the DataFrame after resetting the index
print("\nDataFrame after Resetting Index:")
print(df_reset)

How to Delete Indices, Rows or Columns from a Pandas Data Frame

Deleting Indices:

You can move a custom index back into a regular column and restore the default integer index using the .reset_index() method. To discard the index values entirely instead of keeping them as a column, pass drop=True.

import pandas as pd

# Creating a DataFrame with a custom index
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)
df.set_index('Name', inplace=True) # Setting 'Name' as the index

# Resetting the index to default
df_reset = df.reset_index()
print("DataFrame with Reset Index:")
print(df_reset)

# Removing the index column
df_reset_drop = df.reset_index(drop=True)
print("\nDataFrame with Dropped Index:")
print(df_reset_drop)

Deleting Rows:

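You can delete rows using the .drop() method with the index labels of the rows you want to remove. .drop() works on labels, not positions, so to remove a row by position, look up its label through df.index first. The examples below reuse the DataFrame from the previous example, whose index is 'Name'.

# Deleting the second row using its index label
df_deleted_by_label = df.drop('Bob')

# Deleting the first row by position: look up its label first
df_deleted_by_position = df.drop(df.index[0])

print("DataFrame after Deleting Row by Label:")
print(df_deleted_by_label)
print("\nDataFrame after Deleting Row by Position:")
print(df_deleted_by_position)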

Deleting Columns:

You can delete columns using the .drop() method with the column name and specifying axis=1.

# Deleting the 'Age' column
df_deleted_column = df.drop('Age', axis=1)

print("DataFrame after Deleting Column:")
print(df_deleted_column)
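
Instead of axis=1, .drop() also accepts index= and columns= keyword arguments, which many people find easier to read and which let you remove rows and columns in one call. A minimal sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Country': ['USA', 'Canada', 'UK'],
}).set_index('Name')

# Remove the row labeled 'Bob' and the 'Age' column in a single call
trimmed = df.drop(index=['Bob'], columns=['Age'])
print(trimmed)
```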

How to Rename the Index or Columns of a Pandas DataFrame

Renaming Columns:

You can rename one or more columns by providing a dictionary with the current column names as keys and the new column names as values to the .rename() method.

import pandas as pd

# Creating a DataFrame
data = {
'OldName1': [1, 2, 3],
'OldName2': [4, 5, 6]
}
df = pd.DataFrame(data)

# Renaming columns
df.rename(columns={'OldName1': 'NewName1', 'OldName2': 'NewName2'}, inplace=True)
print(df)
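
.rename() also accepts a function instead of a dictionary, which is handy when every column name needs the same cleanup. A small sketch, with column names invented for the example:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Alice'], 'AGE': [25]})

# The function is applied to each column name in turn
renamed = df.rename(columns=lambda c: c.lower().replace(' ', '_'))
print(list(renamed.columns))  # ['first_name', 'age']
```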

Renaming the Index:

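You can rename the index using the .rename_axis() method. This changes the name of the index itself, not the row labels it contains.

# Creating a DataFrame with a custom index
data = {
    'Value': [10, 20, 30],
    'Label': ['a', 'b', 'c'] # a second column, so the DataFrame is not empty once 'Value' becomes the index
}
df = pd.DataFrame(data)
df.set_index('Value', inplace=True)

# Renaming the index
df = df.rename_axis('NewIndexName')
print(df)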
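
To change the row labels rather than the name of the index, use .rename() with the index argument. The sketch below puts the two side by side; the names and data are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30]},
    index=pd.Index(['Alice', 'Bob'], name='Name'),
)

# rename(index=...) changes individual row labels
relabeled = df.rename(index={'Alice': 'Alicia'})

# rename_axis(...) changes the name of the index itself
renamed_axis = df.rename_axis('Person')

print(relabeled)
print(renamed_axis)
```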

How To Format The Data in Your Pandas DataFrame

Formatting Numeric Data:

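You can format numeric data, such as floats, integers, and percentages, by applying Python’s str.format() to each value with .apply(). Note that the formatted columns then hold strings, so they can no longer be used in arithmetic.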

import pandas as pd

# Creating a DataFrame
data = {
'Value': [12345.6789, 987.6543, 0.1234],
'Percentage': [0.4567, 0.1234, 0.7890]
}
df = pd.DataFrame(data)

# Formatting numeric data
df['Value'] = df['Value'].apply(lambda x: '{:.2f}'.format(x))
df['Percentage'] = df['Percentage'].apply(lambda x: '{:.2%}'.format(x))

print(df)
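
If you only need the formatting for display, one option is to leave the numbers as they are and format them at print time. The sketch below uses the formatters argument of DataFrame.to_string(); the thousands separator in the 'Value' format is an addition for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [12345.6789, 987.6543, 0.1234],
    'Percentage': [0.4567, 0.1234, 0.7890],
})

# Each formatter is a function applied to the values of one column
text = df.to_string(formatters={
    'Value': '{:,.2f}'.format,
    'Percentage': '{:.2%}'.format,
})
print(text)
print(df.dtypes)  # both columns are still float64
```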

Formatting Dates:

If you have date data, you can format it as text using the .strftime() method of each datetime value.

from datetime import datetime
import pandas as pd

# Creating a DataFrame with date data
data = {
'Date': [datetime(2023, 8, 1), datetime(2023, 8, 15), datetime(2023, 8, 31)]
}
df = pd.DataFrame(data)

# Formatting date data
df['Formatted Date'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))

print(df)
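
Dates often arrive as text. In that case you can parse them with pd.to_datetime() and then format the whole column at once through the .dt accessor, without .apply(). A minimal sketch, assuming the dates come in as ISO-style strings:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2023-08-01', '2023-08-15', '2023-08-31']})

# Parse the strings into real datetime values
df['Date'] = pd.to_datetime(df['Date'])

# .dt.strftime formats every value in the column (day/month/year here)
df['Formatted Date'] = df['Date'].dt.strftime('%d/%m/%Y')
print(df)
```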

Changing Column Data Types:

You can also change the data types of columns to control how they are displayed. For example, an integer column cannot hold leading zeros, so converting it to a string column is the first step if you want to pad values with them.

import pandas as pd

# Creating a DataFrame
data = {
'Number': [123, 456, 789]
}
df = pd.DataFrame(data)

# Changing the data type of 'Number' column to string
df['Number'] = df['Number'].astype(str)

print(df)
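
Once the column holds strings, the leading zeros can be added with .str.zfill(). A short sketch; the width of 5 and the 'Code' column name are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({'Number': [7, 42, 789]})

# Convert to strings, then left-pad each value with zeros to 5 characters
df['Code'] = df['Number'].astype(str).str.zfill(5)
print(df['Code'].tolist())  # ['00007', '00042', '00789']
```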
