What is the function used to display a table of statistics about a DataFrame in Pandas?

by EITCA Academy / Wednesday, 02 August 2023 / Published in Artificial Intelligence, EITC/AI/GCML Google Cloud Machine Learning, Further steps in Machine Learning, Data wrangling with pandas (Python Data Analysis Library), Examination review

The function used to display a table of statistics about a DataFrame in Pandas is the `describe()` method. It summarizes the central tendency, dispersion, and shape of each column's distribution, excluding missing (`NaN`) values, which makes it a common first step in exploratory data analysis.

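When applied to a DataFrame, `describe()` calculates summary statistics for each column: count, mean, standard deviation, minimum, quartiles, and maximum. By default, only numeric columns are summarized if the DataFrame contains any. Non-numeric columns get a different set of statistics, and they appear only when the DataFrame has no numeric columns or when they are requested through the `include` parameter (for example, `include='all'`).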

For numeric columns, the `describe()` function provides the following statistics:
– Count: the number of non-null values in the column.
– Mean: the average value of the column.
– Standard deviation: the sample standard deviation, a measure of the spread of values around the mean.
– Minimum: the smallest value in the column.
– Quartiles: the 25th, 50th (median), and 75th percentiles of the column.
– Maximum: the largest value in the column.
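As a rough check of the list above, the following sketch compares each row of `describe()` with the matching pandas method on a single column. The column name `A` and its values are illustrative only.

```python
import pandas as pd

# Illustrative data only: one small numeric column
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})
col = df["A"]

summary = col.describe()

# Each row label of describe() paired with the equivalent Series method
by_hand = {
    "count": col.count(),
    "mean": col.mean(),
    "std": col.std(),            # sample standard deviation (ddof=1)
    "min": col.min(),
    "25%": col.quantile(0.25),
    "50%": col.quantile(0.50),   # the median
    "75%": col.quantile(0.75),
    "max": col.max(),
}

for name, value in by_hand.items():
    print(f"{name:>5}: describe={float(summary[name]):.6f}  by_hand={float(value):.6f}")
```

Both columns of the printed comparison should agree, since `describe()` relies on the same underlying calculations.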

For non-numeric columns, the `describe()` function provides the following statistics:
– Count: the number of non-null values in the column.
– Unique: the number of distinct values in the column.
– Top: the most frequent value in the column.
– Frequency (`freq`): the number of times the most frequent value occurs.
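A minimal sketch of how the non-numeric statistics can be obtained, assuming a small made-up DataFrame with one text column (`city`) and one numeric column (`temp`). Neither name comes from the original example.

```python
import pandas as pd

# Hypothetical mixed-type data, for illustration only
df = pd.DataFrame({
    "city": ["Paris", "Rome", "Paris", "Oslo", "Paris"],
    "temp": [21.0, 25.5, 19.0, 12.5, 22.0],
})

# Default call: only the numeric column "temp" is summarized
numeric_summary = df.describe()

# A DataFrame with no numeric columns gets count / unique / top / freq
text_summary = df[["city"]].describe()

# include="all" summarizes every column; statistics that do not apply
# to a column (e.g. "unique" for "temp") are filled with NaN
full_summary = df.describe(include="all")

print(numeric_summary)
print(text_summary)
print(full_summary)
```

Here `Paris` appears three times out of five, so it is reported as `top` with a `freq` of 3.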

The `describe()` function returns a new DataFrame with the calculated statistics as rows and the original column names as columns. This table-like representation makes it easy to compare and analyze the statistics across different columns.
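Because the result is an ordinary DataFrame, individual statistics can be pulled out with the usual indexing tools. The sketch below shows one way to do this. The variable names (`stats`, `means`, `median_b`, `per_column`) are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})
stats = df.describe()

# One statistic across all columns: select a row by its label
means = stats.loc["mean"]

# A single value: row label first, then column name
median_b = stats.loc["50%", "B"]

# Transpose to get one row per original column instead
per_column = stats.T

print(means)
print(median_b)
print(per_column[["mean", "std"]])
```

Transposing is often convenient for wide DataFrames, where one row per column is easier to scan than one column per column.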

Here's an example to illustrate the usage of the `describe()` function:

```python
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [10, 20, 30, 40, 50],
        'C': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data)

# Display the statistics using describe()
statistics = df.describe()
print(statistics)
```

Output:

```
              A          B           C
count  5.000000   5.000000    5.000000
mean   3.000000  30.000000  300.000000
std    1.581139  15.811388  158.113883
min    1.000000  10.000000  100.000000
25%    2.000000  20.000000  200.000000
50%    3.000000  30.000000  300.000000
75%    4.000000  40.000000  400.000000
max    5.000000  50.000000  500.000000
```

In this example, the `describe()` function calculates the statistics for each column in the DataFrame `df`. The resulting DataFrame `statistics` displays the count, mean, standard deviation, minimum, quartiles, and maximum values for each column.

In short, `describe()` is a quick way to summarize a DataFrame: a single call reports the count, center, spread, and range of every numeric column, which helps spot missing values, skew, and outliers before deeper analysis.
