The function used to display a table of statistics about a DataFrame in Pandas is called `describe()`. It summarizes the central tendency, dispersion, and shape of each column's distribution, which makes it a convenient first step in exploratory data analysis.
When applied to a DataFrame, the `describe()` function calculates statistical measures for each column, including count, mean, standard deviation, minimum, quartiles, and maximum values. The set of statistics depends on the column type. By default, a DataFrame that mixes numeric and non-numeric columns is summarized over its numeric columns only; the `include` and `exclude` parameters control which column types appear (for example, `include='all'`).
For numeric columns, the `describe()` function provides the following statistics:
– Count: the number of non-null values in the column.
– Mean: the average value of the column.
– Standard deviation: a measure of the spread of values around the mean (the sample standard deviation, labeled `std` in the output).
– Minimum: the smallest value in the column.
– Quartiles: the 25th, 50th (median), and 75th percentiles of the column.
– Maximum: the largest value in the column.
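As a rough illustration of the list above, the following sketch compares the values reported by `describe()` with the corresponding individual Series methods. The sample data (including the one missing value) is made up for this example.

```python
import pandas as pd

# Hypothetical sample data with one missing value
s = pd.Series([1, 2, 3, 4, None], name="A")

summary = s.describe()

# The same statistics computed one at a time
count = s.count()                           # non-null values only
mean = s.mean()                             # average of the non-null values
std = s.std()                               # sample standard deviation (ddof=1)
quartiles = s.quantile([0.25, 0.5, 0.75])   # 25th, 50th, 75th percentiles

print(summary)
```

Note that the missing value is excluded from every statistic, so the count is lower than the length of the Series.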
For non-numeric columns, the `describe()` function provides the following statistics:
– Count: the number of non-null values in the column.
– Unique: the number of distinct values in the column.
– Top: the most frequent value in the column.
– Frequency: the number of times the most frequent value occurs (labeled `freq` in the output).
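The non-numeric summary can be sketched as follows. The column names `city` and `score` and their values are assumptions made for illustration. Here `exclude=["number"]` is one way to ask for the non-numeric columns of a mixed DataFrame.

```python
import pandas as pd

# Hypothetical mixed DataFrame: one text column, one numeric column
df = pd.DataFrame({
    "city": ["Paris", "Lima", "Paris", None],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# Default call: only the numeric column is summarized
numeric_only = df.describe()

# Excluding numeric types leaves the text column,
# summarized with count, unique, top, and freq
text_summary = df.describe(exclude=["number"])

print(text_summary)
```

Calling `describe()` directly on a single non-numeric column, such as `df["city"].describe()`, reports the same four statistics.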
The `describe()` function returns a new DataFrame with the calculated statistics as rows and the original column names as columns. This table-like representation makes it easy to compare and analyze the statistics across different columns.
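Because the result is an ordinary DataFrame, individual statistics can be looked up by row label and column name. A minimal sketch, assuming a small made-up DataFrame with columns `A` and `B`:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})

stats = df.describe()

# Rows are statistic labels, columns are the original column names
mean_a = stats.loc["mean", "A"]
median_b = stats.loc["50%", "B"]

# Transposing gives one row per original column, which can be
# easier to read when the DataFrame has many columns
by_column = stats.T

print(mean_a, median_b)
print(by_column)
```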
Here's an example to illustrate the usage of the `describe()` function:
```python
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [10, 20, 30, 40, 50],
        'C': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data)

# Display the statistics using describe()
statistics = df.describe()
print(statistics)
```
Output:
```
              A          B           C
count  5.000000   5.000000    5.000000
mean   3.000000  30.000000  300.000000
std    1.581139  15.811388  158.113883
min    1.000000  10.000000  100.000000
25%    2.000000  20.000000  200.000000
50%    3.000000  30.000000  300.000000
75%    4.000000  40.000000  400.000000
max    5.000000  50.000000  500.000000
```
In this example, the `describe()` function calculates the statistics for each column in the DataFrame `df`. The resulting DataFrame `statistics` displays the count, mean, standard deviation, minimum, quartiles, and maximum values for each column.
In short, `describe()` gives a quick overview of how the values in each column of a DataFrame are distributed, which helps when exploring an unfamiliar dataset.