
In this post, I’ll teach you how to create a simple yet interesting data visualization using the pandas and seaborn libraries in Python.

Background

The data set we’ll be using, the Palmer Penguins data set, consists of measurements of penguins recorded near Palmer Station in Antarctica and made available by Dr. Kristen Gorman.

Data Set

Let’s import and take a look at the columns in the data set:

import pandas as pd

url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
penguins.head()
studyName Sample Number Species Region Island Stage Individual ID Clutch Completion Date Egg Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Sex Delta 15 N (o/oo) Delta 13 C (o/oo) Comments
0 PAL0708 1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A1 Yes 11/11/07 39.1 18.7 181.0 3750.0 MALE NaN NaN Not enough blood for isotopes.
1 PAL0708 2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A2 Yes 11/11/07 39.5 17.4 186.0 3800.0 FEMALE 8.94956 -24.69454 NaN
2 PAL0708 3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A1 Yes 11/16/07 40.3 18.0 195.0 3250.0 FEMALE 8.36821 -25.33302 NaN
3 PAL0708 4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A2 Yes 11/16/07 NaN NaN NaN NaN NaN NaN NaN Adult not sampled.
4 PAL0708 5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N3A1 Yes 11/16/07 36.7 19.3 193.0 3450.0 FEMALE 8.76651 -25.32426 NaN

As we can see, each row in the data set represents a penguin. The date when the egg was observed, the penguin’s body mass, and the penguin’s sex are recorded in the columns Date Egg, Body Mass (g), and Sex, respectively.
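
Before cleaning, it can help to check how many values are missing in the three columns of interest. The snippet below is a minimal sketch that rebuilds only the five rows shown above (so it runs without downloading anything); the variable names sample, missing, and sex_counts are my own and not part of the data set.

```python
import numpy as np
import pandas as pd

# The five rows printed by penguins.head(), reduced to the three columns we care about.
sample = pd.DataFrame({
    "Date Egg": ["11/11/07", "11/11/07", "11/16/07", "11/16/07", "11/16/07"],
    "Body Mass (g)": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "Sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
})

# Count missing values per column.
missing = sample.isna().sum()

# Count each Sex label, including missing ones.
sex_counts = sample["Sex"].value_counts(dropna = False)

print(missing)
print(sex_counts)
```

Running the same two calls on the full penguins dataframe should show which rows need to be dropped before averaging.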

One question we might be interested in is how body mass differs between male and female penguins over time. Are male penguins generally heavier than female penguins, or is the opposite true? Has the average mass of a penguin changed over time? Let’s answer these questions with a data visualization.

Data Cleaning

To start, let’s clean the Date Egg column. The dates are too granular in their current state: they contain the exact day of the egg’s observation, which is too detailed and will lead to a cluttered visualization. Let’s drop the day from each observation, keeping only the month and year, and make the result a datetime column.

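# dropping rows with NaN values in the columns we use (other columns, such as Comments, are mostly NaN)
# dropna() returns a new dataframe, so we assign the result back
penguins = penguins.dropna(subset = ["Date Egg", "Body Mass (g)", "Sex"])
# keeping only male and female penguins; copy() lets us add columns below without a SettingWithCopyWarning
penguins = penguins[penguins["Sex"] != "."].copy()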

# splitting the Date Egg column into month, day, and year
penguins[["Month Egg", "Day Egg", "Year Egg"]] = penguins["Date Egg"].str.split("/", expand = True)

# noting that all years begin with "20"
penguins["Year Egg"] = "20" + penguins["Year Egg"]

# converting the month and year into a datetime object
penguins["Year-Month Egg"] = pd.to_datetime(penguins["Year Egg"] + "-" + penguins["Month Egg"])

penguins["Year-Month Egg"].head()
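
As an aside, the same month-level column can probably be built without splitting strings. The sketch below is an alternative to the approach above, not a replacement for it: it parses the full date with an explicit format and then truncates each date to the first day of its month. The date strings are illustrative values in the same month/day/two-digit-year format as Date Egg, and the names dates, parsed, and year_month are my own.

```python
import pandas as pd

# Illustrative date strings in the same format as the Date Egg column.
dates = pd.Series(["11/11/07", "11/16/07", "12/01/08"])

# Parse the full date; %y reads the two-digit year, so "07" becomes 2007.
parsed = pd.to_datetime(dates, format = "%m/%d/%y")

# Truncate each date to the first day of its month.
year_month = parsed.dt.to_period("M").dt.to_timestamp()

print(year_month)
```

Either way, the goal is the same: every observation in a given month ends up with an identical timestamp, so it can be grouped with the others.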

Data Manipulation

Now that we can use the Year-Month Egg column, the next step is to compute the penguins’ average body mass by sex and month. To do that, we use the groupby() and reset_index() methods.

# finding the mean body mass per month by sex
averages = penguins.groupby(["Sex", "Year-Month Egg"])[["Body Mass (g)"]].mean()

# moving Sex and Year-Month Egg out of the index and back into regular columns
averages = averages.reset_index()

averages.head()
      Sex Year-Month Egg  Body Mass (g)
0  FEMALE     2007-11-01    3792.187500
1  FEMALE     2007-12-01    4283.333333
2  FEMALE     2008-11-01    3887.500000
3  FEMALE     2009-11-01    3839.732143
4  FEMALE     2009-12-01    4837.500000
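
To see what the two steps do separately, here is a small sketch on a toy dataframe. The body mass values are made up for illustration and are not taken from the data set; the names toy, grouped, and flat are my own.

```python
import pandas as pd

# Toy data with made-up body masses, using the same column names as above.
toy = pd.DataFrame({
    "Sex": ["FEMALE", "FEMALE", "FEMALE", "MALE"],
    "Year-Month Egg": pd.to_datetime(["2007-11-01", "2007-11-01", "2007-12-01", "2007-11-01"]),
    "Body Mass (g)": [3000.0, 3500.0, 3600.0, 4000.0],
})

# After groupby().mean(), the grouping columns live in the index, not in the columns.
grouped = toy.groupby(["Sex", "Year-Month Egg"])[["Body Mass (g)"]].mean()
print(grouped.index.names)

# reset_index() turns them back into ordinary columns, one row per (sex, month) pair.
flat = grouped.reset_index()
print(flat)
```

The flat layout matters for the next step, because seaborn looks up x, y, and hue by column name.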

Data Visualization

The dataframe is ready for visualization. We use the seaborn library to create a line plot of the penguins’ average body mass over time, with one line per sex.

import seaborn as sns

# making axis fonts smaller
sns.set(font_scale = 0.75)

sns.lineplot(data = averages,
             x = "Year-Month Egg",
             y = "Body Mass (g)",
             hue = "Sex").set(title = "Body Mass of Female and Male Penguins Over Time")

penguin-plot.png

As we can see, the average body mass of male penguins is higher than that of female penguins in every month. Average body mass for both sexes increased at the end of 2007, declined through 2008 and most of 2009, and then rose sharply at the end of 2009, at the right edge of the plot. Was there more food for the penguins in late 2009? Were there fewer natural predators? More research is necessary to find out.
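
The claim that males are heavier in every month can also be checked numerically rather than by eye. The sketch below assumes a dataframe shaped like averages (one row per sex and month) and pivots it so each sex gets its own column. The numbers here are made up for illustration; to check the real data, replace toy_averages with averages. The names toy_averages, wide, and gap are my own.

```python
import pandas as pd

# Made-up averages in the same layout as the averages dataframe.
toy_averages = pd.DataFrame({
    "Sex": ["FEMALE", "FEMALE", "MALE", "MALE"],
    "Year-Month Egg": pd.to_datetime(["2007-11-01", "2007-12-01", "2007-11-01", "2007-12-01"]),
    "Body Mass (g)": [3800.0, 4300.0, 4100.0, 4700.0],
})

# One row per month, one column per sex.
wide = toy_averages.pivot(index = "Year-Month Egg", columns = "Sex", values = "Body Mass (g)")

# Male minus female average body mass for each month.
gap = wide["MALE"] - wide["FEMALE"]
print(gap)

# True only if males are heavier on average in every month.
print((gap > 0).all())
```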
