11.3. Chi-Square Goodness-of-Fit Test
Here, I show a more elegant way to count the number of births per month by using the groupby() method of a Pandas dataframe. Let's start by loading the births data:
```python
import pandas as pd
births = pd.read_csv('https://www.fdsp.net/data/births.csv')
```
Here is how we can count the births per month with a loop that calls query() once for each month:
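```python
import numpy as np
# Month abbreviations as they appear in the 'month' column, in calendar order
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
births_by_month = np.zeros(12)
for i, month in enumerate(months):
    # Keep only this month's rows, then total their birth counts
    births_by_month[i] = births.query('month=="' + month + '"')['count'].sum()
print(births_by_month)
```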
[6906798. 6448725. 7080880. 6788266. 7112239. 7059986. 7461489. 7552007.
7365904. 7220646. 6813037. 7079453.]
A more elegant approach uses the dataframe's groupby() method:
```python
# Sum the numeric columns within each month, then keep the 'count' column
births.groupby('month').sum(numeric_only=True)['count']
```
month
Apr 6788266
Aug 7552007
Dec 7079453
Feb 6448725
Jan 6906798
Jul 7461489
Jun 7059986
Mar 7080880
May 7112239
Nov 6813037
Oct 7220646
Sep 7365904
Name: count, dtype: int64
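The loop and groupby() produce the same totals, only in a different order. As a quick self-contained check that avoids downloading the data, here is a minimal sketch on a small made-up dataframe; the values are hypothetical, and only the month and count column names mirror the births data. It selects the count column before calling sum(), which sums just that column and therefore does not need numeric_only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: several rows per month, like the births data
toy = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Jan', 'Mar', 'Feb', 'Mar'],
    'count': [10, 20, 5, 7, 1, 3],
})
months = ['Jan', 'Feb', 'Mar']

# Loop approach: one query per month, filled in calendar order
loop_totals = np.zeros(len(months))
for i, month in enumerate(months):
    loop_totals[i] = toy.query('month=="' + month + '"')['count'].sum()

# groupby approach: a single call; the resulting index is sorted alphabetically
group_totals = toy.groupby('month')['count'].sum()

print(loop_totals)   # Jan 15, Feb 21, Mar 10
print(group_totals)  # index order: Feb, Jan, Mar
```

The totals agree month by month, but the groupby() result is keyed by label in alphabetical order, while the loop result is a plain array whose position encodes the month.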
We will not use the output of this groupby() and sum() statement. If we did use it, we would need to reorder it so that it aligns with the player birth months, which are in numerical (calendar) order: as the output above shows, groupby() sorts the month names alphabetically, starting with Apr, Aug, and Dec.
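One way to do that alignment is with reindex(), which reorders a Series by its index labels. The sketch below is an illustration under stated assumptions rather than the method used in this chapter: it rebuilds the groupby() result from the totals printed above and reindexes it with a calendar-ordered list of month abbreviations (the same names the loop iterates over), so that position 0 holds January, position 1 February, and so on.

```python
import pandas as pd

# Per-month totals as printed by the groupby() call above (alphabetical order)
by_month = pd.Series(
    {'Apr': 6788266, 'Aug': 7552007, 'Dec': 7079453, 'Feb': 6448725,
     'Jan': 6906798, 'Jul': 7461489, 'Jun': 7059986, 'Mar': 7080880,
     'May': 7112239, 'Nov': 6813037, 'Oct': 7220646, 'Sep': 7365904},
    name='count')

# Calendar order, so array position i corresponds to month number i + 1
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# reindex() reorders rows by label; a label missing from the data would become NaN
births_by_month = by_month.reindex(months).to_numpy()
print(births_by_month)
```

The resulting array holds the same values as the loop's output, in the same order, as integers rather than floats. In practice, the same reindex(months) call could be applied directly to births.groupby('month').sum(numeric_only=True)['count'].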
11.3.1. Terminology Review
Use the flashcards below to help you review the terminology introduced in this chapter.