Our hypothesis is that charter school students in the 6th, 7th, and 8th grades perform better than public school students in the same grades on the ELA and mathematics standardized tests. We will measure this by comparing the percentage of students failing to meet standards and the percentage of students meeting or exceeding standards on the CAASPP, California's standardized test.
We will also implicitly be making the assumption that higher performance on the CAASPP indicates greater student learning.
By conducting independent t-tests on the percentage of students who did not meet standards and on the percentage who were at or above standards, we found significant evidence of a real difference between the means of public schools and locally funded charter schools. However, there was not the same evidence for the directly funded charter schools.
Our findings are tentative at this point. We have not separated our data further into subcategories. For instance, can this difference in performance be accounted for by a difference in gender or income level of attending students? It is possible that there are more locally funded charter schools in higher income areas, meaning that the difference in performance can be accounted for by family income. Is it possible that 2017 was just a good year for charter schools? How do other years compare? These are all areas for further analysis.
There are three types of schools we will be looking at: public schools, direct-funded charter schools, and locally funded charter schools.
A charter school is an independently run public school granted more operational flexibility in return for higher performance accountability. Each school is established by charter. This charter is essentially a contract detailing the school's mission, program, and performance goals.
Charter schools are public schools in the sense that they are free to attend. However, unlike public schools, charter schools are schools of choice, meaning that families choose to send their kids to them. These schools operate with freedom from some of the regulations that are typically imposed upon district schools but are still accountable for their academic results and for upholding the promises made in their charters. Charter schools must also participate in state standardized testing. They must demonstrate performance in academic achievement, financial management, and organizational stability. If a charter school does not meet its performance goals, it can be closed.
There are two funding types for charter schools. Direct-funded charter schools elect to receive funding directly from the state, whereas locally funded charter schools get their funding from their local education agency/school district.
More info about charter vs public schools:
https://www.ed-data.org/article/Charter-Schools-in-California
Data can be found at:
import pandas as pd
import sqlite3
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from scipy.stats import relfreq
from scipy.stats import ttest_ind
from scipy.stats import f_oneway
df_ca = pd.read_csv("ca2017.txt")
df_ca_entities = pd.read_csv("ca2017_entities.txt")
df_ca.columns = df_ca.columns.str.replace(' ','_' ).str.lower()
df_ca_entities.columns = df_ca_entities.columns.str.replace(' ','_' ).str.lower()
#remove entries with missing data
df_ca = df_ca[df_ca.total_tested_at_entity_level != '*']
df_ca = df_ca[df_ca.percentage_standard_met != '*']
df_ca.head()
#change dtype from object to float
df_ca = df_ca.astype({col: float for col in df_ca.columns[11:]})
df_ca.iloc[:,:-12].describe()
#Create dbs from caaspp data
conn = sqlite3.connect('ca2017.db')
df_ca.to_sql('caasp', conn, if_exists = 'replace')
df_ca_entities.to_sql('entities', conn, if_exists = 'replace')
#query schools in grades 6, 7, 8 where 100% of students either met or did not meet standards
q = """SELECT c.school_code
, e.county_name
, e.district_name
, e.school_name
, c.percentage_standard_met_and_above
, c.percentage_standard_not_met
, c.subgroup_id
FROM caasp c
LEFT JOIN entities e ON c.school_code = e.school_code AND c.district_code = e.district_code
WHERE (c.percentage_standard_met_and_above = 100 OR c.percentage_standard_not_met = 100)
AND c.grade IN (6, 7, 8)
ORDER BY c.percentage_standard_met_and_above DESC;
"""
The percentage of standards not met and the percentage of standards met and above both have a max value of 100. This seems suspicious, as it would mean that 100% of the school scored in the given category. We will look into these schools to see if these values seem reasonable. For example, these could be prep or alternative schools.
schools_hund_percent = pd.read_sql_query(q, conn)
schools_hund_percent.head()
Looking through the schools that have 100% of students meeting or above standards, we see that these schools are schools that are known for academic excellence.
On the other hand, schools that have 100% of students failing to meet criteria are often alternative/low income area schools.
We also see that rows with school code 0 are not schools but seem to be test centers for the county. Since these are not schools we shall not use these entries during further analysis. We will keep this in mind while writing our SQL.
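As a minimal sketch of how such rows can be excluded in SQL, the example below builds a tiny in-memory table and keeps only rows with a positive school code. The column names mirror the notebook's `caasp` table, but the rows and the `conn_demo` connection are invented for illustration.

```python
import sqlite3

# Toy table: column names follow the notebook's `caasp` table,
# the three rows are made up for this example.
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE caasp (school_code INTEGER, grade INTEGER, "
    "percentage_standard_met_and_above REAL)"
)
conn_demo.executemany(
    "INSERT INTO caasp VALUES (?, ?, ?)",
    [(0, 6, 48.0), (1234567, 6, 55.0), (7654321, 7, 61.5)],
)

# Rows with school_code 0 are treated as non-school entries,
# so the filter keeps only codes greater than zero.
school_rows = conn_demo.execute(
    "SELECT school_code, grade FROM caasp "
    "WHERE school_code > 0 ORDER BY school_code"
).fetchall()
print(school_rows)
```

The same `school_code > 0` condition appears in the aggregate queries later in the notebook.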
df_ca_entities.describe()
#create reference table for labels to id codes
test_types = pd.DataFrame([[1,'ELA'], [2, 'Mathematics']], columns =['test_id', 'test_type']).set_index('test_id')
school_types = pd.DataFrame([[7, 'Public School'], [9, 'Direct Funded Charter'], [10, 'Locally Funded Charter']],
columns = ['type_id', 'type']).set_index('type_id')
print('Reference Tables\n\n')
print(test_types, '\n')
print(school_types)
"""query averages for each performance category for students in the 6th, 7th, and 8th grades.
Group results by school type, student grade, and test type
"""
perform_avgs_query = """SELECT COUNT(*) total_schools
, e.type_id
, c.grade
, c.test_id
, AVG(c.students_tested)
, AVG(c.mean_scale_score)
, AVG(c.percentage_standard_exceeded)
, AVG(c.percentage_standard_met)
, AVG(c.percentage_standard_met_and_above)
, AVG(c.percentage_standard_nearly_met)
, AVG(c.percentage_standard_not_met)
, AVG(c.students_with_scores)
FROM caasp c
INNER JOIN entities e ON c.school_code = e.school_code
AND c.county_code = e.county_code
AND c.district_code = e.district_code
WHERE c.grade IN (6, 7, 8) AND e.type_id IN (7, 9, 10)
AND c.subgroup_id = 1 AND c.school_code >0
GROUP BY e.type_id, c.grade, c.test_id
ORDER BY c.grade, c.test_id;
"""
middle_test_avgs = pd.read_sql_query(perform_avgs_query, conn)
middle_test_avgs.loc[middle_test_avgs.grade == 6]
When considering average standardized test scores for 6th grade California students:
Direct funded charter schools (type_id 9) perform worse than locally funded charter schools (type_id 10) and public schools (type_id 7) in all categories.
When compared to public schools, locally funded charter schools have a smaller percentage of students failing to meet standards and a comparable percentage of students exceeding them.
middle_test_avgs.loc[middle_test_avgs.grade == 7]
When considering average standardized test scores for 7th grade California students:
Direct funded charter schools underperform locally funded charter schools, but they outperform public schools in the percentage of students meeting, at or above, and exceeding standards on both tests. On the other hand, direct funded charter schools have the highest percentage of students failing to meet standards. This might point to some direct funded charter schools significantly outperforming others.
Locally funded charter schools outperform the direct funded charter schools and public schools, especially on the mathematics portion.
middle_test_avgs.loc[middle_test_avgs.grade == 8]
When considering average standardized test scores for 8th grade California students:
Direct funded charter schools underperform locally funded charter schools, but they outperform public schools in the percentage of students meeting, at or above, and exceeding standards on both tests. Direct funded charter schools have the highest percentage of students failing to meet standards in mathematics.
Public schools have the lowest percentage of students failing to meet standards on both tests.
Locally funded charter schools have the highest percentage of students meeting and exceeding standards.
#Create view for avg test results above and below standards by school type, test type and grade
view = """ CREATE VIEW averages AS
SELECT c.grade
, c.test_id
, e.type_id
, AVG(c.percentage_standard_not_met) avg_percent_std_not_met
, AVG(c.percentage_standard_met_and_above) avg_percentage_standard_met_and_above
FROM caasp c
INNER JOIN entities e ON c.school_code = e.school_code
AND c.county_code = e.county_code
AND c.district_code = e.district_code
WHERE c.grade IN (6, 7, 8) AND e.type_id IN (7, 9, 10) AND c.subgroup_id = 1 AND c.school_code >0
GROUP BY e.type_id, c.grade, c.test_id
ORDER BY c.grade, c.test_id;
"""
#query num of schools grouped by type, test type and grade with more than avg percentage of students below stnds
below_avg_query = """SELECT c.grade
, c.test_id
, e.type_id
, a.avg_percent_std_not_met
, COUNT(*) num_percent_std_not_met_grter_avg
FROM caasp c
INNER JOIN entities e ON c.school_code = e.school_code
AND c.county_code = e.county_code
AND c.district_code = e.district_code
INNER JOIN averages a ON c.grade = a.grade AND c.test_id = a.test_id
AND e.type_id = a.type_id
WHERE c.grade IN (6, 7, 8) AND e.type_id IN (7, 9, 10)
AND c.subgroup_id = 1 AND c.school_code > 0
AND c.percentage_standard_not_met > a.avg_percent_std_not_met
GROUP BY e.type_id, c.grade, c.test_id
ORDER BY c.grade, c.test_id;
"""
#pd.read_sql_query('DROP VIEW averages;', conn)
#pd.read_sql_query(view, conn)
stds_not_met = pd.read_sql_query(below_avg_query, conn)
def convert_name(num_tup):
    #converts labels to tup of names
    return (num_tup[0], test_types.loc[num_tup[1]].values[0], school_types.loc[num_tup[2]].values[0])
#build labels from middle_test_avgs, which has one row per grade, test type, and school type
labels = middle_test_avgs[['grade', 'test_id', 'type_id']].values.tolist()
labels_convert = [convert_name(x) for x in labels]
label_names = [(x[0], x[2]) for x in labels_convert]
colors = ['#5c678a' if x[1] == 'ELA' else '#00b3b3' for x in labels_convert]
x_values = range(len(labels))
stds_not_met['total_schools'] = middle_test_avgs.total_schools
stds_not_met['percentage_greater_than_avg'] = (stds_not_met.num_percent_std_not_met_grter_avg/
                                               stds_not_met.total_schools)
plt.figure(figsize =(8,6))
plt.bar(x_values, stds_not_met.percentage_greater_than_avg, color = colors)
plt.xticks(x_values, label_names, rotation = 'vertical')
#create legend
ela_patch = mpatches.Patch(color='#5c678a', label='ELA')
math_patch = mpatches.Patch(color='#00b3b3', label='Mathematics')
plt.title('Percentage of schools with an above-average share of students not meeting standards')
plt.legend(handles=[ela_patch, math_patch],bbox_to_anchor=(1.35, 1), frameon = False )
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.show()
From the above we see that, when grouped by grade, test, and school type, fewer than half of the schools in each group have an above-average percentage of students whose scores do not meet standards.
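One possible reading of this result is that a few schools with very high percentages pull the group mean upward, so that most schools sit below it. The sketch below illustrates the effect with invented numbers; it is not drawn from the CAASPP data.

```python
from statistics import mean

# Invented percentages of students not meeting standards at seven schools
pct_not_met = [10, 12, 15, 18, 20, 25, 80]

# The single value of 80 pulls the mean above most of the other schools
avg = mean(pct_not_met)

# Share of schools whose percentage is strictly above the group mean
share_above_avg = sum(p > avg for p in pct_not_met) / len(pct_not_met)
print(round(avg, 2), round(share_above_avg, 2))
```

In this toy group only one of seven schools is above the mean, even though the mean itself is fairly high.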
above_avg_query =""" SELECT c.grade
, c.test_id
, e.type_id
, a.avg_percentage_standard_met_and_above
, COUNT(*) num_percent_std_met_above_grter_avg
FROM caasp c
INNER JOIN entities e ON c.school_code = e.school_code
AND c.county_code = e.county_code
AND c.district_code = e.district_code
INNER JOIN averages a ON c.grade = a.grade AND c.test_id = a.test_id
AND e.type_id = a.type_id
WHERE c.grade IN (6, 7, 8) AND e.type_id IN (7, 9, 10)
AND c.percentage_standard_met_and_above > a.avg_percentage_standard_met_and_above
AND c.subgroup_id = 1 AND c.school_code > 0
GROUP BY e.type_id, c.grade, c.test_id
ORDER BY c.grade, c.test_id;
"""
stds_met_above = pd.read_sql_query(above_avg_query, conn)
stds_met_above['total_schools'] = middle_test_avgs.total_schools
stds_met_above['percentage_greater_than_avg'] = (stds_met_above.num_percent_std_met_above_grter_avg/
stds_met_above.total_schools)
plt.figure(figsize =(8,6))
plt.bar(x_values, stds_met_above.percentage_greater_than_avg, color = colors)
plt.xticks(x_values, label_names, rotation = 'vertical')
#create legend
ela_patch = mpatches.Patch(color='#5c678a', label='ELA')
math_patch = mpatches.Patch(color='#00b3b3', label='Mathematics')
plt.title('Percentage of schools with an above-average share of students meeting or exceeding standards')
plt.legend(handles=[ela_patch, math_patch],bbox_to_anchor=(1.35, 1), frameon = False )
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.show()
We also see that most charter schools have a below-average percentage of students who score at or above standards.
not_met_col = 'AVG(c.percentage_standard_not_met)'
met_col = 'AVG(c.percentage_standard_met_and_above)'
#copy the slice so new columns can be added without a SettingWithCopyWarning
avg_not_met_and_met = middle_test_avgs[['grade', 'type_id', 'test_id', not_met_col, met_col]].copy()
groups = []
for _, g in avg_not_met_and_met.groupby(['grade', 'test_id']):
    g = g.copy()
    #baseline: the public school (type_id 7) average for this grade and test
    pub = g[g.type_id == 7].iloc[0]
    g['diff_not_met'] = g[not_met_col] - pub[not_met_col]
    g['diff_met'] = g[met_col] - pub[met_col]
    groups.append(g)
#DataFrame.append was removed in pandas 2.0, so the groups are combined with pd.concat
avg_diffs_from_pub = pd.concat(groups)
#absolute differences charter avg percentage - public school avg percentage
avg_diffs_from_pub
Here we see the absolute differences in the average percentage of students scoring in the categories standards not met and standards met and above.
Each difference is calculated as the charter average percentage minus the public average percentage for that category.
A negative value in the diff_not_met column means that the charter school type had a smaller percentage of students not meeting standards than public schools.
A positive value in the diff_met column means that the charter school type had, on average, a larger percentage of students scoring at or above standard levels than public schools.
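The sign convention can be checked with a small worked example. The averages below are invented, and `diff_from_public` is a hypothetical helper, not part of the notebook; only the type ids (7 public, 9 direct funded, 10 locally funded) come from the reference table above.

```python
# Invented group averages keyed by type_id
# (7 = public, 9 = direct funded charter, 10 = locally funded charter)
avg_not_met = {7: 30.0, 9: 34.0, 10: 24.0}
avg_met = {7: 45.0, 9: 43.0, 10: 54.0}

def diff_from_public(averages, public_id=7):
    """Charter average minus public average for every non-public type."""
    baseline = averages[public_id]
    return {t: v - baseline for t, v in averages.items() if t != public_id}

diff_not_met = diff_from_public(avg_not_met)
diff_met = diff_from_public(avg_met)
print(diff_not_met)
print(diff_met)
```

Here type 10 has a negative `diff_not_met` and a positive `diff_met`, which is the pattern that would indicate better results than public schools in both categories.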
#plot standards not met
plt.figure(figsize = (12, 6))
plt.bar(x_values, avg_diffs_from_pub['AVG(c.percentage_standard_not_met)'],
label = label_names, color= colors)
plt.ylabel('mean percentage')
plt.xticks(range(avg_diffs_from_pub.shape[0]),label_names, rotation = 'vertical')
plt.title('Standards not met')
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
#create and add legend
ela_patch = mpatches.Patch(color='#5c678a', label='ELA')
math_patch = mpatches.Patch(color='#00b3b3', label='Mathematics')
plt.legend(handles=[ela_patch, math_patch],bbox_to_anchor=(1.35, 1), frameon = False )
plt.show()
#plot standards met and above
plt.figure(figsize = (12, 6))
plt.bar(x_values, avg_diffs_from_pub['AVG(c.percentage_standard_met_and_above)'],
label = label_names, color = colors)
plt.ylabel('mean percentage')
plt.xticks(range(avg_diffs_from_pub.shape[0]),label_names, rotation = 'vertical')
plt.title('\n\nStandards met and above')
#clean plots
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
#add legend
plt.legend(handles=[ela_patch, math_patch],bbox_to_anchor=(1.35, 1), frameon = False);
percent_stnds_query = """SELECT c.grade
, c.test_id
, e.type_id
, c.percentage_standard_not_met
, c.percentage_standard_met_and_above
FROM caasp c
INNER JOIN entities e ON c.school_code = e.school_code
AND c.county_code = e.county_code
AND c.district_code = e.district_code
WHERE c.grade IN (6, 7, 8) AND e.type_id IN (7, 9, 10)
AND c.subgroup_id = 1
ORDER BY c.grade, c.test_id;
"""
results = pd.read_sql_query(percent_stnds_query, conn)
results.head()
f, ax = plt.subplots(6, 3, figsize = (15, 25), sharey = True)
x = range(0, 100, 10)
i, j = 0,0
for name, group in results.groupby(['grade', 'test_id', 'type_id']):
    #fix the bin limits to 0-100 so the ten bins start at the x positions above
    res_met = relfreq(group.percentage_standard_met_and_above, numbins = 10, defaultreallimits = (0, 100))
    res_not_met = relfreq(group.percentage_standard_not_met, numbins = 10, defaultreallimits = (0, 100))
    #align = 'edge' draws each bar from its bin's left edge
    ax[i, j].bar(x, res_met.frequency, width = res_met.binsize, align = 'edge',
                 label = 'meeting and above standard', alpha = .5, color = '#187eba')
    ax[i, j].bar(x, res_not_met.frequency, width = res_not_met.binsize, align = 'edge',
                 label = 'not meeting standard', alpha = .7, color = '#f2a771')
    ax[i, j].set_xlabel(convert_name(name), fontsize = 13)
    ax[i, j].spines['top'].set_visible(False)
    ax[i, j].spines['right'].set_visible(False)
    if i == 0 and j == 2:
        ax[i, j].legend(frameon = False, bbox_to_anchor=(1.5, 1.55), fontsize = 12)
    #move to the next column, wrapping to the next row after three plots
    j +=1
    if j ==3:
        i +=1
        j = 0
f.suptitle("\nRelative frequencies of schools with percentage of students in each scoring category", fontsize = 16);
So far we have not separated performance into subgroups (gender, income level, ethnicity, etc.). However, at this point, on average, there seems to be evidence pointing to locally funded charter schools performing the best out of these three categories. The relative frequencies, combined with the previous analysis of averages, show that locally funded charter schools clearly outperform in ELA tests and tend to slightly outperform in mathematics (though this is less clear). To see if the difference between means is statistically significant, we can perform independent t-tests.
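The notebook calls `ttest_ind(..., equal_var = False)`, which is Welch's unequal-variance t-test. As a rough illustration of what that test computes, the sketch below derives the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. The `welch_t` helper and the two samples are hypothetical and are not taken from the CAASPP data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n)
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof

# Invented percentages of students at or above standards
public = [40.0, 42.0, 38.0, 45.0, 41.0]
local_charter = [50.0, 55.0, 48.0, 60.0, 52.0, 57.0]

t_stat, dof = welch_t(public, local_charter)
print(t_stat, dof)
```

A negative statistic here means the first sample's mean is lower than the second's; the p-value would then come from a t distribution with `dof` degrees of freedom, which is what `ttest_ind` reports.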
def calc_effect_size(s1, s2):
    """calculates effect size for ind sample t test
    input: samples 1 and 2
    output: effect size
    """
    std_est_s1_s2 = np.sqrt(len(s1))*np.sqrt(np.std(s1)**2/len(s1) + np.std(s2)**2/len(s2))
    effect_size_s1_s2 = (np.mean(s1) - np.mean(s2))/ std_est_s1_s2
    return effect_size_s1_s2
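Since we are performing two tests against public schools, we will use a Bonferroni correction. For two comparisons this gives
alpha = .05/2 = .025. The results below are assessed at the stricter threshold alpha = .001.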
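The Bonferroni arithmetic can be sketched as follows. The helper name `bonferroni_alpha` is hypothetical. The count of 24 assumes every t-test in the loop below is treated as one family (3 grades x 2 tests x 2 outcome measures x 2 comparisons), which is one possible way to count them, not necessarily the one intended here.

```python
def bonferroni_alpha(family_alpha, num_tests):
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / num_tests

# Two comparisons against public schools within one group
alpha_two_tests = bonferroni_alpha(0.05, 2)

# Assumption: all 24 t-tests across grades, tests, and outcomes form one family
alpha_all_tests = bonferroni_alpha(0.05, 24)

print(alpha_two_tests, round(alpha_all_tests, 4))
```

Under either count, a threshold of .001 is more conservative than the corrected level.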
"""conduct ttest and ANOVA to compare group means"""
for name, g in results.groupby(['grade', 'test_id']):
    #type_id 7 = public, 9 = direct funded charter, 10 = locally funded charter
    not_met_pub = g[g.type_id == 7].percentage_standard_not_met
    not_met_pubchart = g[g.type_id == 9].percentage_standard_not_met
    not_met_loclchart = g[g.type_id == 10].percentage_standard_not_met
    met_pub = g[g.type_id == 7].percentage_standard_met_and_above
    met_pubchart = g[g.type_id == 9].percentage_standard_met_and_above
    met_loclchart = g[g.type_id == 10].percentage_standard_met_and_above
    print(name)
    print("\nPercentage Standard Not Met\n")
    print("\tPublic vs Local Funded Charter: ", ttest_ind(not_met_pub, not_met_loclchart, equal_var = False))
    print("\tEffect size:", calc_effect_size(not_met_loclchart, not_met_pub))
    print("\n\tPublic vs Direct Funded Charter: ", ttest_ind(not_met_pub, not_met_pubchart, equal_var = False))
    print("\tEffect size:", calc_effect_size(not_met_pubchart, not_met_pub))
    print("\n\tANOVA: ", f_oneway(not_met_loclchart, not_met_pub, not_met_pubchart))
    print("\n\nPercentage Standard Met and Above:\n")
    print("\tPublic vs Local Funded Charter: ", ttest_ind(met_pub, met_loclchart, equal_var = False))
    print("\tEffect size:", calc_effect_size(met_loclchart, met_pub))
    print("\n\tPublic vs Direct Funded Charter: ", ttest_ind(met_pub, met_pubchart, equal_var = False))
    print("\tEffect size:", calc_effect_size(met_pubchart, met_pub))
    print("\n\tANOVA: ", f_oneway(met_loclchart, met_pub, met_pubchart), '\n')
No significant evidence that means are different at alpha = .001 for the following
However there is evidence we should reject our null hypothesis that the difference between the population means is zero, at alpha = .001, for the following
The effect size for the independent t-test is very small (<.2) when comparing all public school and direct funded charter school results by grade and test type.
The effect size for the independent t-test is small to medium (approx .2-.5) when comparing public school and locally funded charter school results by grade and test type, except in the case of 8th-grade math where the effect size is very small (<.1).
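For comparison, the thresholds quoted above (.2 small, .5 medium) are usually stated for Cohen's d, which scales the difference in means by a pooled standard deviation. The sketch below shows that common form. It is an assumption swapped in for illustration and uses a different scaling from the notebook's `calc_effect_size`, so the two will not generally give the same number. The samples are invented.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)

# Invented samples: equal spread, means one unit apart
sample_a = [2.0, 4.0, 6.0]
sample_b = [1.0, 3.0, 5.0]

d = cohens_d(sample_a, sample_b)
print(d)
```

Both toy samples have a sample variance of 4, so the pooled standard deviation is 2 and a mean difference of 1 gives a medium-sized d.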
At this point, we have good evidence that locally funded charter schools consistently outperform public schools with all differences in means being found to be statistically significant except for 8th-grade math. However, there is not much evidence to support that direct funded charter schools perform better than public schools on average.
Locally funded charter schools appear to be a promising alternative for families looking to provide better education to their children without needing to pay for private schooling.
Some potential issues with these findings:
Our findings are fairly tentative at this point, but they have allowed us to rule out investigating directly funded charter schools any further.
We have not separated the above data into subcategories. For instance, can this difference in performance be accounted for by a difference in gender or income level of attending students? It is possible that there are more locally funded charter schools in higher income areas, meaning that the difference in performance can be accounted for by family income. Is it possible that 2017 was just a good year for charter schools? How do other years compare? These are good areas for further analysis.
Curious parents should check out https://www.greatschools.org/ to learn about the great schools in their area and view their CAASPP results.