
Analysis_Example
Analysis_Example.Rmd
Introduction
In this document, we will demonstrate how to use the Survey-Embedded Carbon Footprint Calculator (SECFC) package. We will:
- Install the SECFC package from GitHub.
- Load SECFC and our sample dataset.
- Calculate carbon emissions using a built-in SECFC function.
- Run a linear regression to see how income predicts total carbon emissions.
- Create a plot of the data using ggplot2.
Step 1: Download the package from GitHub
First, we need to install the package from GitHub. If you do not have
remotes
installed yet, run
install.packages("remotes")
beforehand.
# Install the SECFC package from GitHub
remotes::install_github("jianing-d/SECFC")
#> * checking for file ‘/tmp/RtmpmNbbJm/remotes1ecc44ad4aa3/jianing-d-SECFC-ddbce20/DESCRIPTION’ ... OK
#> * preparing ‘SECFC’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘SECFC_0.0.5.tar.gz’
Step 2: Load the package
Load the SECFC package, which provides the
calc_total_emissions()
function among others.
Step 3: Load your dataset
For demonstration, we assume you are use questionnaire_sample.rds that contains survey responses (provided and pre-loaded in our package).
# Replace the file path with your own dataset if necessary
questionnaire_example <- SECFC::questionnaire_example
# Take a quick look at your data structure
str(questionnaire_example)
#> tibble [50 × 35] (S3: tbl_df/tbl/data.frame)
#> $ RecordedDate : chr [1:50] "30/6/2024 22:38" "30/6/2024 22:39" "30/6/2024 22:40" "30/6/2024 22:40" ...
#> $ T_01_CarUsage : num [1:50] 4 3 3 4 5 4 5 4 3 3 ...
#> $ T_02_CarType : num [1:50] 3 3 2 3 3 3 3 3 2 3 ...
#> $ T_03_CarDistance : num [1:50] 2 1 3 2 2 3 4 1 4 2 ...
#> $ T_04_PublicTransport : num [1:50] 5 4 5 4 4 5 5 5 5 4 ...
#> $ T_05_PublicTransport : num [1:50] 4 4 4 2 4 4 4 4 4 1 ...
#> $ T_06_AirTravelLong : num [1:50] 2 1 1 1 1 4 2 1 1 1 ...
#> $ T_07_AirTravelShort : chr [1:50] "7-10 flights" "More than 10 flights" "4-6 flights" "More than 10 flights" ...
#> $ T_08_LongDistanceTra : num [1:50] 5 4 5 4 4 5 5 5 5 5 ...
#> $ PETS_4 : num [1:50] 0 0 4 1 5 1 1 0 0 1 ...
#> $ PETS_5 : num [1:50] 2 0 2 1 1 1 0 0 0 0 ...
#> $ E1_Electricity Usage : num [1:50] 2 2 2 1 1 1 1 4 1 1 ...
#> $ EH_02_ElectricityBil_1: num [1:50] 300 79 153 73 107 181 80 130 132 27 ...
#> $ EH_05_NaturalGasBill_1: num [1:50] 59 46 0 27 47 20 0 80 0 27 ...
#> $ EH_07_WaterBill : num [1:50] 119 0 35 0 17 25 40 0 80 150 ...
#> $ F_01_DietaryHabits_5 : num [1:50] 12 8 6 9 10 1 7 8 13 8 ...
#> $ F_01_DietaryHabits_6 : num [1:50] 0 0 2 0 4 0 0 0 1 1 ...
#> $ F_01_DietaryHabits_7 : num [1:50] 0 2 0 1 4 0 5 0 5 2 ...
#> $ F_01_DietaryHabits_4 : num [1:50] 5 5 3 7 11 7 2 14 7 8 ...
#> $ CL_01_ClothingPurcha : num [1:50] 3 3 4 3 4 3 4 5 3 3 ...
#> $ CL_03_MonthlyEx_9 : num [1:50] 0 15 200 20 0 0 200 0 0 0 ...
#> $ CL_03_MonthlyEx_10 : num [1:50] 200 100 0 49 0 150 100 100 250 100 ...
#> $ CL_03_MonthlyEx_11 : num [1:50] 200 0 0 58 0 100 0 0 0 0 ...
#> $ CL_03_MonthlyEx_12 : num [1:50] 0 0 0 0 100 100 0 0 0 0 ...
#> $ CL_03_MonthlyEx_13 : num [1:50] 30 25 0 20 100 500 0 0 0 0 ...
#> $ CL_03_MonthlyEx_14 : num [1:50] 50 35 50 0 0 80 50 100 100 100 ...
#> $ CL_03_MonthlyEx_15 : num [1:50] 0 0 35 150 0 610 0 800 0 0 ...
#> $ SD_06_HouseholdSize_17: num [1:50] 4 1 1 2 4 4 1 2 2 1 ...
#> $ SD_06_HouseholdSize_18: num [1:50] 0 0 0 1 0 2 0 2 0 0 ...
#> $ SD_06_HouseholdSize_19: num [1:50] 0 0 0 0 2 2 0 0 0 0 ...
#> $ SD_07_Country : chr [1:50] "United States" "United States" "United States" "United States" ...
#> $ SD_08_ZipCode : num [1:50] 85255 47905 30506 95843 95901 ...
#> $ EH_03_ElectricityBil_1: num [1:50] 3600 948 1836 876 1284 ...
#> $ EH_06_NaturalGasBill_1: num [1:50] 708 552 0 324 564 240 0 960 0 324 ...
#> $ income : num [1:50] 70011 42460 38100 55435 46043 ...
Step 4: Calculate total carbon emissions
Now we use the calc_total_emissions()
function from
SECFC to estimate each respondent’s carbon footprint based on their
survey responses. A new data frame, questionnaire_example_total, is now
available in your R environment. It has several new variables, including
TotalEmissions, the individual respondent’s overall estimated
footprint.
calc_total_emissions(questionnaire_example)
#> # A tibble: 50 × 41
#> RecordedDate T_01_CarUsage T_02_CarType T_03_CarDistance T_04_PublicTransport
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 30/6/2024 2… 5.5 3 10 5
#> 2 30/6/2024 2… 3.5 3 5 4
#> 3 30/6/2024 2… 3.5 2 30.5 5
#> 4 30/6/2024 2… 5.5 3 10 4
#> 5 30/6/2024 2… 7 3 10 4
#> 6 30/6/2024 2… 5.5 3 30.5 5
#> 7 30/6/2024 2… 7 3 51 5
#> 8 30/6/2024 2… 5.5 3 5 5
#> 9 30/6/2024 2… 3.5 2 51 5
#> 10 30/6/2024 2… 3.5 3 10 4
#> # ℹ 40 more rows
#> # ℹ 36 more variables: T_05_PublicTransport <dbl>, T_06_AirTravelLong <dbl>,
#> # T_07_AirTravelShort <dbl>, T_08_LongDistanceTra <dbl>, PETS_4 <dbl>,
#> # PETS_5 <dbl>, `E1_Electricity Usage` <dbl>, EH_02_ElectricityBil_1 <dbl>,
#> # EH_05_NaturalGasBill_1 <dbl>, EH_07_WaterBill <dbl>,
#> # F_01_DietaryHabits_5 <dbl>, F_01_DietaryHabits_6 <dbl>,
#> # F_01_DietaryHabits_7 <dbl>, F_01_DietaryHabits_4 <dbl>, …
# Check the first few rows
head(questionnaire_example_total)
#> # A tibble: 6 × 41
#> RecordedDate T_01_CarUsage T_02_CarType T_03_CarDistance T_04_PublicTransport
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 30/6/2024 22… 5.5 3 10 5
#> 2 30/6/2024 22… 3.5 3 5 4
#> 3 30/6/2024 22… 3.5 2 30.5 5
#> 4 30/6/2024 22… 5.5 3 10 4
#> 5 30/6/2024 22… 7 3 10 4
#> 6 30/6/2024 22… 5.5 3 30.5 5
#> # ℹ 36 more variables: T_05_PublicTransport <dbl>, T_06_AirTravelLong <dbl>,
#> # T_07_AirTravelShort <dbl>, T_08_LongDistanceTra <dbl>, PETS_4 <dbl>,
#> # PETS_5 <dbl>, `E1_Electricity Usage` <dbl>, EH_02_ElectricityBil_1 <dbl>,
#> # EH_05_NaturalGasBill_1 <dbl>, EH_07_WaterBill <dbl>,
#> # F_01_DietaryHabits_5 <dbl>, F_01_DietaryHabits_6 <dbl>,
#> # F_01_DietaryHabits_7 <dbl>, F_01_DietaryHabits_4 <dbl>,
#> # CL_01_ClothingPurcha <dbl>, CL_03_MonthlyEx_9 <dbl>, …
Step 5: Run a linear regression
We will examine how a respondent’s income might predict their total
carbon emissions. This is a basic linear model using the built-in
lm()
function.
model <- lm(TotalEmissions ~ income, data = questionnaire_example_total)
# Display summary statistics of the regression
summary(model)
#>
#> Call:
#> lm(formula = TotalEmissions ~ income, data = questionnaire_example_total)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3558.0 -1805.8 232.4 1330.9 5995.2
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 7.203e+03 7.000e+02 10.290 9.84e-14 ***
#> income 1.680e-02 1.017e-02 1.652 0.105
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2259 on 48 degrees of freedom
#> Multiple R-squared: 0.05377, Adjusted R-squared: 0.03406
#> F-statistic: 2.728 on 1 and 48 DF, p-value: 0.1051
Step 6: Create a Plot
Finally, we can visualize the relationship between income and
TotalEmissions using ggplot2
. The plot below includes:
- Points representing individual respondents
- A linear regression line (and confidence interval)
# Load ggplot2
library(ggplot2)
# Define custom colors
point_color <- "#4A6D8C" # desaturated blue-grey for points
line_color <- "#2C3E50" # deeper blue-grey for the regression line
lm_plot <- ggplot(questionnaire_example_total, aes(x = income, y = TotalEmissions)) +
geom_point(color = point_color, size = 2.8, alpha = 0.8) +
geom_smooth(method = "lm", se = TRUE, color = line_color, linewidth = 1.2) +
labs(
title = "Income and Total Carbon Emissions",
x = "Annual Income (USD)",
y = "Total Emissions (kg CO2e)"
) +
theme_classic(base_size = 14) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5, color = line_color),
axis.title = element_text(color = line_color),
axis.text = element_text(color = "black"),
panel.border = element_rect(color = "black", fill = NA, linewidth = 0.8),
plot.margin = margin(10, 10, 10, 10)
)
# Print the plot
lm_plot