
Analysis_Example
Introduction
In this document, we will demonstrate how to use the Survey-Embedded Carbon Footprint Calculator (SECFC) package. We will:
- Install the SECFC package from GitHub.
- Load SECFC and our sample dataset.
- Calculate carbon emissions using a built-in SECFC function.
- Run a linear regression to see how income predicts total carbon emissions.
- Create a plot of the data using ggplot2.
Step 1: Download the package from GitHub
First, we need to install the package from GitHub. If you do not have remotes installed yet, run install.packages("remotes") beforehand.
# Install the SECFC package from GitHub
remotes::install_github("jianing-d/SECFC")
#> xfun (0.51 -> 0.52) [CRAN]
#> * checking for file ‘/tmp/RtmptvQWLP/remotes1dcd7795771a/jianing-d-SECFC-f490b90/DESCRIPTION’ ... OK
#> * preparing ‘SECFC’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘SECFC_0.0.3.tar.gz’
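If you prefer to install only what is missing, the check can be scripted. The following is a minimal sketch in base R; the helper name missing_pkgs is hypothetical and not part of SECFC or remotes, and the install calls are left commented out so nothing is downloaded unintentionally.

```r
# Hypothetical helper (not part of SECFC): return the packages from `pkgs`
# that are not currently installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(c("remotes", "ggplot2"))
print(to_install)

# Uncomment to install whatever is missing, then SECFC itself:
# if (length(to_install) > 0) install.packages(to_install)
# remotes::install_github("jianing-d/SECFC")
```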
Step 2: Load the package
Load the SECFC package, which provides the calc_total_emissions() function among others.
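The loading step itself is a single library() call. The sketch below guards it with requireNamespace() so that it fails gracefully when SECFC has not been installed yet; the variable name has_secfc is only illustrative.

```r
# Check whether SECFC is installed before attaching it
has_secfc <- requireNamespace("SECFC", quietly = TRUE)

if (has_secfc) {
  library(SECFC)
  # Confirm that the function used later in this document is available
  print(exists("calc_total_emissions"))
} else {
  message("SECFC is not installed; see Step 1.")
}
```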
Step 3: Load your dataset
For demonstration, we assume you are using questionnaire_sample.rds, which contains survey responses (provided and pre-loaded in our package).
# Use the bundled example data; substitute your own dataset if necessary
questionnaire_example <- SECFC::questionnaire_example
# Take a quick look at your data structure
str(questionnaire_example)
#> tibble [50 × 34] (S3: tbl_df/tbl/data.frame)
#> $ RecordedDate : chr [1:50] "30/6/2024 22:38" "30/6/2024 22:39" "30/6/2024 22:40" "30/6/2024 22:40" ...
#> $ T_01_CarUsage : num [1:50] 4 3 3 4 5 4 5 4 3 3 ...
#> $ T_02_CarType : num [1:50] 3 3 2 3 3 3 3 3 2 3 ...
#> $ T_03_CarDistance : num [1:50] 2 1 3 2 2 3 4 1 4 2 ...
#> $ T_04_PublicTransport : num [1:50] 5 4 5 4 4 5 5 5 5 4 ...
#> $ T_05_PublicTransport : num [1:50] 4 4 4 2 4 4 4 4 4 1 ...
#> $ T_06_AirTravelLong : num [1:50] 2 1 1 1 1 4 2 1 1 1 ...
#> $ T_07_AirTravelShort : chr [1:50] "7-10 flights" "More than 10 flights" "4-6 flights" "More than 10 flights" ...
#> $ T_08_LongDistanceTra : num [1:50] 5 4 5 4 4 5 5 5 5 5 ...
#> $ PETS_4 : num [1:50] 0 0 4 1 5 1 1 0 0 1 ...
#> $ PETS_5 : num [1:50] 2 0 2 1 1 1 0 0 0 0 ...
#> $ E1_Electricity Usage : num [1:50] 2 2 2 1 1 1 1 4 1 1 ...
#> $ EH_02_ElectricityBil_1: num [1:50] 300 79 153 73 107 181 80 130 132 27 ...
#> $ EH_05_NaturalGasBill_1: num [1:50] 59 46 0 27 47 20 0 80 0 27 ...
#> $ F_01_DietaryHabits_5 : num [1:50] 12 8 6 9 10 1 7 8 13 8 ...
#> $ F_01_DietaryHabits_6 : num [1:50] 0 0 2 0 4 0 0 0 1 1 ...
#> $ F_01_DietaryHabits_7 : num [1:50] 0 2 0 1 4 0 5 0 5 2 ...
#> $ F_01_DietaryHabits_4 : num [1:50] 5 5 3 7 11 7 2 14 7 8 ...
#> $ CL_01_ClothingPurcha : num [1:50] 3 3 4 3 4 3 4 5 3 3 ...
#> $ CL_03_MonthlyEx_9 : num [1:50] 0 15 200 20 0 0 200 0 0 0 ...
#> $ CL_03_MonthlyEx_10 : num [1:50] 200 100 0 49 0 150 100 100 250 100 ...
#> $ CL_03_MonthlyEx_11 : num [1:50] 200 0 0 58 0 100 0 0 0 0 ...
#> $ CL_03_MonthlyEx_12 : num [1:50] 0 0 0 0 100 100 0 0 0 0 ...
#> $ CL_03_MonthlyEx_13 : num [1:50] 30 25 0 20 100 500 0 0 0 0 ...
#> $ CL_03_MonthlyEx_14 : num [1:50] 50 35 50 0 0 80 50 100 100 100 ...
#> $ CL_03_MonthlyEx_15 : num [1:50] 0 0 35 150 0 610 0 800 0 0 ...
#> $ SD_06_HouseholdSize_17: num [1:50] 4 1 1 2 4 4 1 2 2 1 ...
#> $ SD_06_HouseholdSize_18: num [1:50] 0 0 0 1 0 2 0 2 0 0 ...
#> $ SD_06_HouseholdSize_19: num [1:50] 0 0 0 0 2 2 0 0 0 0 ...
#> $ SD_07_Country : chr [1:50] "United States" "United States" "United States" "United States" ...
#> $ SD_08_ZipCode : num [1:50] 85255 47905 30506 95843 95901 ...
#> $ EH_03_ElectricityBil_1: num [1:50] 3600 948 1836 876 1284 ...
#> $ EH_06_NaturalGasBill_1: num [1:50] 708 552 0 324 564 240 0 960 0 324 ...
#> $ income : num [1:50] 70011 42460 38100 55435 46043 ...
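To analyse your own survey export instead of the bundled example, one option is to read it with readRDS() and confirm that it carries the columns shown in the str() output above. The following is a minimal sketch: the file is a temporary stand-in, my_survey is a toy data frame with made-up values, and only a few of the 34 column names are checked. Which columns calc_total_emissions() strictly requires is not stated here, so treat the check as illustrative.

```r
# Toy stand-in for a survey export (values are made up for illustration)
my_survey <- data.frame(
  T_01_CarUsage = c(4, 3),
  T_02_CarType  = c(3, 2),
  SD_07_Country = c("United States", "United States"),
  income        = c(50000, 42000)
)

# Write and re-read an .rds file, as you would with your own data
rds_path <- tempfile(fileext = ".rds")
saveRDS(my_survey, rds_path)
my_data <- readRDS(rds_path)

# A few column names taken from the example dataset's structure
expected_cols <- c("T_01_CarUsage", "T_02_CarType", "T_03_CarDistance",
                   "SD_07_Country", "income")

# Columns that the example dataset has but `my_data` lacks
missing_cols <- setdiff(expected_cols, names(my_data))
print(missing_cols)
```

An empty result would mean every checked column is present; here the toy data deliberately omits one.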
Step 4: Calculate total carbon emissions
Now we use the calc_total_emissions() function from SECFC to estimate each respondent’s carbon footprint based on their survey responses. The returned data frame will include a new column called TotalEmissions.
carbon_total <- calc_total_emissions(questionnaire_example)
#> [1] 92.367805 98.437531 853.446155 143.903384 64.928111 428.851286
#> [7] 860.479129 165.907811 49.807684 30.053274 66.278545 36.633441
#> [13] 1533.582182 5.976541 26.577335 6.010655 33.738234 1000.567825
#> [19] 20.889453 2.205671 53.732768 52.876651 631.770000 52.228997
#> [25] 138.177364 47.490663 23.226313 62.717858 187.839161 206.237059
#> [31] 441.669633 872.478975 213.444793 68.560063 39.493315 1.102835
#> [37] 14.863682 153.258302 205.155961 7.437304 338.897055 108.876717
#> [43] 339.854057 251.043570 2.205671 78.229376 661.823274 350.434362
#> [49] 23.047411 76.970867
#> [1] 2000.7449 1432.9049 1034.8790 1591.2109 1976.9828 278.7309 1328.3060
#> [8] 1515.3018 2384.2109 1496.9439 1151.5060 2159.1979 2981.1818 771.9849
#> [15] 1873.9439 2422.1028 387.3319 1299.2649 2337.0379 1236.5709 717.1018
#> [22] 1091.2649 1274.2260 1236.5709 2320.0249 1398.4379 1430.1579 1664.4519
#> [29] 1834.5818 2354.0509 2303.0119 1967.9849 2262.4628 848.3460 1830.1849
#> [36] 1168.5190 1807.0790 1820.0109 824.2218 1458.6899 2106.2369 2556.8618
#> [43] 2354.0509 2354.0509 2482.4800 1236.5709 1075.5860 1185.5319 2268.9860
#> [50] 1834.5818
#> [1] 12646.1865 4074.4541 8598.2838 2057.9939 3004.4915 5061.4532
#> [7] 4859.3106 7313.1887 3697.7663 1351.0163 4005.3551 10391.0752
#> [13] 860.8787 2838.2670 3214.1083 4678.0802 4728.1149 6337.9520
#> [19] 221.1884 10842.7621 2837.4855 3303.9434 3421.8111 18149.5396
#> [25] 4897.0090 8121.4577 3698.4610 2392.6138 3366.6200 3442.2310
#> [31] 7815.5112 8715.2947 8531.2194 8732.3160 18149.5396 5152.3063
#> [37] 3019.0126 17485.0281 9692.1211 5856.0455 5904.9503 8266.9653
#> [43] 4616.1039 5304.3871 1961.1258 7332.0468 3173.3211 3651.6724
#> [49] 613.7782 14734.7506
#> [1] 1540 0 2880 1105 2445 1105 335 0 0 335 335 770 0 1340 770
#> [16] 0 670 770 1005 0 1540 335 1105 1340 0 1875 2645 770 0 770
#> [31] 1440 0 335 670 1540 770 0 1105 1105 0 1105 1105 0 1440 0
#> [46] 1540 1105 0 0 1775
#> [1] 11011.232 10180.787 4235.089 10180.338 10180.787 15669.830 8470.178
#> [8] 4235.089 4235.089 10164.269 6776.143 4235.089 18634.392 2133.614
#> [15] 6776.143 6352.634 6776.143 14399.467 4235.089 6776.143 2117.545
#> [22] 10164.269 11027.951 4235.089 4235.647 4235.089 4251.808 2117.599
#> [29] 11027.356 15301.545 4235.089 6792.716 11011.232 4235.089 8470.178
#> [36] 10164.214 8480.032 10164.214 0.000 2117.545 11037.100 4251.663
#> [43] 11027.356 6352.634 2117.545 4235.089 6353.192 18660.261 2118.248
#> [50] 6776.143
#> [1] 17805.891 15786.584 17601.698 13706.451 15168.447 18115.094 15853.274
#> [8] 7744.596 8517.991 13377.282 10331.605 12396.458 24010.035 4961.143
#> [15] 10518.033 9950.267 10231.270 19581.951 7653.314 18857.681 5374.208
#> [22] 13295.382 17460.758 12913.736 8326.186 11616.747 9274.807 5412.307
#> [29] 13891.432 20352.948 9982.873 13990.827 16665.879 14554.311 16417.241
#> [36] 13821.271 11308.312 17613.740 4072.802 9439.717 15768.224 10778.056
#> [43] 16029.313 11723.825 6563.356 9533.905 9830.266 21109.145 4614.875
#> [50] 12918.487
# Check the first few rows
head(carbon_total)
#> # A tibble: 6 × 43
#> RecordedDate T_01_CarUsage T_02_CarType T_03_CarDistance T_04_PublicTransport
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 30/6/2024 22… 5.5 3 10 5
#> 2 30/6/2024 22… 3.5 3 5 4
#> 3 30/6/2024 22… 3.5 2 30.5 5
#> 4 30/6/2024 22… 5.5 3 10 4
#> 5 30/6/2024 22… 7 3 10 4
#> 6 30/6/2024 22… 5.5 3 30.5 5
#> # ℹ 38 more variables: T_05_PublicTransport <dbl>, T_06_AirTravelLong <dbl>,
#> # T_07_AirTravelShort <dbl>, T_08_LongDistanceTra <dbl>, PETS_4 <dbl>,
#> # PETS_5 <dbl>, `E1_Electricity Usage` <dbl>, EH_02_ElectricityBil_1 <dbl>,
#> # EH_05_NaturalGasBill_1 <dbl>, F_01_DietaryHabits_5 <dbl>,
#> # F_01_DietaryHabits_6 <dbl>, F_01_DietaryHabits_7 <dbl>,
#> # F_01_DietaryHabits_4 <dbl>, CL_01_ClothingPurcha <dbl>,
#> # CL_03_MonthlyEx_9 <dbl>, CL_03_MonthlyEx_10 <dbl>, …
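The input has 34 columns and the result has 43, so the function appends several columns besides TotalEmissions. A quick way to list them is setdiff() on the column names. The sketch below uses two toy data frames as stand-ins so that it runs without SECFC; the column name ExampleComponent and all values are made up, and with the real objects you would pass questionnaire_example and carbon_total instead.

```r
# Toy stand-ins for the input and the result (made-up values)
input_df  <- data.frame(income = c(40000, 60000, 80000))
result_df <- data.frame(
  income           = c(40000, 60000, 80000),
  ExampleComponent = c(1200, 1500, 2100),   # hypothetical added column
  TotalEmissions   = c(9000, 12500, 14000)
)

# Columns present in the result but not in the input
added_cols <- setdiff(names(result_df), names(input_df))
print(added_cols)

# Minimum, quartiles, mean, and maximum of the total
print(summary(result_df$TotalEmissions))
```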
Step 5: Run a linear regression
We will examine how a respondent’s income might predict their total carbon emissions. This is a basic linear model using the built-in lm() function.
model <- lm(TotalEmissions ~ income, data = carbon_total)
# Display summary statistics of the regression
summary(model)
#>
#> Call:
#> lm(formula = TotalEmissions ~ income, data = carbon_total)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -8188.6 -2905.5 53.7 3393.7 7988.5
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.043e+03 1.330e+03 6.797 1.5e-08 ***
#> income 6.031e-02 1.933e-02 3.120 0.00306 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 4294 on 48 degrees of freedom
#> Multiple R-squared: 0.1686, Adjusted R-squared: 0.1513
#> F-statistic: 9.734 on 1 and 48 DF, p-value: 0.003058
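To read the coefficients in practical terms, it can help to plug numbers into the fitted line. The following sketch uses the rounded estimates printed in the summary above (intercept 9.043e+03, slope 6.031e-02, standard error 1.933e-02, 48 residual degrees of freedom); the income of 50,000 is an arbitrary illustration, and the interval is the usual t-based approximation rather than output from confint().

```r
# Rounded estimates copied from the summary() output
intercept <- 9.043e+03
slope     <- 6.031e-02
slope_se  <- 1.933e-02
df_resid  <- 48

# Expected change in TotalEmissions for a 10,000 increase in income
per_10k <- slope * 10000
print(per_10k)    # 603.1

# Fitted value at an (arbitrary) income of 50,000
pred_50k <- intercept + slope * 50000
print(pred_50k)   # 12058.5

# Approximate 95% confidence interval for the slope
t_crit   <- qt(0.975, df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se
print(slope_ci)
```

With the fitted model object, coef(model) and confint(model) give the unrounded versions of these numbers. The multiple R-squared of 0.1686 indicates that income alone accounts for roughly 17% of the variation in TotalEmissions in this sample.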
Step 6: Create a plot
Finally, we can visualize the relationship between income and TotalEmissions using ggplot2. The plot below includes:
- Points representing individual respondents
- A linear regression line (and confidence interval)
# Load ggplot2
library(ggplot2)
# Define custom colors
point_color <- "#4A6D8C" # desaturated blue-grey for points
line_color <- "#2C3E50" # deeper blue-grey for the regression line
lm_plot <- ggplot(carbon_total, aes(x = income, y = TotalEmissions)) +
  geom_point(color = point_color, size = 2.8, alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, color = line_color, linewidth = 1.2) +
  labs(
    title = "Income and Total Carbon Emissions",
    x = "Annual Income (USD)",
    y = "Total Emissions (kg CO2e)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, color = line_color),
    axis.title = element_text(color = line_color),
    axis.text = element_text(color = "black"),
    panel.border = element_rect(color = "black", fill = NA, linewidth = 0.8),
    plot.margin = margin(10, 10, 10, 10)
  )
# Print the plot
lm_plot
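To keep the figure for a report, the plot object can be written to disk with ggsave(). The sketch below builds a small stand-in plot from made-up data so that it runs on its own, and writes a PDF to a temporary file; the object names and dimensions are illustrative, and in the workflow above you would pass lm_plot and a path of your choice.

```r
# Temporary output path; replace with your own file name
out_file <- tempfile(fileext = ".pdf")

has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  library(ggplot2)

  # Made-up data standing in for carbon_total
  toy <- data.frame(
    income         = c(30000, 45000, 60000, 75000, 90000),
    TotalEmissions = c(9500, 11800, 12100, 14200, 14900)
  )

  toy_plot <- ggplot(toy, aes(x = income, y = TotalEmissions)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

  # Width and height are in inches by default
  ggsave(out_file, plot = toy_plot, width = 7, height = 5)
}
```

ggsave() infers the output format from the file extension, so changing the extension to .png or .svg switches the device, provided the corresponding graphics device is available on your system.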