Create analysis from a design
If no list of analysis (loa) is provided, the analysis is run on every variable of the dataset, both overall and grouped by each of the grouping variables set in group_var.
Arguments
- design
Survey design object created with srvyr::as_survey() or srvyr::as_survey_design().
- loa
List of analysis (a data frame). Default is NULL. If provided, it defines the analyses to run; see Details for the required columns.
- group_var
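Default is NULL. If provided, a list of analysis is first created from the dataset and these grouping variables, and is then run. It should be a character vector; see Details for the difference between c("admin1", "admin2") and "admin1, admin2".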
- sm_separator
Separator between the question name and the choice name in the column names of select multiple questions. The default is ".".
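The role of sm_separator can be illustrated with a small base R sketch. It assumes, as the `sm_separator = "/"` calls in the Examples suggest, that each choice of a select multiple question is stored in its own 0/1 column named question, separator, choice. The data frame and the question name `reasons` are hypothetical; this is not how create_analysis() is implemented internally.

```r
# Hypothetical data: select multiple answers stored as one 0/1 column per
# choice, named "<question><separator><choice>" (assumed convention).
sm_separator <- "/"
df <- data.frame(
  `reasons/no_teacher` = c(1, 0, 1),
  `reasons/no_books` = c(0, 0, 1),
  admin1 = c("admin1a", "admin1b", "admin1a"),
  check.names = FALSE # keep the "/" in the column names
)

# Columns belonging to the "reasons" question, found via the separator
question <- "reasons"
prefix <- paste0(question, sm_separator)
choice_cols <- names(df)[startsWith(names(df), prefix)]
choice_cols
#> [1] "reasons/no_teacher" "reasons/no_books"

# Stripping the prefix leaves the choice names
sub(prefix, "", choice_cols, fixed = TRUE)
#> [1] "no_teacher" "no_books"
```

With the default separator, the same columns would instead be named "reasons.no_teacher" and "reasons.no_books".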
Value
A list with 3 items:
results_table: the results table in long format, with the analysis key
dataset: the dataset that was used
loa: the list of analysis that was used
Details
The loa should contain the following columns:
analysis_type: the analysis type to be performed. Currently mean, median, prop_select_one, prop_select_multiple, and ratio are available.
analysis_var: the analysis variable, as a string.
group_var: the grouping variable, as a string; NA if there is no grouping variable. To group by a combination of variables at once, give them as a single string separated by ",". For example, c("admin1", "admin2") and "admin1, admin2" are different:
c("admin1", "admin2"): performs the analysis twice, once grouped by admin1 and once grouped by admin2
"admin1, admin2": performs the analysis once, grouped by admin1 and admin2 together
level: the confidence level to be used. This column is optional; if it is not provided, 0.95 is used.
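As a sketch, a minimal loa can be built by hand as a plain data frame with the columns described above. The variable names are taken from analysistools_MSNA_template_data used in the Examples; the particular combination of rows is illustrative only.

```r
# Hand-built list of analysis (loa): one row per analysis to run.
my_loa <- data.frame(
  analysis_type = c("prop_select_one", "mean", "median", "mean"),
  analysis_var = c(
    "wash_drinkingwatersource", "expenditure_debt",
    "expenditure_debt", "expenditure_debt"
  ),
  # NA: no grouping; "admin1": grouped by admin1;
  # "admin1, admin2": grouped by admin1 and admin2 together
  group_var = c(NA, NA, "admin1", "admin1, admin2"),
  # optional column; 0.95 is also the default when it is missing
  level = 0.95
)
my_loa
```

Such a data frame would then be passed as `loa = my_loa`, in the same way analysistools_MSNA_template_loa is passed in the Examples.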
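If ratios are to be computed, the loa must also include the following columns (NA for rows that are not ratios):
analysis_var_numerator: the numerator variable, as a string.
analysis_var_denominator: the denominator variable, as a string.
numerator_NA_to_0: TRUE or FALSE.
filter_denominator_0: TRUE or FALSE.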
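A sketch of a loa that mixes a ratio with other analyses, modelled on the ratio rows of analysistools_MSNA_template_loa_with_ratio shown in the Examples: the ratio row leaves analysis_var as NA and names its variables in the ratio columns, while the other rows fill the ratio columns with NA. The specific rows are illustrative.

```r
# Ratio row: income_v1_salaried_work / expenditure_debt, no grouping.
ratio_row <- data.frame(
  analysis_type = "ratio",
  analysis_var = NA_character_,
  group_var = NA_character_,
  level = 0.95,
  analysis_var_numerator = "income_v1_salaried_work",
  analysis_var_denominator = "expenditure_debt",
  numerator_NA_to_0 = TRUE,
  filter_denominator_0 = TRUE
)

# Non-ratio rows carry the same columns, with NA in the ratio-specific ones,
# so that all rows can be stacked into a single loa.
other_rows <- data.frame(
  analysis_type = c("mean", "median"),
  analysis_var = "expenditure_debt",
  group_var = NA_character_,
  level = 0.95,
  analysis_var_numerator = NA_character_,
  analysis_var_denominator = NA_character_,
  numerator_NA_to_0 = NA,
  filter_denominator_0 = NA
)

loa_with_ratio <- rbind(other_rows, ratio_row)
loa_with_ratio
```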
Examples
create_analysis(
design = srvyr::as_survey(analysistools_MSNA_template_data),
loa = analysistools_MSNA_template_loa,
sm_separator = "/"
)
#> Joining with `by = join_by(admin1)`
#> Joining with `by = join_by(wash_drinkingwatersource)`
#> Joining with `by = join_by(admin1, wash_drinkingwatersource)`
#> $results_table
#> # A tibble: 143 × 13
#> analysis_type analysis_var analysis_var_value group_var group_var_value stat
#> * <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 prop_select_… admin1 admin1a NA NA 0.31
#> 2 prop_select_… admin1 admin1b NA NA 0.27
#> 3 prop_select_… admin1 admin1c NA NA 0.42
#> 4 mean income_v1_s… NA NA NA 20.0
#> 5 median income_v1_s… NA NA NA 20
#> 6 mean expenditure… NA NA NA 20.1
#> 7 median expenditure… NA NA NA 20
#> 8 prop_select_… wash_drinki… borehole_tubewell NA NA 0.04
#> 9 prop_select_… wash_drinki… bottled_water NA NA 0.08
#> 10 prop_select_… wash_drinki… cart_with_tank_dr… NA NA 0.05
#> # ℹ 133 more rows
#> # ℹ 7 more variables: stat_low <dbl>, stat_upp <dbl>, n <dbl>, n_total <dbl>,
#> # n_w <dbl>, n_w_total <dbl>, analysis_key <chr>
#>
#> $dataset
#> # A tibble: 100 × 449
#> instance_name enum_gender hoh respondent_able_to_answer respondent_age
#> <lgl> <chr> <chr> <chr> <dbl>
#> 1 NA male no yes 22
#> 2 NA male no no 20
#> 3 NA female yes yes 20
#> 4 NA female no no 20
#> 5 NA female no yes 21
#> 6 NA female no no 22
#> 7 NA other yes yes 18
#> 8 NA male yes no 20
#> 9 NA female yes yes 22
#> 10 NA other no yes 20
#> # ℹ 90 more rows
#> # ℹ 444 more variables: respondent_gender <chr>, hoh_age <dbl>,
#> # hoh_gender <chr>, hoh_civil_status <chr>, hoh_civil_status_other <chr>,
#> # admin1 <chr>, admin2 <chr>, admin3 <chr>, admin4 <chr>, cluster_id <chr>,
#> # hh_size <dbl>, parent_instance_name <lgl>, person_id <lgl>,
#> # ind_gender <chr>, ind_age <dbl>, ind_relationship_hoh <chr>,
#> # ind_relationship_hoh_other <chr>, ind_pos <lgl>, hh_number_men <dbl>, …
#>
#> $loa
#> analysis_type analysis_var group_var level
#> 1 prop_select_one admin1 <NA> 0.95
#> 2 mean income_v1_salaried_work <NA> 0.95
#> 3 median income_v1_salaried_work <NA> 0.95
#> 4 mean expenditure_debt <NA> 0.95
#> 5 median expenditure_debt <NA> 0.95
#> 6 prop_select_one wash_drinkingwatersource <NA> 0.95
#> 7 prop_select_multiple edu_learning_conditions_reasons_v1 <NA> 0.95
#> 8 mean income_v1_salaried_work admin1 0.95
#> 9 median income_v1_salaried_work admin1 0.95
#> 10 mean expenditure_debt admin1 0.95
#> 11 median expenditure_debt admin1 0.95
#> 12 prop_select_one wash_drinkingwatersource admin1 0.95
#> 13 prop_select_multiple edu_learning_conditions_reasons_v1 admin1 0.95
#>
create_analysis(
design = srvyr::as_survey(analysistools_MSNA_template_data),
loa = analysistools_MSNA_template_loa_with_ratio,
sm_separator = "/"
)
#> Joining with `by = join_by(admin1)`
#> Joining with `by = join_by(wash_drinkingwatersource)`
#> Joining with `by = join_by(admin1, wash_drinkingwatersource)`
#> $results_table
#> # A tibble: 147 × 13
#> analysis_type analysis_var analysis_var_value group_var group_var_value
#> * <chr> <chr> <chr> <chr> <chr>
#> 1 prop_select_one admin1 admin1a NA NA
#> 2 prop_select_one admin1 admin1b NA NA
#> 3 prop_select_one admin1 admin1c NA NA
#> 4 mean income_v1_salar… NA NA NA
#> 5 median income_v1_salar… NA NA NA
#> 6 mean expenditure_debt NA NA NA
#> 7 median expenditure_debt NA NA NA
#> 8 ratio income_v1_salar… NA %/% NA NA NA
#> 9 prop_select_one wash_drinkingwa… borehole_tubewell NA NA
#> 10 prop_select_one wash_drinkingwa… bottled_water NA NA
#> # ℹ 137 more rows
#> # ℹ 8 more variables: stat <dbl>, stat_low <dbl>, stat_upp <dbl>, n <dbl>,
#> # n_total <dbl>, n_w <dbl>, n_w_total <dbl>, analysis_key <chr>
#>
#> $dataset
#> # A tibble: 100 × 449
#> instance_name enum_gender hoh respondent_able_to_answer respondent_age
#> <lgl> <chr> <chr> <chr> <dbl>
#> 1 NA male no yes 22
#> 2 NA male no no 20
#> 3 NA female yes yes 20
#> 4 NA female no no 20
#> 5 NA female no yes 21
#> 6 NA female no no 22
#> 7 NA other yes yes 18
#> 8 NA male yes no 20
#> 9 NA female yes yes 22
#> 10 NA other no yes 20
#> # ℹ 90 more rows
#> # ℹ 444 more variables: respondent_gender <chr>, hoh_age <dbl>,
#> # hoh_gender <chr>, hoh_civil_status <chr>, hoh_civil_status_other <chr>,
#> # admin1 <chr>, admin2 <chr>, admin3 <chr>, admin4 <chr>, cluster_id <chr>,
#> # hh_size <dbl>, parent_instance_name <lgl>, person_id <lgl>,
#> # ind_gender <chr>, ind_age <dbl>, ind_relationship_hoh <chr>,
#> # ind_relationship_hoh_other <chr>, ind_pos <lgl>, hh_number_men <dbl>, …
#>
#> $loa
#> analysis_type analysis_var group_var level
#> 1 prop_select_one admin1 <NA> 0.95
#> 2 mean income_v1_salaried_work <NA> 0.95
#> 3 median income_v1_salaried_work <NA> 0.95
#> 4 mean expenditure_debt <NA> 0.95
#> 5 median expenditure_debt <NA> 0.95
#> 6 ratio <NA> <NA> 0.95
#> 7 prop_select_one wash_drinkingwatersource <NA> 0.95
#> 8 prop_select_multiple edu_learning_conditions_reasons_v1 <NA> 0.95
#> 9 mean income_v1_salaried_work admin1 0.95
#> 10 median income_v1_salaried_work admin1 0.95
#> 11 mean expenditure_debt admin1 0.95
#> 12 median expenditure_debt admin1 0.95
#> 13 ratio <NA> admin1 0.95
#> 14 prop_select_one wash_drinkingwatersource admin1 0.95
#> 15 prop_select_multiple edu_learning_conditions_reasons_v1 admin1 0.95
#> analysis_var_numerator analysis_var_denominator numerator_NA_to_0
#> 1 <NA> <NA> NA
#> 2 <NA> <NA> NA
#> 3 <NA> <NA> NA
#> 4 <NA> <NA> NA
#> 5 <NA> <NA> NA
#> 6 income_v1_salaried_work expenditure_debt TRUE
#> 7 <NA> <NA> NA
#> 8 <NA> <NA> NA
#> 9 <NA> <NA> NA
#> 10 <NA> <NA> NA
#> 11 <NA> <NA> NA
#> 12 <NA> <NA> NA
#> 13 income_v1_salaried_work expenditure_debt TRUE
#> 14 <NA> <NA> NA
#> 15 <NA> <NA> NA
#> filter_denominator_0
#> 1 NA
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 TRUE
#> 7 NA
#> 8 NA
#> 9 NA
#> 10 NA
#> 11 NA
#> 12 NA
#> 13 TRUE
#> 14 NA
#> 15 NA
#>
shorter_df <- analysistools_MSNA_template_data[, c(
"admin1",
"admin2",
"expenditure_debt",
"wash_drinkingwatersource"
)]
create_analysis(
design = srvyr::as_survey(shorter_df),
group_var = "admin1"
)
#> Joining with `by = join_by(type)`
#> Joining with `by = join_by(admin1)`
#> Joining with `by = join_by(admin2)`
#> Joining with `by = join_by(wash_drinkingwatersource)`
#> Joining with `by = join_by(admin1, admin2)`
#> Joining with `by = join_by(admin1, wash_drinkingwatersource)`
#> $results_table
#> # A tibble: 87 × 13
#> analysis_type analysis_var analysis_var_value group_var group_var_value stat
#> * <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 prop_select_… admin1 admin1a NA NA 0.31
#> 2 prop_select_… admin1 admin1b NA NA 0.27
#> 3 prop_select_… admin1 admin1c NA NA 0.42
#> 4 prop_select_… admin2 admin2a NA NA 0.3
#> 5 prop_select_… admin2 admin2b NA NA 0.39
#> 6 prop_select_… admin2 admin2c NA NA 0.31
#> 7 mean expenditure… NA NA NA 20.1
#> 8 median expenditure… NA NA NA 20
#> 9 prop_select_… wash_drinki… borehole_tubewell NA NA 0.04
#> 10 prop_select_… wash_drinki… bottled_water NA NA 0.08
#> # ℹ 77 more rows
#> # ℹ 7 more variables: stat_low <dbl>, stat_upp <dbl>, n <int>, n_total <dbl>,
#> # n_w <dbl>, n_w_total <dbl>, analysis_key <chr>
#>
#> $dataset
#> # A tibble: 100 × 4
#> admin1 admin2 expenditure_debt wash_drinkingwatersource
#> <chr> <chr> <dbl> <chr>
#> 1 admin1b admin2a 22 tanker_trucks
#> 2 admin1c admin2b 18 bottled_water
#> 3 admin1c admin2b 18 water_kiosk
#> 4 admin1c admin2b 23 dont_know
#> 5 admin1c admin2a 20 dont_know
#> 6 admin1c admin2b 23 water_kiosk
#> 7 admin1c admin2a 19 dont_know
#> 8 admin1a admin2a 22 bottled_water
#> 9 admin1c admin2c 21 cart_with_tank_drum
#> 10 admin1b admin2b 25 piped_into_compound
#> # ℹ 90 more rows
#>
#> $loa
#> analysis_type analysis_var group_var level
#> 1 prop_select_one admin1 <NA> 0.95
#> 2 prop_select_one admin2 <NA> 0.95
#> 3 mean expenditure_debt <NA> 0.95
#> 4 median expenditure_debt <NA> 0.95
#> 5 prop_select_one wash_drinkingwatersource <NA> 0.95
#> 6 prop_select_one admin2 admin1 0.95
#> 7 mean expenditure_debt admin1 0.95
#> 8 median expenditure_debt admin1 0.95
#> 9 prop_select_one wash_drinkingwatersource admin1 0.95
#>
create_analysis(
design = srvyr::as_survey(shorter_df),
group_var = "admin1, admin2"
)
#> Joining with `by = join_by(type)`
#> Joining with `by = join_by(admin1)`
#> Joining with `by = join_by(admin2)`
#> Joining with `by = join_by(wash_drinkingwatersource)`
#> Joining with `by = join_by(admin1, admin2, wash_drinkingwatersource)`
#> $results_table
#> # A tibble: 117 × 13
#> analysis_type analysis_var analysis_var_value group_var group_var_value stat
#> * <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 prop_select_… admin1 admin1a NA NA 0.31
#> 2 prop_select_… admin1 admin1b NA NA 0.27
#> 3 prop_select_… admin1 admin1c NA NA 0.42
#> 4 prop_select_… admin2 admin2a NA NA 0.3
#> 5 prop_select_… admin2 admin2b NA NA 0.39
#> 6 prop_select_… admin2 admin2c NA NA 0.31
#> 7 mean expenditure… NA NA NA 20.1
#> 8 median expenditure… NA NA NA 20
#> 9 prop_select_… wash_drinki… borehole_tubewell NA NA 0.04
#> 10 prop_select_… wash_drinki… bottled_water NA NA 0.08
#> # ℹ 107 more rows
#> # ℹ 7 more variables: stat_low <dbl>, stat_upp <dbl>, n <int>, n_total <dbl>,
#> # n_w <dbl>, n_w_total <dbl>, analysis_key <chr>
#>
#> $dataset
#> # A tibble: 100 × 4
#> admin1 admin2 expenditure_debt wash_drinkingwatersource
#> <chr> <chr> <dbl> <chr>
#> 1 admin1b admin2a 22 tanker_trucks
#> 2 admin1c admin2b 18 bottled_water
#> 3 admin1c admin2b 18 water_kiosk
#> 4 admin1c admin2b 23 dont_know
#> 5 admin1c admin2a 20 dont_know
#> 6 admin1c admin2b 23 water_kiosk
#> 7 admin1c admin2a 19 dont_know
#> 8 admin1a admin2a 22 bottled_water
#> 9 admin1c admin2c 21 cart_with_tank_drum
#> 10 admin1b admin2b 25 piped_into_compound
#> # ℹ 90 more rows
#>
#> $loa
#> analysis_type analysis_var group_var level
#> 1 prop_select_one admin1 <NA> 0.95
#> 2 prop_select_one admin2 <NA> 0.95
#> 3 mean expenditure_debt <NA> 0.95
#> 4 median expenditure_debt <NA> 0.95
#> 5 prop_select_one wash_drinkingwatersource <NA> 0.95
#> 6 mean expenditure_debt admin1, admin2 0.95
#> 7 median expenditure_debt admin1, admin2 0.95
#> 8 prop_select_one wash_drinkingwatersource admin1, admin2 0.95
#>
create_analysis(
design = srvyr::as_survey(shorter_df),
group_var = c("admin1", "admin2")
)
#> Joining with `by = join_by(type)`
#> Joining with `by = join_by(admin1)`
#> Joining with `by = join_by(admin2)`
#> Joining with `by = join_by(wash_drinkingwatersource)`
#> Joining with `by = join_by(admin1, admin2)`
#> Joining with `by = join_by(admin1, wash_drinkingwatersource)`
#> Joining with `by = join_by(admin2, admin1)`
#> Joining with `by = join_by(admin2, wash_drinkingwatersource)`
#> $results_table
#> # A tibble: 145 × 13
#> analysis_type analysis_var analysis_var_value group_var group_var_value stat
#> * <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 prop_select_… admin1 admin1a NA NA 0.31
#> 2 prop_select_… admin1 admin1b NA NA 0.27
#> 3 prop_select_… admin1 admin1c NA NA 0.42
#> 4 prop_select_… admin2 admin2a NA NA 0.3
#> 5 prop_select_… admin2 admin2b NA NA 0.39
#> 6 prop_select_… admin2 admin2c NA NA 0.31
#> 7 mean expenditure… NA NA NA 20.1
#> 8 median expenditure… NA NA NA 20
#> 9 prop_select_… wash_drinki… borehole_tubewell NA NA 0.04
#> 10 prop_select_… wash_drinki… bottled_water NA NA 0.08
#> # ℹ 135 more rows
#> # ℹ 7 more variables: stat_low <dbl>, stat_upp <dbl>, n <int>, n_total <dbl>,
#> # n_w <dbl>, n_w_total <dbl>, analysis_key <chr>
#>
#> $dataset
#> # A tibble: 100 × 4
#> admin1 admin2 expenditure_debt wash_drinkingwatersource
#> <chr> <chr> <dbl> <chr>
#> 1 admin1b admin2a 22 tanker_trucks
#> 2 admin1c admin2b 18 bottled_water
#> 3 admin1c admin2b 18 water_kiosk
#> 4 admin1c admin2b 23 dont_know
#> 5 admin1c admin2a 20 dont_know
#> 6 admin1c admin2b 23 water_kiosk
#> 7 admin1c admin2a 19 dont_know
#> 8 admin1a admin2a 22 bottled_water
#> 9 admin1c admin2c 21 cart_with_tank_drum
#> 10 admin1b admin2b 25 piped_into_compound
#> # ℹ 90 more rows
#>
#> $loa
#> analysis_type analysis_var group_var level
#> 1 prop_select_one admin1 <NA> 0.95
#> 2 prop_select_one admin2 <NA> 0.95
#> 3 mean expenditure_debt <NA> 0.95
#> 4 median expenditure_debt <NA> 0.95
#> 5 prop_select_one wash_drinkingwatersource <NA> 0.95
#> 6 prop_select_one admin2 admin1 0.95
#> 7 mean expenditure_debt admin1 0.95
#> 8 median expenditure_debt admin1 0.95
#> 9 prop_select_one wash_drinkingwatersource admin1 0.95
#> 10 prop_select_one admin1 admin2 0.95
#> 11 mean expenditure_debt admin2 0.95
#> 12 median expenditure_debt admin2 0.95
#> 13 prop_select_one wash_drinkingwatersource admin2 0.95
#>