Review cleaning
review_cleaning.Rd
Compares the raw dataset, the clean dataset, the cleaning log and the deletion log, and flags any inconsistencies between them. Possible flags are:
UUID found in deletion log.
No action with different value in new value column.
Changes were not applied
This survey should be deleted from the clean dataset but it was not deleted
This survey should be removed from deletion log as it doesn't exist in the raw data.
Duplicated entry with different value, please recheck and keep one
Entry missing in cleaning log
New value in cleaning log and value in clean dataset not matching
Survey missing in the raw data
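The kind of comparison behind the value-related flags above can be sketched in base R. This is a minimal illustration on toy data, not the package's implementation: the data frames `raw`, `clean` and `log`, the `age` column and the `"change_response"` change type are all hypothetical.

```r
# Toy data; every name below is hypothetical and only for illustration.
raw <- data.frame(uuid = c("a", "b", "c"), age = c(25, 300, 41),
                  stringsAsFactors = FALSE)
clean <- data.frame(uuid = c("a", "b", "c"), age = c(25, 300, 14),
                    stringsAsFactors = FALSE)
log <- data.frame(uuid = "b", question = "age",
                  change_type = "change_response",
                  old_value = "300", new_value = "30",
                  stringsAsFactors = FALSE)

# 1. Logged changes: look up the value in the clean dataset and compare it
#    with the new value recorded in the cleaning log.
log$clean_value <- as.character(clean$age[match(log$uuid, clean$uuid)])
log$flag <- ifelse(
  log$clean_value == log$new_value,
  NA_character_,
  "New value in cleaning log and value in clean dataset not matching"
)

# 2. Unlogged changes: values that differ between raw and clean data
#    but have no matching entry in the cleaning log.
merged <- merge(raw, clean, by = "uuid", suffixes = c("_raw", "_clean"))
changed <- merged[merged$age_raw != merged$age_clean, ]
unlogged <- changed[!changed$uuid %in% log$uuid[log$question == "age"], ]
```

In this toy case, survey `b` is flagged because the logged new value was never applied to the clean dataset, and survey `c` ends up in `unlogged` because its value changed without a cleaning log entry.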
Usage
review_cleaning(
  raw_dataset,
  raw_dataset_uuid_column = "uuid",
  clean_dataset,
  clean_dataset_uuid_column = "uuid",
  cleaning_log = cleaning_log_only,
  cleaning_log_uuid_column = "uuid",
  cleaning_log_change_type_column = "change_type",
  cleaning_log_question_column = "question",
  cleaning_log_new_value_column = "new_value",
  cleaning_log_old_value_column = "old_value",
  cleaning_log_added_survey_value = "added_survey",
  cleaning_log_no_change_value = c("no_action", "no_change"),
  deletion_log = NULL,
  deletion_log_uuid_column = NULL,
  check_for_deletion_log = T
)
Arguments
- raw_dataset
Raw dataset
- raw_dataset_uuid_column
uuid column in the raw dataset. Default is "uuid".
- clean_dataset
Clean dataset
- clean_dataset_uuid_column
uuid column in the clean dataset. Default is "uuid".
- cleaning_log
Cleaning log
- cleaning_log_uuid_column
uuid column in the cleaning log. Default is "uuid".
- cleaning_log_change_type_column
Column in the cleaning log that specifies the type of change to make. Default is "change_type".
- cleaning_log_question_column
Column in the cleaning log that specifies which dataset column (question) to change. Default is "question".
- cleaning_log_new_value_column
Column in the cleaning log that holds the new, corrected value. Default is "new_value".
- cleaning_log_old_value_column
Column in the cleaning log that holds the old value. Default is "old_value".
- cleaning_log_added_survey_value
Value in the change type column that marks a newly added survey. Default is "added_survey".
- cleaning_log_no_change_value
Value(s) in the change type column that indicate no action is needed. Default is c("no_action", "no_change").
- deletion_log
Deletion log. Default is NULL.
- deletion_log_uuid_column
Name of the unique ID (uuid) column in the deletion log. Default is NULL.
- check_for_deletion_log
Set to TRUE to flag removed surveys. Default is TRUE.
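The deletion-log flags listed above amount to set comparisons between UUID columns. The sketch below shows that idea in base R on made-up UUID vectors. It is an assumption about the logic rather than the function's actual code, and the vector names are hypothetical.

```r
# Hypothetical UUIDs, for illustration only.
raw_uuid <- c("a", "b", "c")
clean_uuid <- c("a", "b")
deletion_uuid <- c("b", "z")  # "b" is still in the clean data; "z" is not in the raw data

# Logged for deletion but still present in the clean dataset.
not_deleted <- intersect(deletion_uuid, clean_uuid)

# Logged for deletion but absent from the raw data.
not_in_raw <- setdiff(deletion_uuid, raw_uuid)
```

Here `not_deleted` corresponds to "This survey should be deleted from the clean dataset but it was not deleted", and `not_in_raw` to "This survey should be removed from deletion log as it doesn't exist in the raw data."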
Examples
if (FALSE) { # \dontrun{
  # Split the combined log into a deletion log and a cleaning log
  deletion_log <- cleaningtools::cleaningtools_cleaning_log |>
    dplyr::filter(change_type == "remove_survey")
  cleaning_log <- cleaningtools::cleaningtools_cleaning_log |>
    dplyr::filter(change_type != "remove_survey")

  # Compare the datasets against both logs
  review_cleaning(
    raw_dataset = cleaningtools::raw_dataset,
    raw_dataset_uuid_column = "X_uuid",
    clean_dataset = cleaningtools::clean_dataset,
    clean_dataset_uuid_column = "X_uuid",
    cleaning_log = cleaning_log,
    cleaning_log_uuid_column = "X_uuid",
    cleaning_log_question_column = "questions",
    cleaning_log_new_value_column = "new_value",
    cleaning_log_old_value_column = "old_value",
    deletion_log = deletion_log,
    deletion_log_uuid_column = "X_uuid",
    check_for_deletion_log = TRUE
  )
} # }