
Compares the raw dataset, clean dataset, cleaning log and deletion log, and flags any discrepancies between them. Possible flags are:

  • UUID found in deletion log.

  • No action with different value in new value column.

  • Changes were not applied

  • This survey should be deleted from the clean dataset but it was not deleted

  • This survey should be removed from deletion log as it doesn't exist in the raw data.

  • Duplicated entry with different value, please recheck and keep one

  • Entry missing in cleaning log

  • New value in cleaning log and value in clean dataset not matching

  • Survey missing in the raw data
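Several of these flags reduce to set comparisons between UUID vectors. The sketch below is a minimal base-R illustration, not the package's internal implementation. The helper name `sketch_deletion_checks()` and the toy UUID vectors are hypothetical. It shows how three of the flags above could be derived: a deleted survey still present in the clean data, a deleted survey absent from the raw data, and a clean survey absent from the raw data.

```r
# Hypothetical toy UUID vectors (not package data)
raw_uuids   <- c("a", "b", "c", "d")
clean_uuids <- c("a", "b", "d", "e")
deleted     <- c("c", "d", "z")

# Hypothetical helper: returns one row per flagged UUID
sketch_deletion_checks <- function(raw_uuids, clean_uuids, deleted) {
  flags <- data.frame(uuid = character(0), flag = character(0),
                      stringsAsFactors = FALSE)
  add_flag <- function(flags, ids, msg) {
    if (length(ids) == 0) return(flags)
    rbind(flags, data.frame(uuid = ids, flag = msg, stringsAsFactors = FALSE))
  }
  # In the deletion log but still present in the clean dataset
  flags <- add_flag(flags, intersect(deleted, clean_uuids),
    "This survey should be deleted from the clean dataset but it was not deleted")
  # In the deletion log but never present in the raw dataset
  flags <- add_flag(flags, setdiff(deleted, raw_uuids),
    "This survey should be removed from deletion log as it doesn't exist in the raw data.")
  # In the clean dataset but not in the raw dataset
  # (the real function presumably also accounts for added surveys)
  flags <- add_flag(flags, setdiff(clean_uuids, raw_uuids),
    "Survey missing in the raw data")
  flags
}

sketch_deletion_checks(raw_uuids, clean_uuids, deleted)
```

With the toy data, `"d"` is flagged as not deleted, `"z"` as absent from the raw data, and `"e"` as missing from the raw data.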

Usage

review_cleaning(
  raw_dataset,
  raw_dataset_uuid_column = "uuid",
  clean_dataset,
  clean_dataset_uuid_column = "uuid",
  cleaning_log = cleaning_log_only,
  cleaning_log_uuid_column = "uuid",
  cleaning_log_change_type_column = "change_type",
  cleaning_log_question_column = "question",
  cleaning_log_new_value_column = "new_value",
  cleaning_log_old_value_column = "old_value",
  cleaning_log_added_survey_value = "added_survey",
  cleaning_log_no_change_value = c("no_action", "no_change"),
  deletion_log = NULL,
  deletion_log_uuid_column = NULL,
  check_for_deletion_log = T
)

Arguments

raw_dataset

Raw dataset

raw_dataset_uuid_column

uuid column in the raw dataset. Default is "uuid".

clean_dataset

Clean dataset

clean_dataset_uuid_column

uuid column in the clean dataset. Default is "uuid".

cleaning_log

Cleaning log

cleaning_log_uuid_column

uuid column in the cleaning log. Default is "uuid".

cleaning_log_change_type_column

Column in the cleaning log that specifies the type of change to be made. Default is "change_type".

cleaning_log_question_column

Column in the cleaning log that specifies which dataset column (question) to change. Default is "question".

cleaning_log_new_value_column

Column in the cleaning log that specifies the new (corrected) value. Default is "new_value".

cleaning_log_old_value_column

Column in the cleaning log that specifies the old value. Default is "old_value".

cleaning_log_added_survey_value

Value in the change type column that identifies newly added surveys. Default is "added_survey".

cleaning_log_no_change_value

Value(s) in the change type column indicating that no action is needed. Default is c("no_action", "no_change").

deletion_log

Deletion log. Default is NULL.

deletion_log_uuid_column

Name of the unique ID (uuid) column in the deletion log. Default is NULL.

check_for_deletion_log

If TRUE, flags removed surveys by checking them against the deletion log.
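The change type arguments control how each cleaning log row is interpreted. The sketch below is a minimal base-R illustration under assumed column names (`uuid`, `question`, `change_type`, `old_value`, `new_value`), not the package's actual code. The helper `sketch_log_checks()` and the toy data are hypothetical. The sketch handles three cases: rows whose change type is one of the no-change values but whose new value differs from the old value; change rows whose clean value still equals the old value; and change rows whose clean value matches neither value.

```r
# Hypothetical toy cleaning log and clean dataset (not package data)
log <- data.frame(
  uuid        = c("a", "b", "c", "d"),
  question    = "age",
  change_type = c("change_response", "no_action", "change_response", "change_response"),
  old_value   = c("200", "30", "15", "40"),
  new_value   = c("20", "31", "16", "41"),
  stringsAsFactors = FALSE
)
clean <- data.frame(uuid = c("a", "b", "c", "d"),
                    age  = c("20", "30", "15", "45"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: returns only the flagged cleaning log rows
sketch_log_checks <- function(log, clean, no_change_value = c("no_action", "no_change")) {
  flag <- rep(NA_character_, nrow(log))
  is_no_change <- log$change_type %in% no_change_value

  # No-change rows should not carry a new value that differs from the old one
  bad_no_change <- is_no_change & !is.na(log$new_value) & log$new_value != log$old_value
  flag[which(bad_no_change)] <- "No action with different value in new value column."

  # Look up the value actually stored in the clean dataset for each row
  clean_value <- vapply(seq_len(nrow(log)), function(k) {
    i <- match(log$uuid[k], clean$uuid)
    col <- clean[[log$question[k]]]
    if (is.na(i) || is.null(col)) NA_character_ else as.character(col[i])
  }, character(1))

  checked <- !is_no_change & !is.na(clean_value) & clean_value != log$new_value
  not_applied <- checked & clean_value == log$old_value
  flag[which(not_applied)] <- "Changes were not applied"
  flag[which(checked & !not_applied)] <-
    "New value in cleaning log and value in clean dataset not matching"

  log$flag <- flag
  log[!is.na(log$flag), ]
}

sketch_log_checks(log, clean)
```

Row `"a"` passes. Row `"b"` is a no-action row with a different new value. Row `"c"` still holds its old value. Row `"d"` matches neither the old nor the new value.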

Value

Discrepancies found in the cleaning log, one flag per problematic entry.

Examples

if (FALSE) {
# Split the example cleaning log into a deletion log (removed surveys)
# and a cleaning log holding all other changes
deletion_log <- cleaningtools::cleaningtools_cleaning_log |>
  dplyr::filter(change_type == "remove_survey")
cleaning_log <- cleaningtools::cleaningtools_cleaning_log |>
  dplyr::filter(change_type != "remove_survey")

# Compare raw vs clean data using both logs
review_cleaning(
  raw_dataset = cleaningtools::raw_dataset, raw_dataset_uuid_column = "X_uuid",
  clean_dataset = cleaningtools::clean_dataset, clean_dataset_uuid_column = "X_uuid",
  cleaning_log = cleaning_log, cleaning_log_uuid_column = "X_uuid",
  cleaning_log_question_column = "questions",
  cleaning_log_new_value_column = "new_value",
  cleaning_log_old_value_column = "old_value",
  deletion_log = deletion_log,
  deletion_log_uuid_column = "X_uuid",
  check_for_deletion_log = TRUE
)
}