Center for Public Safety Data Science

Introduction to Data Cleaning

Effective data cleaning is a crucial prerequisite for ensuring the accuracy, validity, completeness, and consistency of your data. Flawed or inconsistent data can lead to erroneous conclusions and compromise the integrity of your analyses. The quality of your results directly hinges on how well you cleanse your data.

This comprehensive two-day course is thoughtfully crafted to provide participants with the necessary skills and knowledge to proficiently utilize Microsoft Excel and Power BI as powerful tools for data cleaning. Through engaging hands-on activities with real-world datasets, attendees will be guided to:

  • Identify and rectify data inconsistencies efficiently.
  • Master the use of basic functions and formulas in Microsoft Excel to streamline the data cleaning process.
  • Leverage the visualization capabilities of Microsoft Power BI to detect and address outliers.

By the end of this course, participants will have gained the expertise needed to prepare their datasets for robust analysis, thus enhancing the reliability and accuracy of their data-driven decision-making processes. Join us to acquire these invaluable data cleaning skills and elevate your data analysis proficiency to new heights.


Course Details

  • Prerequisites:
    • Microsoft Excel (must be installed prior to class)
    • Basic understanding of mathematics and general computer proficiency
    • Familiarity with Microsoft Excel recommended
  • Technology Requirements: Students must bring a Windows laptop (personal or employer-issued). Host agency does not provide devices.
  • Duration: 16 Hours (2 Days)
  • Method: In Person
  • Restrictions: Restricted to Criminal Justice Employees in Full or Light Duty status
  • Disclosures: None
  • Fee: Course is free of charge
  • Cancellation Policy: If fewer than 50% of seats are filled 72 hours before the course start date, the instructor may cancel the course.

Course Objectives

After successfully completing the course, attendees will be able to:

  • Understand data cleaning fundamentals
  • Apply basic Excel functionality
  • Use filters and text-to-columns
  • Create unique identifiers
  • Understand the basics of relational databases
  • Remove duplicates from large datasets
  • Use pivot tables for basic analysis
  • Apply conditional formatting for data validation
  • Utilize common Excel functions:
    • =VLOOKUP()
    • =CONCAT()
    • =IF()
    • =ISBLANK()
    • =TRIM()
    • =INT()
    • =YEARFRAC()

Course Outline

Module 1: Introduction to our data and cleaning the Incidents

  • Introduction and class structure
  • GIGO and best practices
  • Data standardization and validation
  • Removing duplicate records
  • Splitting excessive data
  • Type I and Type II errors
  • 2022 Maryland Homicide by Firearm dataset
  • Expanding columns and inserting new ones
  • Creating a Unique Identifier
  • Using Data → Filter button
  • Formatting Date and Time
  • Using =VLOOKUP() and Text-to-Columns
  • Outlier detection and using =CONCAT()
  • Finding latitude and longitude from address data

Module 2: Cleaning the Victim and Suspect Data

  • Creating a relational database from flat data
  • Duplicate removal
  • Calculating Age from DOB
  • Using =IF() and =ISBLANK() appropriately
  • When to leave cells blank vs labeling unknown
  • Pivot Tables for data inconsistencies
  • Using =TRIM() and =VLOOKUP() again
  • Conditional formatting for validation
  • Overview of data management ethics