Implementing earthmover bundles to load assessment data into Ed-Fi

Introduction

This document covers the steps required to use earthmover and lightbeam to send assessment data to an Ed-Fi ODS. Before you begin, make sure you have an earthmover bundle for the assessment.
EA maintains a public GitHub repository of bundles that are ready to deploy. If we haven't added a bundle for your assessment yet, review "Writing earthmover bundles to integrate assessment data into Ed-Fi" for instructions on writing your own.

Implementation steps:

Review the bundle documentation

Each bundle in the EA-managed repository has a README file with important information for its use. Things to look for include:
  • Version compatibility. Unless otherwise noted, bundles included in the repository have been tested with recent versions of Ed-Fi and recent versions of the file formats being mapped. Sometimes, however, assessments change enough over time that a new version will require an entirely new bundle.
  • Details on the necessary source file(s)
  • Options for mapping student IDs if necessary
  • Required and optional parameters to include when you run the bundle


Check the contents of the results files

Open the results file and verify that it contains only the data you expect.
  • Does it contain data from multiple districts? If so, does the bundle handle splitting the results by district or will you need to do this before processing?
  • Does it contain data from multiple years? We typically want to send each year of assessment results to that year's ODS. If multiple years are included, does the bundle handle filtering down to one year or will you need to do this before processing?
If you open a CSV file with Excel, do not save it in Excel. The automatic changes that Excel makes to the file (modifying date formats, removing leading zeroes, etc.) can cause earthmover errors.
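If you need to check or split a results file yourself, a short script lets you do it without opening the file in Excel at all. The following is a minimal Python sketch, assuming the file has one district column and one school-year column. The column names DISTRICT_COL and YEAR_COL and the helper functions are hypothetical, so adjust them to your vendor's layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: vendor layouts differ, so change these to the
# district and school-year columns in your results file.
DISTRICT_COL = "DistrictName"
YEAR_COL = "SchoolYear"


def group_rows(csv_text: str) -> tuple[list[str], dict[tuple[str, str], list[dict[str, str]]]]:
    """Group rows by (district, year). The csv module reads every field as a
    string, so leading zeroes and date formats stay exactly as exported."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    groups: dict[tuple[str, str], list[dict[str, str]]] = defaultdict(list)
    for row in reader:
        groups[(row[DISTRICT_COL], row[YEAR_COL])].append(row)
    return list(reader.fieldnames or []), dict(groups)


def summarize(csv_text: str) -> dict[tuple[str, str], int]:
    """Row count per (district, year). More than one key means the file
    mixes districts or years."""
    _, groups = group_rows(csv_text)
    return {key: len(rows) for key, rows in groups.items()}


def split_by_district_year(csv_text: str) -> dict[tuple[str, str], str]:
    """Return one CSV document, header included, per (district, year)."""
    fieldnames, groups = group_rows(csv_text)
    parts = {}
    for key, rows in groups.items():
        buf = io.StringIO(newline="")
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        parts[key] = buf.getvalue()
    return parts
```

To use it, read the file with pathlib's Path("results.csv").read_text(encoding="utf-8-sig") (the "-sig" codec drops a byte-order mark if the export has one), call summarize to see which districts and years the file contains, and write each value returned by split_by_district_year to its own file. You only need the split when the bundle does not already handle it.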


Review configuration for student IDs

Each assessment bundle contains logic that automatically detects the student ID configuration that maximizes student matching against an Ed-Fi roster source. Review the bundle's default parameters to confirm whether this logic is applied by default or whether you must pass different parameter values to enable it.
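To make the idea behind this detection concrete, here is a conceptual Python sketch: for each candidate ID column in the assessment file, it computes the share of rows whose value appears in the set of student IDs from the roster, then picks the column with the highest match rate. This illustrates the idea under stated assumptions and is not the bundles' implementation; the function names, the candidate column names, and how you build roster_ids are hypothetical.

```python
from typing import Iterable

# Conceptual sketch only: the bundles implement this detection in their own
# earthmover configuration, not with this code. Column names are hypothetical.


def match_rates(rows: list[dict[str, str]], roster_ids: set[str],
                candidate_cols: Iterable[str]) -> dict[str, float]:
    """Fraction of assessment rows whose value in each candidate column is a
    student ID found in the roster."""
    cols = list(candidate_cols)
    if not rows:
        return {col: 0.0 for col in cols}
    rates = {}
    for col in cols:
        matched = sum(1 for row in rows if (row.get(col) or "").strip() in roster_ids)
        rates[col] = matched / len(rows)
    return rates


def best_id_column(rows: list[dict[str, str]], roster_ids: set[str],
                   candidate_cols: Iterable[str]) -> str:
    """Candidate column with the highest match rate (the first one wins ties)."""
    cols = list(candidate_cols)
    rates = match_rates(rows, roster_ids, cols)
    return max(cols, key=lambda col: rates[col])
```

A low best rate is itself a useful signal: it suggests the roster source and the assessment file do not line up, which is worth resolving before you send anything.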


Run earthmover with any necessary parameters

Once you have configured student IDs, gather your input sources and run earthmover to transform the data and output the JSON files. Before running earthmover, check the bundle's README for additional instructions and parameters specific to that assessment. For example, some bundles require the API year of the destination ODS, while others require the assessment version. The general syntax is:

earthmover run -c ./earthmover.yaml -p '{
  "BUNDLE_DIR": ".",
  "INPUT_FILE_OVERALL": "path/to/input_file.csv",
  "OUTPUT_DIR": "./output",
  "STUDENT_ID_NAME": "student_id_to"
}'

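If you call earthmover from a script, building the -p JSON from a dictionary avoids hand-quoting mistakes such as unescaped backslashes in Windows paths. This Python sketch assembles the same command shown above and refuses to build it when a required parameter is missing. REQUIRED_PARAMS, earthmover_command, and run_earthmover are hypothetical names; take the actual required parameters from the bundle's README.

```python
import json
import subprocess

# Hypothetical: copy the real required parameters from the bundle's README.
REQUIRED_PARAMS = ("BUNDLE_DIR", "INPUT_FILE_OVERALL", "OUTPUT_DIR", "STUDENT_ID_NAME")


def earthmover_command(config_path: str, params: dict[str, str],
                       required: tuple[str, ...] = REQUIRED_PARAMS) -> list[str]:
    """Build the argument list for `earthmover run -c CONFIG -p JSON`."""
    missing = [name for name in required if not params.get(name)]
    if missing:
        raise ValueError("missing earthmover parameters: " + ", ".join(missing))
    # json.dumps produces valid JSON, escaping quotes and backslashes for you.
    return ["earthmover", "run", "-c", config_path, "-p", json.dumps(params)]


def run_earthmover(config_path: str, params: dict[str, str]) -> None:
    # Passing a list (not a shell string) bypasses shell quoting entirely;
    # check=True raises CalledProcessError if earthmover exits non-zero.
    subprocess.run(earthmover_command(config_path, params), check=True)
```

For example, run_earthmover("./earthmover.yaml", params) with the same four parameters as the shell command is equivalent to typing that command by hand.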

Run lightbeam

After you have run earthmover with no errors, use lightbeam to send the data to the ODS. If you have a test or development ODS, you may want to send the data there first so that you can inspect it for issues before loading to your production ODS. You can also use lightbeam's validate command to catch errors before attempting to send the data.
lightbeam validate -c path/to/config.yaml

When you're ready to send, double-check that you are sending to the correct ODS and use lightbeam send.
lightbeam send -c path/to/config.yaml


Fixing common errors

  1. Student reference could not be resolved
    • This means that the student ID in the student assessment record was not found in the ODS. If all of your records failed with this error, you are likely using the wrong student ID. This error should not occur if you are using the student ID xwalking logic and the original roster source matches the roster information in the destination ODS.
  2. Unable to resolve value to an existing resource
    • This means one of two things:
      i. A property of the entity with a reference data type includes a value that does not yet exist in the ODS. For example, the student assessment record references an assessment that has not yet been sent to the ODS.
      ii. A descriptor is being sent that does not yet exist in the ODS.
    • Either way, the solution is to send the required values to the ODS first, whether those are records of a different entity or descriptor values.
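When many records fail with the student reference error, it helps to see which IDs are affected before changing the configuration. This Python sketch scans earthmover's JSONL output for student assessment records and counts studentReference.studentUniqueId values that are not in a set of IDs known to the destination ODS. The function name, the output file name, and how you build roster_ids (for example, from an export of the ODS's students) are assumptions.

```python
import json
from collections import Counter
from typing import Iterable


def unresolved_student_ids(jsonl_lines: Iterable[str], roster_ids: set[str]) -> Counter:
    """Count studentUniqueId values in earthmover's student assessment output
    that are not among the IDs known to the destination ODS."""
    missing: Counter = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Ed-Fi student assessment records reference the student as
        # {"studentReference": {"studentUniqueId": ...}}.
        student_id = record.get("studentReference", {}).get("studentUniqueId")
        if student_id not in roster_ids:
            missing[student_id] += 1
    return missing
```

For example, open the bundle's student assessment output (such as output/studentAssessments.jsonl; the exact name depends on the bundle) and print unresolved_student_ids(f, roster_ids).most_common(10). If nearly every ID appears in the result, the wrong student ID is the most likely cause.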


Automate the loading of future results files

For assessments that are taken multiple times per year, it can be helpful to set up an automated process for extracting and loading assessment results. Automation can save time on future data loading and ensure more timely delivery of up-to-date data. You can run the automated process locally using a cron job or on a server using your orchestration tool of choice. EA uses Airflow for this because it's open source and already deployed to manage other processes for our data warehouse implementations.

First, you'll need a script for moving the data from the source system to a specified location on the machine that will run earthmover (either your computer or a server). This depends on what the vendor offers for accessing reports, such as an SFTP server or an API. You may need to reach out to the vendor to get this set up and receive credentials. After you extract the data, it may require some transformation to get it into the format that the earthmover bundle expects. Make sure to split multi-district or multi-year extracts if needed. Once your data matches the expected format, run the earthmover and lightbeam commands with any required parameters.
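The following Python sketch shows one shape the loading part of such a script could take for cron or a similar scheduler: each district gets its own earthmover parameters and lightbeam config, lightbeam runs only if earthmover succeeded, and each step's exit status and console output are written to a SQLite table so failures can be reviewed later. It is not EA's Airflow DAG. DISTRICTS, the file paths, and the function and table names are hypothetical; the lightbeam command mirrors the syntax used earlier in this document (confirm it with lightbeam --help for your version); and it assumes the extraction and splitting steps have already produced each district's input file.

```python
import json
import sqlite3
import subprocess
from datetime import datetime, timezone
from typing import Callable, Sequence

# Hypothetical per-district settings. Each lightbeam config is assumed to point
# at that district's ODS and to read from that district's OUTPUT_DIR.
DISTRICTS = {
    "district_a": {
        "params": {"BUNDLE_DIR": ".", "INPUT_FILE_OVERALL": "data/district_a.csv",
                   "OUTPUT_DIR": "output/district_a", "STUDENT_ID_NAME": "student_id_to"},
        "lightbeam_config": "configs/district_a_lightbeam.yaml",
    },
}

Runner = Callable[[Sequence[str]], tuple[int, str]]


def run_command(cmd: Sequence[str]) -> tuple[int, str]:
    """Run a CLI command and return its exit status and combined output."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def connect(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS load_log "
               "(run_at TEXT, district TEXT, step TEXT, returncode INTEGER, output TEXT)")
    return db


def log_step(db: sqlite3.Connection, district: str, step: str,
             returncode: int, output: str) -> None:
    db.execute("INSERT INTO load_log (run_at, district, step, returncode, output) "
               "VALUES (?, ?, ?, ?, ?)",
               (datetime.now(timezone.utc).isoformat(), district, step, returncode, output))
    db.commit()


def load_district(db: sqlite3.Connection, district: str, settings: dict,
                  run: Runner = run_command) -> bool:
    """Run earthmover, then lightbeam send, for one district; log both steps."""
    code, output = run(["earthmover", "run", "-c", "./earthmover.yaml",
                        "-p", json.dumps(settings["params"])])
    log_step(db, district, "earthmover", code, output)
    if code != 0:
        return False  # skip lightbeam so incomplete output is never sent
    code, output = run(["lightbeam", "send", "-c", settings["lightbeam_config"]])
    # A zero exit status does not guarantee every record was accepted,
    # so review the captured output in load_log as well.
    log_step(db, district, "lightbeam send", code, output)
    return code == 0


def load_all(db: sqlite3.Connection, districts: dict = DISTRICTS,
             run: Runner = run_command) -> dict[str, bool]:
    return {name: load_district(db, name, settings, run) for name, settings in districts.items()}
```

A scheduled job would call load_all(connect("load_log.db")); passing a file path instead of the in-memory default keeps the log between runs. The run parameter exists so the sequencing can be tested without earthmover or lightbeam installed.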

Other things to consider when setting up an automated assessment loading process:
  • It's important to be able to check the logs to see if lightbeam is encountering any errors when sending the records.
  • We recommend outputting those logs to a specific storage location, such as a database table.
  • The script will need to be able to handle using different parameters for each district to account for differences in student ID mapping and to ensure that each district's results are sent to the correct ODS.
  • EA also recommends automating the student ID xwalking feature. The EA data engineering team built this into our Earthbeam DAG using the following logic: