Validation Overview

Version 1.0 Produced 2023-08-10

The 22056 Student collection will not have the same warning and error system as previous years; this has been replaced with tolerance approval ranges.

Quality assurance will take place within the context of a collection, and incoming data will also be compared with data sent in previous collections to ensure consistency. Issues will be automatically raised in IMS for individual rules and credibility reports.

'Continuity' checks are there to monitor changes between collections. Changes to a data item between collections will be raised as part of the quality assurance process.


Tolerance approvals

Tolerances will allow providers to agree with HESA, or the Statutory Customers as applicable, an acceptable level of records, above the default, that can trigger a rule.

Tolerance approvals will be managed in the Issue Management System (IMS). Providers can raise an issue in the IMS via the HESA Data Platform (HDP) using the ‘create issue’ button, and will then be able to add an explanation so that tolerances can be approved.


Quality rules report

Quality rules are applied to the submitted data within the HESA Data Platform (HDP). The quality rule specifications are linked to the schema and are collection specific for the 22056 and 23056 collections, moving to reference period specific thereafter.

The Quality Rules report will show you all the rules that fall outside of tolerance, along with the rules that fall inside tolerance. Both should be reviewed as part of your quality assurance processes.

There are six categories of quality rule: Data integrity, Valid values, Coverage, Guidance, Continuity and Credibility reports. The categories distinguish between different types of rule.

Data integrity – these identify data anomalies e.g. duplicate valid entries returned.

Valid values – ensures coding is applicable to the regulator.

Coverage – enforces the coverage statements for the individual data items in the coding manual.

Guidance – enforces guidance as detailed in the coding manual.

Continuity/Updateability – ensures the consistency of data across collections, flagging where continuing students are expected but not yet returned, or where unexpected students have been returned. This category also includes updateability rules, which highlight any fields where the valid entries have changed between collections.

Credibility reports – relate to specific credibility reports found on the HDP.


Continuity validation

'Continuity' checks are there to monitor changes between collections. USN (UKPRN + SID + NUMHUS) is the linking mechanism used by HESA to track continuing student engagements between HESA reporting periods. Continuity validation details potential discrepancies with student engagement linking within the inserted data file. 

More details about the USN link and continuity are available in the coding manual.
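The linking idea can be illustrated with a minimal Python sketch. It assumes the USN is treated as a three-part key (UKPRN, SID, NUMHUS) and that a list of engagements expected to continue is already available. The function name, the `expected_to_continue` input and the sample identifiers are hypothetical and are not taken from the HDP or the coding manual.

```python
from typing import NamedTuple


class USN(NamedTuple):
    """Linking key as described above: UKPRN + SID + NUMHUS."""
    ukprn: str
    sid: str
    numhus: str


def continuity_discrepancies(previous, current, expected_to_continue):
    """Compare engagement keys between two collections.

    previous / current: sets of USN keys returned in each collection.
    expected_to_continue: keys from `previous` that should appear again
    (hypothetical input; how this is determined is not shown here).
    """
    # Expected to continue, but not yet returned in the current collection.
    missing = expected_to_continue - current
    # Present in both collections, but not expected to continue.
    # Assumption: keys that are new in `current` are not continuity issues.
    unexpected = (current & previous) - expected_to_continue
    return missing, unexpected


# Made-up identifiers, for illustration only.
a = USN("10000001", "S001", "1")
b = USN("10000001", "S002", "1")
c = USN("10000001", "S003", "1")
d = USN("10000001", "S004", "1")

missing, unexpected = continuity_discrepancies(
    previous={a, b, c},
    current={a, c, d},
    expected_to_continue={a, b},
)
print(sorted(missing))     # engagement b: expected but not returned
print(sorted(unexpected))  # engagement c: returned but not expected
```

In this sketch, engagement `d` is new in the current collection and so is not reported by either check.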

It is worth noting that 22056 will see the use of migrated data, which is legacy data translated from the C21051 and C21054 collections, for use in quality checks only. Rules that use migrated data, where linking is normally required on key fields (such as SCSESSIONID), have been adapted to pseudo-link. As these fields have not previously been returned to HESA, a complete match cannot be guaranteed.

Please see Data Migration in the coding manual for more information.


Credibility reports

These have the same functionality as the current legacy Student collection with the implementation of tabs across the top for full time and part time.

All cells that are highlighted will be reported in the Issue Management System (IMS). These will need to be reviewed and where necessary a tolerance override request made per cell. Providers will need to use the credibility report in HDP in conjunction with the IMS record when reviewing, as the contextual information and drill down will be within HDP rather than IMS.


Online validation tool (OVT)

The 22056 Student collection will have an Online Validation Tool, unlike the previous validation kit, which had to be downloaded locally.

This will increase the range of rules that HESA can offer in the Validation Tool, as more data can be referenced or used for comparisons, for example previously submitted data.

This will also more closely mirror the process of how providers submit a file to the HDP.


Technical population and technical validity

The technical population defines which records are included in the check. 

The technical validity defines which records in the technical population are valid or invalid. Generally, the record tolerance is based on the outcome of this validity. 

Both the above will use entities, fields, field values, enhanced coding frames, derived fields and historic data.

Derived fields are calculated fields included in the data, which can look at data across multiple fields or time. Their names begin with Z_.

Enhanced coding frames group valid entries into categories for onward use. They usually have GRP_ or MRK within the name, indicating groupings and markers.

Reference data is required for some rules such as postcode look-ups.


Invalid / Valid rules

Validity indicates how the technical validity should be applied and can be either invalid or valid.

INVALID – records identified in the technical validity fail the rule, i.e. these are records we are not expecting.

VALID – records identified in the technical validity pass the rule. i.e. this is what we are expecting. Any others within the population, but outside of the technical validity, will trigger the rule.
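As a rough illustration of the two modes, the Python sketch below applies a technical validity condition to a technical population and returns the records that would trigger the rule. The function name, the record layout and the `start`/`end` field names are hypothetical; the sketch shows the logic described above, not how the HDP implements it.

```python
def triggering_records(population, matches_validity, mode):
    """Return the records in the technical population that trigger a rule.

    population: records already selected by the technical population.
    matches_validity: function returning True when a record meets the
        technical validity condition.
    mode: "INVALID" means matching records fail the rule;
          "VALID" means matching records pass and all others trigger.
    """
    if mode == "INVALID":
        return [r for r in population if matches_validity(r)]
    if mode == "VALID":
        return [r for r in population if not matches_validity(r)]
    raise ValueError(f"unknown validity mode: {mode!r}")


# Hypothetical records; ISO date strings compare correctly as text.
population = [
    {"id": 1, "start": "2022-09-01", "end": "2023-06-30"},
    {"id": 2, "start": "2022-09-01", "end": "2022-08-01"},
]

# The same check written both ways identifies the same record.
as_invalid = triggering_records(
    population, lambda r: r["end"] < r["start"], "INVALID"
)
as_valid = triggering_records(
    population, lambda r: r["end"] >= r["start"], "VALID"
)
print([r["id"] for r in as_invalid])
print([r["id"] for r in as_valid])
```

Both calls flag record 2, whose end date is before its start date. The choice between INVALID and VALID only changes whether the condition describes the unexpected records or the expected ones.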


Intolerable rules

These are rules that cannot have a tolerance override applied. They largely relate to the return of dates e.g. end date is before the start date.


Tolerance thresholds

Tolerance thresholds are set for each rule. A threshold is the record count and/or the percentage of records above which the rule will trigger on the HDP. Rules that are shown as 'outside tolerance' will need to have action taken by the provider: either to amend the data and resubmit, or to submit a tolerance override request. The tolerance override request can then be approved by the Statutory Customer, HESA or self-approved by the provider as applicable.


Tolerance types

RECORD COUNT - The number of records

PERCENTAGE - The count in percentage form

COUNT AND/OR PERCENTAGE - Where the count, the percentage, or both are applicable.

BINARY - The rule is either on or off. This is similar to how rules currently work in legacy Student collections, in that the rule is essentially turned off (switched).

The rule will be outside of tolerance where the count and/or percentage of records in the technical validity is greater than the tolerance count and/or percentage value.

For example, a record count tolerance of 5 will trigger a rule where there are 6 records failing the rule.

For example, a 10% tolerance will trigger a rule where more than 10% of the records in the population fail the technical validity check. 100 records in the population and 11 records failing the technical validity check would be considered outside of tolerance.
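The two examples above can be reproduced with a short Python sketch. The function name and parameters are hypothetical. Where both a count and a percentage tolerance are set, the sketch assumes that exceeding either one puts the rule outside tolerance; the text above does not specify how the two are combined, so treat that as an assumption. Binary tolerances are not modelled.

```python
def outside_tolerance(failing, population, count_tol=None, pct_tol=None):
    """Return True when the failing records exceed the tolerance.

    failing: number of records failing the technical validity check.
    population: number of records in the technical population.
    count_tol: record count tolerance, or None if not applicable.
    pct_tol: percentage tolerance (e.g. 10 for 10%), or None.
    """
    over_count = count_tol is not None and failing > count_tol
    # Integer comparison avoids rounding problems from float division.
    over_pct = (
        pct_tol is not None
        and population > 0
        and failing * 100 > pct_tol * population
    )
    # Assumption: exceeding either applicable tolerance triggers the rule.
    return over_count or over_pct


# Record count tolerance of 5: 6 failing records trigger, 5 do not.
print(outside_tolerance(6, 100, count_tol=5))
print(outside_tolerance(5, 100, count_tol=5))

# 10% tolerance over 100 records: 11 failing trigger, 10 do not.
print(outside_tolerance(11, 100, pct_tol=10))
print(outside_tolerance(10, 100, pct_tol=10))
```

The comparison is strictly greater than, so a rule with exactly the tolerated number or percentage of failing records stays inside tolerance.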

When any failed records for a rule fall outside of tolerance then an issue is created in the IMS where a tolerance request can be made and approved either automatically, by HESA or by a Statutory Customer.


Please refer to the following for more information:

HESA Coding Manual Tools

Data Futures: Quality Assurance Process in HESA e-learning 
