Data Validation

convert_to_struct_arr(data, add_exch_local_ev=True)

Converts a 2D ndarray in the format used by the Python implementation of hftbacktest into the structured array used by the Rust implementation.

Parameters:
  • data (ndarray) – 2D ndarray to be converted.

  • add_exch_local_ev (bool) – If True, the EXCH_EVENT and LOCAL_EVENT flags are added to the ‘ev’ event field of each row, depending on whether its exchange and local timestamps are valid. Set this to True only when converting existing data into the new format.

Returns:

Converted structured array.

Return type:

ndarray
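The conversion can be pictured with the following minimal sketch. It is not the library's implementation: the 2D column layout [ev, exch_ts, local_ts, px, qty], the reduced structured dtype, the EXCH_EVENT/LOCAL_EVENT bit values, and the use of -1 to mark an invalid timestamp are all assumptions made for illustration, and to_struct_arr_sketch is a hypothetical name.

```python
import numpy as np

# Hypothetical flag bits, for illustration only; the real hftbacktest
# constants may differ.
EXCH_EVENT = 1 << 31
LOCAL_EVENT = 1 << 30
INVALID_TS = -1  # assumed marker for a missing timestamp

# Reduced structured dtype (assumption); the Rust-side layout has more fields.
sketch_dtype = np.dtype([
    ('ev', 'u8'),
    ('exch_ts', 'i8'),
    ('local_ts', 'i8'),
    ('px', 'f8'),
    ('qty', 'f8'),
])


def to_struct_arr_sketch(data, add_exch_local_ev=True):
    """Convert a 2D array with assumed columns [ev, exch_ts, local_ts, px, qty]."""
    out = np.empty(len(data), dtype=sketch_dtype)
    out['ev'] = data[:, 0].astype(np.uint64)
    out['exch_ts'] = data[:, 1].astype(np.int64)
    out['local_ts'] = data[:, 2].astype(np.int64)
    out['px'] = data[:, 3]
    out['qty'] = data[:, 4]
    if add_exch_local_ev:
        # Flag each row according to which of its timestamps are valid.
        out['ev'][out['exch_ts'] != INVALID_TS] |= EXCH_EVENT
        out['ev'][out['local_ts'] != INVALID_TS] |= LOCAL_EVENT
    return out
```

A row whose exchange timestamp is -1 ends up with only LOCAL_EVENT set, which is how the sketch models "based on the validity of each timestamp".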

correct(data, base_latency, tick_size=None, lot_size=None, err_bound=1e-08, method='separate')

Validates the specified data and automatically corrects negative latency and unordered rows. See validate_data(), correct_local_timestamp(), correct_exch_timestamp(), and correct_exch_timestamp_adjust().

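Parameters:
  • data (ndarray[Any, dtype[ScalarType]] | DataFrame) – Data to be validated and corrected.

  • base_latency (float) – Value added to the local timestamps when negative feed latency is corrected. See correct_local_timestamp().

  • tick_size (float | None) – Minimum price increment for the given asset. See validate_data().

  • lot_size (float | None) – Minimum order quantity for the given asset. See validate_data().

  • err_bound (float) – Error bound used when checking price and quantity alignment. See validate_data().

  • method (str) – How reversed exchange timestamps are corrected: 'separate' (the default) uses correct_exch_timestamp(); 'adjust' uses correct_exch_timestamp_adjust().

Returns: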

Corrected data

Return type:

ndarray[Any, dtype[ScalarType]] | DataFrame

correct_event_order(sorted_exch, sorted_local, add_exch_local_ev)

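Corrects reversed exchange timestamps by merging the data sorted by exchange timestamp with the data sorted by local timestamp. A row whose position differs between the two orderings is duplicated into two separate events: one placed in exchange-timestamp order and one placed in local-timestamp order. See the data documentation for details.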

Parameters:
  • sorted_exch (ndarray) – Data sorted by exchange timestamp.

  • sorted_local (ndarray) – Data sorted by local timestamp.

  • add_exch_local_ev (bool) – If this is set to True, EXCH_EVENT and LOCAL_EVENT flags will be added to the event field based on the validity of each timestamp.

Returns:

Adjusted data with corrected exchange timestamps.

Return type:

ndarray
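The merge can be sketched as a two-pointer walk over both orderings. This is an illustration under assumptions, not hftbacktest's code: rows are plain (row_id, exch_ts, local_ts) tuples, the flag values are hypothetical, and correct_event_order_sketch is a hypothetical name. A row that is next in both orderings is kept as one event; otherwise whichever side comes first is emitted as a one-sided copy, so every row ends up as one or two events.

```python
# Hypothetical flag bits (assumption, for illustration only).
EXCH_EVENT = 1 << 31
LOCAL_EVENT = 1 << 30


def correct_event_order_sketch(sorted_exch, sorted_local):
    """Merge two orderings of the same rows into a single event stream.

    Rows are (row_id, exch_ts, local_ts) tuples. The result is a list of
    (flags, row_id, exch_ts, local_ts) in which entries carrying EXCH_EVENT
    follow exchange-timestamp order and entries carrying LOCAL_EVENT follow
    local-timestamp order.
    """
    out = []
    x, l = 0, 0
    while x < len(sorted_exch) or l < len(sorted_local):
        if x < len(sorted_exch) and l < len(sorted_local):
            ex, lo = sorted_exch[x], sorted_local[l]
            if ex[0] == lo[0]:
                # The same row is next in both orderings: keep it as one event.
                out.append((EXCH_EVENT | LOCAL_EVENT, *ex))
                x += 1
                l += 1
            elif ex[1] <= lo[2]:
                # The exchange side of the next exchange-ordered row comes
                # first: emit an exchange-only copy of it.
                out.append((EXCH_EVENT, *ex))
                x += 1
            else:
                # The local side of the next local-ordered row comes first:
                # emit a local-only copy of it.
                out.append((LOCAL_EVENT, *lo))
                l += 1
        elif x < len(sorted_exch):
            # Local sides are exhausted; the rest are exchange-only copies.
            out.append((EXCH_EVENT, *sorted_exch[x]))
            x += 1
        else:
            # Exchange sides are exhausted; the rest are local-only copies.
            out.append((LOCAL_EVENT, *sorted_local[l]))
            l += 1
    return out
```

For example, with rows A = (0, 1, 5) and B = (1, 2, 3), A reaches the exchange first but arrives locally second, so the sketch emits A's exchange side, then B as a single event, then A's local side.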

correct_exch_timestamp(data, num_corr)

Corrects reversed exchange timestamps by splitting each affected row into separate events through duplication, so that the result is ordered by both exchange and local timestamps. See correct_event_order() and the data documentation for details.

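Parameters:
  • data (ndarray[Any, dtype[ScalarType]] | DataFrame) – Data to be corrected.

  • num_corr (int) – Number of rows with reversed exchange timestamps, as returned by validate_data().

Returns: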

Adjusted data with corrected exchange timestamps.

Return type:

ndarray[Any, dtype[ScalarType]] | DataFrame

correct_exch_timestamp_adjust(data)

Corrects reversed exchange timestamps by adjusting local timestamps instead of duplicating rows. It sorts the data by exchange timestamp, then replaces any local timestamp that is smaller than the one before it with that preceding value, so that both timestamps end up in order.

Parameters:

data (ndarray[Any, dtype[ScalarType]] | DataFrame) – Data to be corrected.

Returns:

Adjusted data with corrected exchange timestamps.

Return type:

ndarray[Any, dtype[ScalarType]] | DataFrame
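Replacing each out-of-order local timestamp with the preceding (already corrected) value amounts to a running maximum over the local timestamps once the rows are sorted by exchange timestamp. A minimal NumPy sketch, assuming the two timestamps are passed as separate integer arrays; adjust_sketch is a hypothetical name, and the real function works on the full data array or DataFrame.

```python
import numpy as np


def adjust_sketch(exch_ts, local_ts):
    """Return (exch_ts, local_ts) sorted by exch_ts with local_ts made monotonic."""
    # A stable sort keeps rows with equal exchange timestamps in their original order.
    order = np.argsort(exch_ts, kind='stable')
    exch_sorted = exch_ts[order]
    # Running maximum: any local timestamp smaller than its predecessor is
    # raised to the preceding value.
    local_fixed = np.maximum.accumulate(local_ts[order])
    return exch_sorted, local_fixed
```

This variant keeps the row count unchanged, at the cost of shifting some local timestamps later than they were recorded.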

correct_local_timestamp(data, base_latency)

Adjusts the local timestamps when the feed latency is negative by offsetting them with the magnitude of the most negative latency in the data, as follows:

feed_latency = local_timestamp - exch_timestamp
min_latency = min(feed_latency)  # taken over all rows
if min_latency < 0:
    adjusted_local_timestamp = local_timestamp - min_latency + base_latency

Parameters:
  • data (ndarray[Any, dtype[ScalarType]] | DataFrame) – Data to be corrected.

  • base_latency (float) – Because the system clocks of the exchange and the local machine are not perfectly synchronized, measured latency can be inaccurate and even negative. Shifting the local timestamps removes negative latency, but on its own it leaves the row with the most negative latency at exactly zero latency; adding base_latency yields more realistic values. The unit must match the timestamp unit of the feed data.

Returns:

Adjusted data with corrected local timestamps.

Return type:

ndarray[Any, dtype[ScalarType]] | DataFrame
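A minimal NumPy sketch of the offset described above: it finds the most negative feed latency over all rows and, if there is one, shifts every local timestamp by its magnitude plus base_latency. It assumes plain integer timestamp arrays in a common unit; correct_local_timestamp_sketch is a hypothetical name, and the real function operates on the full data array or DataFrame.

```python
import numpy as np


def correct_local_timestamp_sketch(exch_ts, local_ts, base_latency):
    """Shift local_ts so that no feed latency is negative, then add base_latency."""
    feed_latency = local_ts - exch_ts
    min_latency = feed_latency.min()
    if min_latency < 0:
        # After this shift the row with the most negative latency has a
        # latency of exactly base_latency; every other row keeps its
        # relative latency plus the same offset.
        return local_ts - min_latency + base_latency
    # No negative latency: leave the timestamps untouched.
    return local_ts
```

With exchange timestamps [100, 200, 300] and local timestamps [90, 210, 305], the minimum latency is -10, so with base_latency = 5 every local timestamp moves 15 units later.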

validate_data(data, tick_size=None, lot_size=None, err_bound=1e-08)

Validates the specified data against the following checks, skipping user events, and prints the results:

  • Checks that prices are multiples of tick_size, when tick_size is given.

  • Checks that quantities are multiples of lot_size, when lot_size is given.

  • Checks that local timestamps are in order.

  • Checks that exchange timestamps are in order.

Parameters:
  • data (ndarray[Any, dtype[ScalarType]] | DataFrame) – Data to be validated.

  • tick_size (float | None) – Minimum price increment for the given asset.

  • lot_size (float | None) – Minimum order quantity for the given asset.

  • err_bound (float) – Tolerance used when checking whether prices and quantities are multiples of tick_size and lot_size.

Returns:

The number of rows with reversed exchange timestamps.

Return type:

int
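The alignment and ordering checks can be approximated with the sketch below. It is an assumption-laden illustration rather than the library's code: it takes separate arrays instead of the data array, does not skip user events, treats a value as aligned when value / step lies within err_bound of an integer, and counts a timestamp as reversed when it is smaller than the one in the preceding row. validate_sketch and count_misaligned are hypothetical names.

```python
import numpy as np


def count_misaligned(values, step, err_bound=1e-8):
    """Count values that are not within err_bound of a multiple of step."""
    ratio = np.asarray(values, dtype=float) / step
    return int(np.sum(np.abs(ratio - np.round(ratio)) > err_bound))


def validate_sketch(px, qty, exch_ts, local_ts,
                    tick_size=None, lot_size=None, err_bound=1e-8):
    """Print validation results and return the number of reversed exchange timestamps."""
    if tick_size is not None:
        print('prices not aligned with tick_size:', count_misaligned(px, tick_size, err_bound))
    if lot_size is not None:
        print('quantities not aligned with lot_size:', count_misaligned(qty, lot_size, err_bound))
    # A row is counted as reversed when its timestamp is smaller than the previous row's.
    n_local = int(np.sum(np.diff(local_ts) < 0))
    n_exch = int(np.sum(np.diff(exch_ts) < 0))
    print('reversed local timestamps:', n_local)
    print('reversed exchange timestamps:', n_exch)
    return n_exch
```

The returned count plays the role of the value that correct_exch_timestamp() takes as num_corr.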