Order Latency Data

To obtain realistic backtesting results, accounting for latency is crucial. It is therefore important to collect both feed data and order data with timestamps so that you can measure your order latency. The best approach is to gather your own order latencies, either from live trading or by regularly submitting orders at a price that cannot be filled and then canceling them, purely to record the timings. If you do not have access to such data, or you want to establish a target latency, you will need to generate order latency artificially. You can model it from factors such as feed latency, trade volume, and the number of events. In this guide, we demonstrate a simple method that derives order latency from feed latency using a multiplier and an offset for adjustment.
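If you do record your own orders, the measurements can be stored in the same four-field layout that this guide builds later for generated latencies. The following is only a minimal sketch: the `measured` list and its values are made up for illustration, and all timestamps are assumed to be in nanoseconds.

```python
import numpy as np

# Hypothetical probe-order measurements, all in nanoseconds:
# (request sent, processed at the exchange, response received).
# The values are invented for this example.
measured = [
    (1_000_000_000, 1_002_000_000, 1_003_500_000),
    (2_000_000_000, 2_001_500_000, 2_002_700_000),
]

latency_dtype = [
    ('req_timestamp', '<i8'),
    ('exch_timestamp', '<i8'),
    ('resp_timestamp', '<i8'),
    ('_reserved', '<i8'),
]

own_latency = np.zeros(len(measured), dtype=latency_dtype)
for i, (req_ts, exch_ts, resp_ts) in enumerate(measured):
    own_latency[i] = (req_ts, exch_ts, resp_ts, 0)

# Entry latency: request -> exchange. Response latency: exchange -> local.
entry_latency = own_latency['exch_timestamp'] - own_latency['req_timestamp']
resp_latency = own_latency['resp_timestamp'] - own_latency['exch_timestamp']
```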

This example is written for the HftBacktest implementation in Rust.

First, load the feed data.

[1]:
import numpy as np

data = np.load('btcusdt_20200201.npz')['data']
data
[1]:
array([(3758096386, 1580515202342000128, 1580515202497051904, 9364.51, 1.197),
       (3758096386, 1580515202342000128, 1580515202497346048, 9365.67, 0.02 ),
       (3758096386, 1580515202342000128, 1580515202497351936, 9365.86, 0.01 ),
       ...,
       (3758096385, 1580601599836000000, 1580601599962960896, 9351.47, 3.914),
       (3489660929, 1580601599836000000, 1580601599963461120, 9397.78, 0.1  ),
       (3758096385, 1580601599848000000, 1580601599973647104, 9348.14, 3.98 )],
      dtype=[('ev', '<i8'), ('exch_ts', '<i8'), ('local_ts', '<i8'), ('px', '<f4'), ('qty', '<f4')])

For easier manipulation, convert it into a DataFrame.

[2]:
import pandas as pd

df = pd.DataFrame(data)
df
[2]:
ev exch_ts local_ts px qty
0 3758096386 1580515202342000128 1580515202497051904 9364.509766 1.197
1 3758096386 1580515202342000128 1580515202497346048 9365.669922 0.020
2 3758096386 1580515202342000128 1580515202497351936 9365.860352 0.010
3 3758096386 1580515202342000128 1580515202497357056 9366.360352 0.002
4 3758096386 1580515202342000128 1580515202497362944 9366.360352 0.003
... ... ... ... ... ...
27532597 3489660929 1580601599812000000 1580601599944403968 9397.790039 0.000
27532598 3758096385 1580601599825999872 1580601599952176128 9354.799805 4.070
27532599 3758096385 1580601599836000000 1580601599962960896 9351.469727 3.914
27532600 3489660929 1580601599836000000 1580601599963461120 9397.780273 0.100
27532601 3758096385 1580601599848000000 1580601599973647104 9348.139648 3.980

27532602 rows × 5 columns

Select only the events that have both a valid exchange timestamp and a valid local timestamp, since feed latency is the difference between the two.

[3]:
from hftbacktest.reader import EXCH_EVENT, LOCAL_EVENT

# Keep rows where both the exchange and the local timestamp are valid.
df = df[((df['ev'] & EXCH_EVENT) == EXCH_EVENT) & ((df['ev'] & LOCAL_EVENT) == LOCAL_EVENT)]
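To make the flag test concrete, here is a small self-contained sketch on a toy DataFrame. The two constants are stand-ins defined locally as single-bit flags (an assumption for this example, not values imported from the library), and the rows are invented.

```python
import pandas as pd

# Assumed stand-ins for the library flags, defined here as single bits.
EXCH_EVENT = 1 << 31
LOCAL_EVENT = 1 << 30

toy = pd.DataFrame({
    'ev': [EXCH_EVENT | LOCAL_EVENT | 1,  # valid on both sides
           EXCH_EVENT | 2,                # exchange timestamp only
           LOCAL_EVENT | 1],              # local timestamp only
    'exch_ts': [100, 200, 0],
    'local_ts': [150, 0, 350],
})

# A flag is set when masking with it returns the flag itself.
has_exch = (toy['ev'] & EXCH_EVENT) == EXCH_EVENT
has_local = (toy['ev'] & LOCAL_EVENT) == LOCAL_EVENT

# Feed latency is only meaningful when both timestamps are valid.
both = toy[has_exch & has_local]
feed_latency = both['local_ts'] - both['exch_ts']
```

Only the first row survives the filter, because it is the only one carrying both flags.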

Reduce the number of rows by resampling to approximately 1-second intervals, keeping the last event in each second.

[4]:
s = df['local_ts'] // 1_000_000_000  # integer division keeps nanosecond timestamps exact
df = df.groupby(s).last()
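The effect of this resampling step can be seen on a toy DataFrame. This is an illustrative sketch with invented timestamps; the key is renamed to `second` here only to keep it distinct from the `local_ts` column.

```python
import pandas as pd

toy = pd.DataFrame({
    'local_ts': [1_000_000_100, 1_400_000_000, 1_900_000_000, 2_100_000_000],
    'exch_ts': [999_000_000, 1_390_000_000, 1_880_000_000, 2_095_000_000],
})

# Bucket each row by the whole second of its local timestamp (nanoseconds).
second = (toy['local_ts'] // 1_000_000_000).rename('second')

# Keep the last row seen within each one-second bucket.
sampled = toy.groupby(second).last()
```

The first three rows fall into second 1 and collapse to the last of them, while the fourth row is the only one in second 2.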

Convert the result back to a structured NumPy array.

[5]:
data = df.to_records(index=False)
data
[5]:
rec.array([(3489660930, 1580515202843000064, 1580515202979365120, 9364.54, 1.   ),
           (3758096385, 1580515203551000064, 1580515203943566080, 9318.45, 0.   ),
           (3489660929, 1580515203788999936, 1580515204875639040, 9370.5 , 0.088),
           ...,
           (3489660929, 1580601597864000000, 1580601597987785984, 9397.47, 0.096),
           (3758096385, 1580601598870000128, 1580601598997068032, 9391.37, 2.   ),
           (3758096385, 1580601599848000000, 1580601599973647104, 9348.14, 3.98 )],
          dtype=[('ev', '<i8'), ('exch_ts', '<i8'), ('local_ts', '<i8'), ('px', '<f4'), ('qty', '<f4')])

Generate order latency. Order latency consists of two components: the latency until the order request reaches the exchange’s matching engine (entry latency) and the latency until the response arrives back at the local side (response latency). Order latency is not the same as feed latency and need not be proportional to it. However, for simplicity, we model both components as linear functions of feed latency, each with its own multiplier and offset.

[6]:
mul_entry = 4
offset_entry = 0

mul_resp = 3
offset_resp = 0

order_latency = np.zeros(len(data), dtype=[('req_timestamp', '<i8'), ('exch_timestamp', '<i8'), ('resp_timestamp', '<i8'), ('_reserved', '<i8')])
for i, (ev, exch_ts, local_ts, _, _) in enumerate(data):
    feed_latency = local_ts - exch_ts
    order_entry_latency = mul_entry * feed_latency + offset_entry
    order_resp_latency = mul_resp * feed_latency + offset_resp

    req_ts = local_ts
    order_exch_ts = req_ts + order_entry_latency
    resp_ts = order_exch_ts + order_resp_latency

    order_latency[i] = (req_ts, order_exch_ts, resp_ts, 0)

order_latency
[6]:
array([(1580515202979365120, 1580515203524825344, 1580515203933920512, 0),
       (1580515203943566080, 1580515205513830144, 1580515206691528192, 0),
       (1580515204875639040, 1580515209222195456, 1580515212482112768, 0),
       ...,
       (1580601597987785984, 1580601598482929920, 1580601598854287872, 0),
       (1580601598997068032, 1580601599505339648, 1580601599886543360, 0),
       (1580601599973647104, 1580601600476235520, 1580601600853176832, 0)],
      dtype=[('req_timestamp', '<i8'), ('exch_timestamp', '<i8'), ('resp_timestamp', '<i8'), ('_reserved', '<i8')])
[7]:
df_order_latency = pd.DataFrame(order_latency)
df_order_latency
[7]:
req_timestamp exch_timestamp resp_timestamp _reserved
0 1580515202979365120 1580515203524825344 1580515203933920512 0
1 1580515203943566080 1580515205513830144 1580515206691528192 0
2 1580515204875639040 1580515209222195456 1580515212482112768 0
3 1580515205962135040 1580515213302674944 1580515218808079872 0
4 1580515206983780096 1580515215966900992 1580515222704241664 0
... ... ... ... ...
86389 1580601595997114880 1580601596509574656 1580601596893919488 0
86390 1580601596994060032 1580601597510300416 1580601597897480704 0
86391 1580601597987785984 1580601598482929920 1580601598854287872 0
86392 1580601598997068032 1580601599505339648 1580601599886543360 0
86393 1580601599973647104 1580601600476235520 1580601600853176832 0

86394 rows × 4 columns
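As a sanity check, the first generated row can be reproduced by hand from the first resampled feed row and the multipliers used above. The sketch below uses plain Python integers and only values that appear in the outputs of this guide.

```python
# First resampled feed row (exch_ts, local_ts) and the parameters used above.
exch_ts = 1580515202843000064
local_ts = 1580515202979365120
mul_entry, offset_entry = 4, 0
mul_resp, offset_resp = 3, 0

feed_latency = local_ts - exch_ts  # 136_365_056 ns, roughly 136 ms

# The order is assumed to be sent at the moment the feed event arrives.
req_ts = local_ts
order_exch_ts = req_ts + mul_entry * feed_latency + offset_entry
resp_ts = order_exch_ts + mul_resp * feed_latency + offset_resp

print(req_ts, order_exch_ts, resp_ts)
```

The three printed values match row 0 of the table above, so the entry latency is four times and the response latency three times the feed latency of about 136 ms.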

Check whether any latency is invalid, that is, zero or negative.

[8]:
order_entry_latency = df_order_latency['exch_timestamp'] - df_order_latency['req_timestamp']
order_resp_latency = df_order_latency['resp_timestamp'] - df_order_latency['exch_timestamp']
[9]:
np.sum(order_entry_latency <= 0)
[9]:
0
[10]:
np.sum(order_resp_latency <= 0)
[10]:
0
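The two checks can also be combined into one small helper. This is a sketch only: `count_invalid_latencies` is a hypothetical name introduced here, not a library function, and the sample rows are made up so that one of them fails the check.

```python
import numpy as np

latency_dtype = [
    ('req_timestamp', '<i8'),
    ('exch_timestamp', '<i8'),
    ('resp_timestamp', '<i8'),
    ('_reserved', '<i8'),
]

def count_invalid_latencies(order_latency):
    """Return (entry, response) counts of non-positive latencies."""
    entry = order_latency['exch_timestamp'] - order_latency['req_timestamp']
    resp = order_latency['resp_timestamp'] - order_latency['exch_timestamp']
    return int(np.sum(entry <= 0)), int(np.sum(resp <= 0))

# Made-up sample: in the second row the exchange timestamp precedes the
# request timestamp, which would happen if the feed latency were negative.
sample = np.array(
    [(100, 150, 180, 0), (200, 190, 230, 0)],
    dtype=latency_dtype,
)

invalid_entry, invalid_resp = count_invalid_latencies(sample)
```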

Here, we wrap the entire process into a function and compile the per-row loop with Numba's njit for increased speed.

[11]:
from numba import njit

@njit
def generate_order_latency_nb(data, order_latency, mul_entry, offset_entry, mul_resp, offset_resp):
    for i in range(len(data)):
        # Fields are accessed by position: 1 is exch_ts, 2 is local_ts.
        exch_ts = data[i][1]
        local_ts = data[i][2]
        feed_latency = local_ts - exch_ts
        order_entry_latency = mul_entry * feed_latency + offset_entry
        order_resp_latency = mul_resp * feed_latency + offset_resp

        req_ts = local_ts
        order_exch_ts = req_ts + order_entry_latency
        resp_ts = order_exch_ts + order_resp_latency

        order_latency[i][0] = req_ts
        order_latency[i][1] = order_exch_ts
        order_latency[i][2] = resp_ts

def generate_order_latency(feed_file, output_file = None, mul_entry = 1, offset_entry = 0, mul_resp = 1, offset_resp = 0):
    data = np.load(feed_file)['data']
    df = pd.DataFrame(data)
    # Keep rows where both the exchange and the local timestamp are valid.
    df = df[((df['ev'] & EXCH_EVENT) == EXCH_EVENT) & ((df['ev'] & LOCAL_EVENT) == LOCAL_EVENT)]
    s = df['local_ts'] // 1_000_000_000  # integer division keeps nanosecond timestamps exact
    df = df.groupby(s).last()
    data = df.to_records(index=False)

    order_latency = np.zeros(len(data), dtype=[('req_timestamp', '<i8'), ('exch_timestamp', '<i8'), ('resp_timestamp', '<i8'), ('_reserved', '<i8')])
    generate_order_latency_nb(data, order_latency, mul_entry, offset_entry, mul_resp, offset_resp)

    if output_file is not None:
        np.savez_compressed(output_file, data=order_latency)

    return order_latency
[12]:
order_latency = generate_order_latency('btcusdt_20200201.npz', output_file='latency_20200201.npz', mul_entry = 4, mul_resp = 3)
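The saved file stores the structured array under the `data` key, so it can be read back the same way the feed data was loaded. The sketch below shows the round trip on a made-up two-row array written to a temporary directory; the file name `latency_sample.npz` is hypothetical.

```python
import os
import tempfile

import numpy as np

latency_dtype = [
    ('req_timestamp', '<i8'),
    ('exch_timestamp', '<i8'),
    ('resp_timestamp', '<i8'),
    ('_reserved', '<i8'),
]

# Made-up latency rows, standing in for the generated array.
sample = np.array(
    [(100, 150, 180, 0), (200, 260, 300, 0)],
    dtype=latency_dtype,
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'latency_sample.npz')
    np.savez_compressed(path, data=sample)

    # Read the array back from the 'data' key before the file is removed.
    with np.load(path) as archive:
        loaded = archive['data']
```

The field names and values survive the round trip unchanged, so the same negative-latency checks can be run on a file loaded from disk.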