Order Latency Data

To obtain more realistic backtesting results, accounting for latencies is crucial. Therefore, it’s important to collect both feed data and order data with timestamps to measure your order latency. The best approach is to gather your own order latencies. You can collect order latency based on your live trading or by regularly submitting orders at a price that cannot be filled and then canceling them for recording purposes. However, if you don’t have access to them or want to establish a target, you will need to artificially generate order latency. You can model this latency based on factors such as feed latency, trade volume, and the number of events. In this guide, we will demonstrate a simple method to generate order latency from feed latency using a multiplier and offset for adjustment.
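As an illustration of collecting your own order latencies, the sketch below records the three timestamps that make up one round trip: the local time the request is sent, the exchange timestamp returned in the acknowledgement, and the local time the response arrives. The client object, its submit_order and cancel_order methods, and the acknowledgement fields are hypothetical placeholders for your own exchange connector.

import time

def record_order_latency(client, symbol, far_price, qty, records):
    # Hypothetical sketch: submit an order at a price that cannot be filled,
    # then cancel it, keeping only the timestamps for latency measurement.
    req_ts = time.time_ns()                        # local time the request is sent
    ack = client.submit_order(symbol, far_price, qty)
    resp_ts = time.time_ns()                       # local time the response is received
    exch_ts = ack['transact_time']                 # exchange timestamp from the ack, in ns
    records.append((req_ts, exch_ts, resp_ts, 0))  # same layout as the latency dtype used below
    client.cancel_order(symbol, ack['order_id'])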

First, load the feed data.

[1]:
import numpy as np

data = np.load('btcusdt_20200201.npz')['data']
data
[1]:
array([(3758096386, 1580515202342000000, 1580515202497052000, 9364.51, 1.197, 0, 0, 0.),
       (3758096386, 1580515202342000000, 1580515202497346000, 9365.67, 0.02 , 0, 0, 0.),
       (3758096386, 1580515202342000000, 1580515202497352000, 9365.86, 0.01 , 0, 0, 0.),
       ...,
       (3489660929, 1580601599836000000, 1580601599962961000, 9351.47, 3.914, 0, 0, 0.),
       (3489660929, 1580601599836000000, 1580601599963461000, 9397.78, 0.1  , 0, 0, 0.),
       (3489660929, 1580601599848000000, 1580601599973647000, 9348.14, 3.98 , 0, 0, 0.)],
      dtype=[('ev', '<i8'), ('exch_ts', '<i8'), ('local_ts', '<i8'), ('px', '<f8'), ('qty', '<f8'), ('order_id', '<u8'), ('ival', '<i8'), ('fval', '<f8')])
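The timestamps are UNIX epoch values in nanoseconds. As a quick sanity check, the first exchange timestamp can be converted to a readable datetime (UTC assumed), which should fall on 2020-02-01, matching the file name.

from datetime import datetime, timezone

# exch_ts / local_ts are nanoseconds since the UNIX epoch.
datetime.fromtimestamp(data['exch_ts'][0] / 1e9, tz=timezone.utc)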

For easy manipulation, convert it into a DataFrame.

[2]:
import polars as pl

df = pl.DataFrame(data)
df
[2]:
shape: (27_532_602, 8)
ev          exch_ts              local_ts             px       qty    order_id  ival  fval
i64         i64                  i64                  f64      f64    u64       i64   f64
3758096386  1580515202342000000  1580515202497052000  9364.51  1.197  0         0     0.0
3758096386  1580515202342000000  1580515202497346000  9365.67  0.02   0         0     0.0
3758096386  1580515202342000000  1580515202497352000  9365.86  0.01   0         0     0.0
3758096386  1580515202342000000  1580515202497357000  9366.36  0.002  0         0     0.0
3758096386  1580515202342000000  1580515202497363000  9366.36  0.003  0         0     0.0
…           …                    …                    …        …      …         …     …
3489660929  1580601599812000000  1580601599944404000  9397.79  0.0    0         0     0.0
3489660929  1580601599826000000  1580601599952176000  9354.8   4.07   0         0     0.0
3489660929  1580601599836000000  1580601599962961000  9351.47  3.914  0         0     0.0
3489660929  1580601599836000000  1580601599963461000  9397.78  0.1    0         0     0.0
3489660929  1580601599848000000  1580601599973647000  9348.14  3.98   0         0     0.0

Select only the events that have both a valid exchange timestamp and a valid local timestamp, so that feed latency can be measured.

[3]:
from hftbacktest import EXCH_EVENT, LOCAL_EVENT

df = df.filter((pl.col('ev') & EXCH_EVENT == EXCH_EVENT) & (pl.col('ev') & LOCAL_EVENT == LOCAL_EVENT))
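EXCH_EVENT and LOCAL_EVENT are bit flags tested against the ev field, so this filter keeps only rows marked as valid on both the exchange side and the local side. A minimal illustration of the test on a single row (output not shown):

ev = data['ev'][0]
# An event is kept only if both flags are set in its 'ev' bit field.
has_exch = (ev & EXCH_EVENT) == EXCH_EVENT
has_local = (ev & LOCAL_EVENT) == LOCAL_EVENT
print(has_exch and has_local)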

Reduce the number of rows by resampling to approximately 1-second intervals. Since ts is an integer column in nanoseconds, the every interval uses polars' integer 'i' suffix and is expressed in index units, so '1000000000i' corresponds to one second.

[4]:
df = df.with_columns(
    pl.col('local_ts').alias('ts')
).group_by_dynamic(
    'ts', every='1000000000i'
).agg(
    pl.col('exch_ts').last(),
    pl.col('local_ts').last()
).drop('ts')

df
[4]:
shape: (86_394, 2)
exch_ts              local_ts
i64                  i64
1580515202843000000  1580515202979365000
1580515203551000000  1580515203943566000
1580515203789000000  1580515204875639000
1580515204127000000  1580515205962135000
1580515204738000000  1580515206983780000
…                    …
1580601595869000000  1580601595997115000
1580601596865000000  1580601596994060000
1580601597864000000  1580601597987786000
1580601598870000000  1580601598997068000
1580601599848000000  1580601599973647000

Convert back to a structured NumPy array.

[5]:
data = df.to_numpy(structured=True)
data
[5]:
array([(1580515202843000000, 1580515202979365000),
       (1580515203551000000, 1580515203943566000),
       (1580515203789000000, 1580515204875639000), ...,
       (1580601597864000000, 1580601597987786000),
       (1580601598870000000, 1580601598997068000),
       (1580601599848000000, 1580601599973647000)],
      dtype=[('exch_ts', '<i8'), ('local_ts', '<i8')])

Generate order latency. Order latency consists of two components: the latency until the order request reaches the exchange’s matching engine and the latency until the response arrives back at the local side. Order latency is not the same as feed latency and does not need to be proportional to it. However, for simplicity, we model order latency as proportional to feed latency, adjusted by a multiplier and an offset.
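For example, taking the first resampled row, the feed latency is 1580515202979365000 − 1580515202843000000 = 136,365,000 ns ≈ 136 ms, so the modeled entry latency is 4 × 136 ms ≈ 545 ms and the modeled response latency is 3 × 136 ms ≈ 409 ms.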

[6]:
mul_entry = 4
offset_entry = 0

mul_resp = 3
offset_resp = 0

order_latency = np.zeros(len(data), dtype=[('req_ts', 'i8'), ('exch_ts', 'i8'), ('resp_ts', 'i8'), ('_padding', 'i8')])
for i, (exch_ts, local_ts) in enumerate(data):
    feed_latency = local_ts - exch_ts
    order_entry_latency = mul_entry * feed_latency + offset_entry
    order_resp_latency = mul_resp * feed_latency + offset_resp

    req_ts = local_ts
    order_exch_ts = req_ts + order_entry_latency
    resp_ts = order_exch_ts + order_resp_latency

    order_latency[i] = (req_ts, order_exch_ts, resp_ts, 0)

order_latency
[6]:
array([(1580515202979365000, 1580515203524825000, 1580515203933920000, 0),
       (1580515203943566000, 1580515205513830000, 1580515206691528000, 0),
       (1580515204875639000, 1580515209222195000, 1580515212482112000, 0),
       ...,
       (1580601597987786000, 1580601598482930000, 1580601598854288000, 0),
       (1580601598997068000, 1580601599505340000, 1580601599886544000, 0),
       (1580601599973647000, 1580601600476235000, 1580601600853176000, 0)],
      dtype=[('req_ts', '<i8'), ('exch_ts', '<i8'), ('resp_ts', '<i8'), ('_padding', '<i8')])
[7]:
df_order_latency = pl.DataFrame(order_latency)
df_order_latency
[7]:
shape: (86_394, 4)
req_ts               exch_ts              resp_ts              _padding
i64                  i64                  i64                  i64
1580515202979365000  1580515203524825000  1580515203933920000  0
1580515203943566000  1580515205513830000  1580515206691528000  0
1580515204875639000  1580515209222195000  1580515212482112000  0
1580515205962135000  1580515213302675000  1580515218808080000  0
1580515206983780000  1580515215966900000  1580515222704240000  0
…                    …                    …                    …
1580601595997115000  1580601596509575000  1580601596893920000  0
1580601596994060000  1580601597510300000  1580601597897480000  0
1580601597987786000  1580601598482930000  1580601598854288000  0
1580601598997068000  1580601599505340000  1580601599886544000  0
1580601599973647000  1580601600476235000  1580601600853176000  0

Check that the generated latencies contain no invalid negative values.

[8]:
order_entry_latency = df_order_latency['exch_ts'] - df_order_latency['req_ts']
order_resp_latency = df_order_latency['resp_ts'] - df_order_latency['exch_ts']
[9]:
(order_entry_latency <= 0).sum()
[9]:
0
[10]:
(order_resp_latency <= 0).sum()
[10]:
0
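Beyond the sign check, it can also be useful to look at the scale of the modeled latencies. A small sketch converting the nanosecond values to milliseconds:

# Summary statistics of the modeled latencies, converted from ns to ms.
print('entry latency (ms):    min', order_entry_latency.min() / 1e6,
      'mean', order_entry_latency.mean() / 1e6,
      'max', order_entry_latency.max() / 1e6)
print('response latency (ms): min', order_resp_latency.min() / 1e6,
      'mean', order_resp_latency.mean() / 1e6,
      'max', order_resp_latency.max() / 1e6)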

Here, we wrap the entire process into a function, accelerating the inner loop with njit for speed.

[11]:
import numpy as np
from numba import njit
import polars as pl
from hftbacktest import LOCAL_EVENT, EXCH_EVENT

@njit
def generate_order_latency_nb(data, order_latency, mul_entry, offset_entry, mul_resp, offset_resp):
    for i in range(len(data)):
        exch_ts = data[i].exch_ts
        local_ts = data[i].local_ts
        feed_latency = local_ts - exch_ts
        order_entry_latency = mul_entry * feed_latency + offset_entry
        order_resp_latency = mul_resp * feed_latency + offset_resp

        req_ts = local_ts
        order_exch_ts = req_ts + order_entry_latency
        resp_ts = order_exch_ts + order_resp_latency

        order_latency[i].req_ts = req_ts
        order_latency[i].exch_ts = order_exch_ts
        order_latency[i].resp_ts = resp_ts

def generate_order_latency(feed_file, output_file = None, mul_entry = 1, offset_entry = 0, mul_resp = 1, offset_resp = 0):
    data = np.load(feed_file)['data']
    df = pl.DataFrame(data)

    df = df.filter(
        (pl.col('ev') & EXCH_EVENT == EXCH_EVENT) & (pl.col('ev') & LOCAL_EVENT == LOCAL_EVENT)
    ).with_columns(
        pl.col('local_ts').alias('ts')
    ).group_by_dynamic(
        'ts', every='1000000000i'
    ).agg(
        pl.col('exch_ts').last(),
        pl.col('local_ts').last()
    ).drop('ts')

    data = df.to_numpy(structured=True)

    order_latency = np.zeros(len(data), dtype=[('req_ts', 'i8'), ('exch_ts', 'i8'), ('resp_ts', 'i8'), ('_padding', 'i8')])
    generate_order_latency_nb(data, order_latency, mul_entry, offset_entry, mul_resp, offset_resp)

    if output_file is not None:
        np.savez_compressed(output_file, data=order_latency)

    return order_latency
[12]:
order_latency = generate_order_latency('btcusdt_20200201.npz', output_file='feed_latency_20200201.npz', mul_entry=4, mul_resp=3)
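The saved latency file can then be supplied to the backtest in place of a constant latency model. Below is a minimal sketch assuming hftbacktest's BacktestAsset builder and its intp_order_latency method; the exact builder methods and the remaining required asset settings may differ across versions, so treat it as illustrative rather than a complete configuration.

from hftbacktest import BacktestAsset, HashMapMarketDepthBacktest

# Illustrative only: builder method names are assumed from hftbacktest's v2 API.
asset = (
    BacktestAsset()
        .data(['btcusdt_20200201.npz'])
        .intp_order_latency(['feed_latency_20200201.npz'])
        .tick_size(0.1)
        .lot_size(0.001)
)
hbt = HashMapMarketDepthBacktest([asset])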