Draft:Forming Bars
It has been suggested that this page be merged with Draft:Bar Construction (Financial Data). Proposed since December 2025.
Submission declined on 21 November 2025 by Pythoncoder.
This draft has been resubmitted and is currently awaiting re-review.
Forming bars is a set of techniques in quantitative finance that construct price series ("bars") by sampling according to trading activity rather than at fixed points in calendar time. A new bar is generated when a chosen measure of activity reaches a threshold, such as a specified number of trades, a fixed traded volume, or a fixed traded amount of money.[1] These event-based bars aim to reduce the heteroscedasticity and serial correlation often observed in returns sampled at regular time intervals, and to produce data better suited to statistical analysis and machine learning.[1]
Motivation
Traditional financial time series sample prices at fixed time intervals (for example, one minute or one day). Because trading activity and the arrival of information are highly uneven over time, these "time bars" can oversample quiet periods and undersample active periods, which contributes to volatility clustering and other undesirable statistical properties.[2]
Event-based bars treat the sampling scheme as a process subordinated to trading activity. Instead of taking one observation every h seconds, the sampling clock is advanced only when a particular quantity of trading "information" has been exchanged, such as a number of trades, contracts, or dollars. López de Prado argues that such constructions deliver sequences of returns that are closer to homoscedastic, display weaker serial dependence, and have more consistent information content per observation than time bars.[1][2]
Empirical illustrations on E-mini S&P 500 futures show, for example, that the exponentially weighted average number of tick bars per day varies strongly over time, that the daily number of volume bars often moves inversely with the price level, and that dollar bars have a comparatively stable sampling frequency.[2]
Construction
Event-based bars are usually defined on transaction-level data indexed by trades $t = 1, 2, \dots$. Let $p_t$ be the transaction price and $v_t$ the traded volume of trade $t$. Bars are defined by a sequence of stopping times $0 = T_0 < T_1 < T_2 < \cdots$ that mark the end of each bar $k = 1, 2, \dots$. Within bar $k$, the usual open–high–low–close (OHLC) values and cumulative volume are computed from the trades with indices $T_{k-1} + 1, \dots, T_k$.[1]
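For time bars the stopping times are equally spaced in calendar time: for example, with a fixed interval $h$ and start time $\tau_0$, bar $k$ contains the trades executed in $[\tau_0 + (k-1)h,\ \tau_0 + kh)$. For event-based bars, the stopping times are defined by thresholds on trading activity:
- For tick bars a bar closes after a fixed number $N$ of trades, $T_k = T_{k-1} + N$.
- For volume bars a bar closes when cumulative volume since the previous bar reaches a threshold $V$, $T_k = \min\{T > T_{k-1} : \sum_{t=T_{k-1}+1}^{T} v_t \ge V\}$.
- For dollar bars a bar closes when cumulative traded monetary value reaches a threshold $D$. Writing $d_t = p_t v_t$ for the dollar value of trade $t$, $T_k = \min\{T > T_{k-1} : \sum_{t=T_{k-1}+1}^{T} d_t \ge D\}$.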
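The stopping-time definitions above can be sketched in Python as a single loop in which each trade advances a sampling clock by its contribution: 1 for tick bars, $v_t$ for volume bars, and $p_t v_t$ for dollar bars. This is a minimal sketch that assumes trades arrive in time order. The function name `stopping_times` is hypothetical, and trailing trades that never complete a bar are simply left unassigned.

```python
def stopping_times(sizes, threshold):
    """Return the 1-based trade indices T_1 < T_2 < ... that close each bar.

    sizes[t - 1] is what trade t adds to the sampling clock: 1 for tick bars,
    v_t for volume bars, p_t * v_t for dollar bars. A bar closes at the first
    trade whose cumulative contribution since the previous close reaches the
    threshold; trailing trades that never reach it are left unassigned.
    """
    times = []
    cum = 0.0
    for t, size in enumerate(sizes, start=1):
        cum += size
        if cum >= threshold:
            times.append(t)
            cum = 0.0  # restart the sum after T_k
    return times


# Volumes of the ten trades in the illustrative example below
volumes = [50, 75, 200, 150, 300, 100, 250, 400, 500, 600]
print(stopping_times([1] * len(volumes), 3))  # tick bars, N = 3: [3, 6, 9]
print(stopping_times(volumes, 500))           # volume bars, V = 500: [5, 8, 9, 10]
```

Because only the per-trade contribution changes, tick, volume, and dollar bars share the same closing rule.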
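Imbalance-based bars generalize these constructions by using signed order flow. Let $b_t \in \{-1, +1\}$ be the direction of trade $t$ (for example, inferred with the tick rule) and define the signed volume $b_t v_t$. For a candidate end point $T > T_{k-1}$ of bar $k$, the tick imbalance is
$$\theta_T = \sum_{t=T_{k-1}+1}^{T} b_t$$
and the volume imbalance is
$$\theta^{v}_T = \sum_{t=T_{k-1}+1}^{T} b_t v_t .$$
In imbalance bar schemes the bar is closed when the absolute imbalance exceeds a time-varying threshold based on its expected value, for example
$$T_k = \min\Bigl\{T > T_{k-1} : |\theta_T| \ge \mathrm{E}_0[T_k - T_{k-1}]\,\bigl|2\,\mathrm{P}[b_t = 1] - 1\bigr|\Bigr\}$$
for tick imbalance bars. Here $\mathrm{E}_0[T_k - T_{k-1}]$ is the expected number of trades per bar and $2\,\mathrm{P}[b_t = 1] - 1$ is the expected trade sign. The analogous condition $|\theta^{v}_T| \ge \mathrm{E}_0[T_k - T_{k-1}]\,|\mathrm{E}_0[b_t v_t]|$ is used for volume imbalance bars.[2] Both expectations are estimated from previous bars, typically with exponentially weighted moving averages. The span of these averages controls how aggressively the sampling responds to changes in order-flow imbalance.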
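A minimal sketch of tick imbalance bars follows. It relies on several assumptions that are not part of the text above:

- Trade signs come from the tick rule, and the first trade is labelled +1.
- The expected bar length and the expected sign $\mathrm{E}_0[b_t] = 2\,\mathrm{P}[b_t = 1] - 1$ start at the hypothetical values `expected_ticks` and `expected_sign`.
- Both expectations are updated after each completed bar with an exponentially weighted moving average of weight `alpha`. The sign expectation is updated with the bar's mean sign, which is a simplification.

A bar closes at the first trade where the absolute cumulative sign since the previous bar reaches the product of the expected bar length and the absolute expected sign.

```python
def tick_rule(prices):
    """Trade signs b_t from the tick rule.

    The sign is that of the price change; an unchanged price repeats the
    previous sign. The first trade has no predecessor and is labelled +1
    (an assumption of this sketch).
    """
    signs, prev_price, prev_sign = [], None, 1
    for p in prices:
        if prev_price is not None and p != prev_price:
            prev_sign = 1 if p > prev_price else -1
        signs.append(prev_sign)
        prev_price = p
    return signs


def tick_imbalance_bars(prices, expected_ticks=10.0, expected_sign=0.5, alpha=0.1):
    """Return the 1-based indices of the trades that close tick imbalance bars.

    expected_ticks, expected_sign and alpha are hypothetical tuning values:
    initial guesses for E0[bar length] and E0[b_t], and the EWMA weight used
    to update both after every completed bar.
    """
    closes = []
    theta, start = 0, 0  # running imbalance and index of the previous close
    for t, b in enumerate(tick_rule(prices), start=1):
        theta += b
        if abs(theta) >= expected_ticks * abs(expected_sign):
            length = t - start
            # EWMA updates from the bar that just closed
            expected_ticks = (1 - alpha) * expected_ticks + alpha * length
            expected_sign = (1 - alpha) * expected_sign + alpha * theta / length
            closes.append(t)
            theta, start = 0, t
    return closes


print(tick_rule([100.0, 100.1, 99.9, 100.2, 100.2]))  # [1, 1, -1, 1, 1]
print(tick_imbalance_bars(list(range(100, 120))))      # [5, 11, 17]
```

On twenty consecutive upticks, each completed bar pushes the expected sign toward +1, so the threshold rises from 5 to about 5.2 and then about 5.4. If the expected sign drifts toward zero, the threshold collapses and bars close on almost every trade; this sketch does not guard against that.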
Types of bars
Several bar types have been described in the literature. They differ in the event that triggers the formation of a new bar.
Tick bars
Tick bars sample the market every time a fixed number of transactions has occurred (for example, every 1,000 trades).[1] Each bar summarizes the trades between two consecutive thresholds and records fields such as the opening, highest, lowest, and closing prices, and the total volume. Compared with time bars, tick bars follow the pace of trading more closely. However, the number of shares or contracts per bar can vary widely because individual trades differ in size.
Volume bars
Volume bars create a new observation each time a fixed number of shares or contracts has been traded.[1] For example, one volume bar might be formed for every 10,000 shares traded. Each bar contains approximately the same total volume, overshooting the threshold by less than the size of its final trade. As a result, volume bars mitigate the fluctuations in trading activity that affect time bars, and they link price changes more directly to traded quantity.
Dollar bars
Dollar bars (or value bars) close when a fixed monetary value has been traded, such as US$100,000 per bar.[1] Because the threshold is expressed in currency units, the number of bars produced by a given amount of trading is less sensitive to changes in the price level or in the number of shares outstanding. The sampling frequency of dollar bars therefore tends to be more stable over time.[2]
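The effect of the price level can be illustrated with a hypothetical scenario: 100 trades of US$10,000 each, executed first at a price of 50 and then at a price of 100. The sketch below counts complete bars for a 10,000-share volume threshold and a US$500,000 dollar threshold. All numbers are invented for illustration, and `count_bars` is a hypothetical helper.

```python
def count_bars(sizes, threshold):
    """Number of complete bars when each trade adds `size` to the sampling clock."""
    bars, cum = 0, 0.0
    for size in sizes:
        cum += size
        if cum >= threshold:
            bars += 1
            cum = 0.0
    return bars


notional, n_trades = 10_000, 100  # hypothetical: 100 trades of US$10,000 each
for price in (50, 100):
    shares = [notional / price] * n_trades   # shares traded per trade
    dollars = [price * s for s in shares]    # traded value per trade
    print(price,
          count_bars(shares, 10_000),        # volume bars of 10,000 shares
          count_bars(dollars, 500_000))      # dollar bars of US$500,000
# 50 2 2
# 100 1 2
```

Doubling the price halves the number of volume bars generated by the same monetary flow, while the number of dollar bars is unchanged.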
Imbalance and run bars
In addition to standard tick, volume, and dollar bars, several imbalance-based and run-based bar types have been proposed. In these constructions, a bar is formed when a measure of imbalance in the signed order flow, based on trade direction (buy or sell) and size, deviates significantly from its expected value.[1] Examples include tick imbalance bars, volume imbalance bars, and dollar imbalance bars, as well as run bars, which react to sequences of trades on the same side of the market. These bars aim to allocate more observations to periods of strong order-flow imbalance, which may be associated with informed trading or large price movements.[2]
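The difference between imbalance and run statistics shows up on a short sequence of trade signs. In López de Prado's description of tick run bars, the monitored quantity is the larger of the number of buy-initiated and sell-initiated trades in the current bar, $\max\{\sum_{t: b_t = 1} b_t,\ -\sum_{t: b_t = -1} b_t\}$. Opposite trades therefore do not cancel out, as they do in the tick imbalance. The helper below is a hypothetical sketch that reports both statistics after each trade. It does not implement a closing threshold.

```python
def imbalance_and_run(signs):
    """(tick imbalance, tick-run statistic) after each trade of the current bar.

    imbalance: sum of the signs b_t, so buys and sells cancel out
    run:       max(number of buys, number of sells), with no cancellation
    """
    buys = sells = 0
    out = []
    for b in signs:
        if b > 0:
            buys += 1
        else:
            sells += 1
        out.append((buys - sells, max(buys, sells)))
    return out


print(imbalance_and_run([1, 1, 1, -1, -1, -1]))
# [(1, 1), (2, 2), (3, 3), (2, 3), (1, 3), (0, 3)]
```

After three buys followed by three sells, the imbalance has returned to zero while the run statistic stays at 3.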
Illustrative example
A simple numerical example illustrates how event-based bars differ from time bars. Consider the ten trades in the following table, with uneven time spacing and varying trade sizes:
| Trade | Time (hh:mm:ss) | Price | Volume |
|---|---|---|---|
| 1 | 09:30:01 | 100.0 | 50 |
| 2 | 09:30:03 | 100.1 | 75 |
| 3 | 09:30:10 | 99.9 | 200 |
| 4 | 09:31:00 | 100.2 | 150 |
| 5 | 09:32:30 | 100.5 | 300 |
| 6 | 09:33:00 | 100.4 | 100 |
| 7 | 09:34:00 | 100.6 | 250 |
| 8 | 09:36:00 | 100.8 | 400 |
| 9 | 09:40:00 | 101.0 | 500 |
| 10 | 09:45:00 | 100.7 | 600 |
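Suppose prices were sampled in 5-minute time bars starting at 09:30, with each bar including its start time and excluding its end time. The first bar (09:30–09:35) would contain trades 1–7, with a total volume of 1,125 shares. The second bar (09:35–09:40) would contain only trade 8 (400 shares), and the third and fourth bars would contain only trades 9 and 10, respectively. The number of trades and the traded volume per time bar therefore vary substantially.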
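The time-bar counts above can be checked with a short Python sketch that assigns each trade in the table to a 5-minute bucket. It assumes half-open intervals, so a trade stamped exactly 09:40:00 starts the third bar. The function name `time_bar_stats` is hypothetical.

```python
from datetime import datetime, timedelta

# (time, price, volume) for the ten trades in the table above
trades = [
    ("09:30:01", 100.0, 50), ("09:30:03", 100.1, 75), ("09:30:10", 99.9, 200),
    ("09:31:00", 100.2, 150), ("09:32:30", 100.5, 300), ("09:33:00", 100.4, 100),
    ("09:34:00", 100.6, 250), ("09:36:00", 100.8, 400), ("09:40:00", 101.0, 500),
    ("09:45:00", 100.7, 600),
]


def time_bar_stats(trades, start="09:30:00", minutes=5):
    """Map bar index k to (number of trades, total volume) for half-open
    time bars [start + k * width, start + (k + 1) * width)."""
    t0 = datetime.strptime(start, "%H:%M:%S")
    width = timedelta(minutes=minutes)
    stats = {}
    for ts, _price, vol in trades:
        k = (datetime.strptime(ts, "%H:%M:%S") - t0) // width  # integer bar index
        n, v = stats.get(k, (0, 0))
        stats[k] = (n + 1, v + vol)
    return stats


print(time_bar_stats(trades))
# {0: (7, 1125), 1: (1, 400), 2: (1, 500), 3: (1, 600)}
```

Time bars that contain no trades would simply be absent from the returned dictionary.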
By contrast, suppose the volume threshold is 500 shares, so that each bar closes at the first trade that brings its cumulative volume to 500 or more. The same trades then form the following volume bars:
| Volume bar | Trades included | Total volume | Open | High | Low | Close |
|---|---|---|---|---|---|---|
| 1 | 1–5 | 775 | 100.0 | 100.5 | 99.9 | 100.5 |
| 2 | 6–8 | 750 | 100.4 | 100.8 | 100.4 | 100.8 |
| 3 | 9 | 500 | 101.0 | 101.0 | 101.0 | 101.0 |
| 4 | 10 | 600 | 100.7 | 100.7 | 100.7 | 100.7 |
In this example the bar boundaries are determined purely by cumulative trading volume. Each bar contains at least 500 shares (between 500 and 775 in total), even though the bars span different durations and contain between one and five trades. In larger real-world datasets, plotting the number of bars per day for time bars and volume bars typically shows that event-based constructions concentrate more observations in active periods and fewer in quiet periods.[2]
Illustrative pseudocode
The following Python code shows one way to implement volume bars. It is intended to illustrate the underlying idea rather than prescribe a particular implementation:
def volume_bars(trades, threshold):
    """
    trades: iterable of (timestamp, price, volume) in time order
    threshold: volume per bar, e.g. 10_000
    returns: list of (timestamp, open, high, low, close, volume) tuples,
             timestamped at the trade that closes each bar; trades after
             the last completed bar are discarded
    """
    bars = []
    cum_vol = 0
    open_ = high = low = close = None
    for ts, price, vol in trades:
        if open_ is None:
            # first trade of a new bar
            open_ = high = low = price
        cum_vol += vol
        high = max(high, price)
        low = min(low, price)
        close = price
        if cum_vol >= threshold:
            bars.append((ts, open_, high, low, close, cum_vol))
            cum_vol = 0   # reset for the next bar
            open_ = None  # the next trade opens a new bar
    return bars
Conceptually similar algorithms can be written for tick bars (counting trades instead of summing volume), dollar bars (summing price × volume), and imbalance-based bars, where the stopping condition depends on signed order flow.[1]
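One way to realize this generalization is to pass each trade's contribution to the sampling clock as a function. The sketch below defines a hypothetical helper, `threshold_bars`, that keeps the structure of the volume-bar code above. It is applied to the ten trades of the illustrative example with a tick threshold of 4 trades and a dollar threshold of US$50,000; both thresholds are chosen only for illustration.

```python
def threshold_bars(trades, threshold, measure):
    """Generic event-based bars (hypothetical helper).

    trades:  iterable of (timestamp, price, volume) in time order
    measure: function (price, volume) -> contribution to the sampling clock,
             e.g. lambda p, v: 1 (tick bars), lambda p, v: v (volume bars),
             lambda p, v: p * v (dollar bars)
    returns: list of (timestamp, open, high, low, close, volume) tuples;
             trades after the last completed bar are discarded
    """
    bars = []
    clock, vol = 0.0, 0
    open_ = high = low = None
    for ts, price, v in trades:
        if open_ is None:  # first trade of a new bar
            open_ = high = low = price
        high, low = max(high, price), min(low, price)
        clock += measure(price, v)
        vol += v
        if clock >= threshold:
            bars.append((ts, open_, high, low, price, vol))
            clock, vol, open_ = 0.0, 0, None
    return bars


trades = [
    ("09:30:01", 100.0, 50), ("09:30:03", 100.1, 75), ("09:30:10", 99.9, 200),
    ("09:31:00", 100.2, 150), ("09:32:30", 100.5, 300), ("09:33:00", 100.4, 100),
    ("09:34:00", 100.6, 250), ("09:36:00", 100.8, 400), ("09:40:00", 101.0, 500),
    ("09:45:00", 100.7, 600),
]
print(threshold_bars(trades, 4, lambda p, v: 1))  # tick bars of 4 trades
# [('09:31:00', 100.0, 100.2, 99.9, 100.2, 475), ('09:36:00', 100.5, 100.8, 100.4, 100.8, 1050)]
dollar = threshold_bars(trades, 50_000, lambda p, v: p * v)
print([bar[5] for bar in dollar])                 # volume per dollar bar
# [775, 750, 500, 600]
```

With these thresholds the dollar bars close at the same trades as the 500-share volume bars in the table, because prices in the example stay close to 100.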
Applications
Event-based bars are discussed in the quantitative finance and financial machine learning literature as a preprocessing step for building trading and risk management models.[1] They can be used to construct features for algorithmic trading strategies, as inputs to supervised learning models for return prediction, or to define event times in backtesting frameworks.
Subsequent research has proposed other resampling schemes and compared event-based sampling with conventional time-based sampling for training trading strategies and predictive models, for example in cryptocurrency markets.[3]
In the same book, López de Prado introduces the "ETF trick", a technique for handling multi-product series such as futures spreads. The position is converted into a synthetic series that tracks the value of US$1 invested in it, as if it were a single total-return exchange-traded fund.[1] Contract rolls, spread inversions, and other discontinuities in futures and spread data can then be handled before sampling, so that bar formation operates on a continuous synthetic value series rather than on individual contracts.[1]
Futures roll pseudocode
For a single futures contract, the roll can be treated as a special case of the ETF trick. López de Prado also describes a more direct, equivalent approach: compute the cumulative price gap at each roll and subtract it from the price series.[1] The following Python code illustrates this approach for rolled futures data:
import pandas as pd


def getRolledSeries(pathIn, key):
    # Read the bar series (e.g. key='bars/ES_10k') from an HDF5 store
    series = pd.read_hdf(pathIn, key=key)
    series['Time'] = pd.to_datetime(series['Time'],
                                    format='%Y%m%d%H%M%S%f')
    series = series.set_index('Time')
    gaps = rollGaps(series)
    # Remove the cumulative roll gaps from the price fields
    for fld in ['Close', 'VWAP']:
        series[fld] -= gaps
    return series
# -----------------------------------------------------------------
def rollGaps(series, dictio={'Instrument': 'FUT_CUR_GEN_TICKER',
                             'Open': 'PX_OPEN',
                             'Close': 'PX_LAST'},
             matchEnd=True):
    # Compute gaps at each roll, between previous close and next open
    rollDates = series[dictio['Instrument']].drop_duplicates(
        keep='first'
    ).index
    gaps = series[dictio['Close']] * 0
    iloc = list(series.index)
    iloc = [iloc.index(i) - 1 for i in rollDates]  # index of days prior to roll
    gaps.loc[rollDates[1:]] = (
        series[dictio['Open']].loc[rollDates[1:]]
        - series[dictio['Close']].iloc[iloc[1:]].values
    )
    gaps = gaps.cumsum()
    if matchEnd:
        gaps -= gaps.iloc[-1]  # roll backward
    return gaps
The first function reads a futures bar series (for example, E-mini S&P 500 contracts) from an HDF5 file. It then subtracts the cumulative roll gaps from the closing price and the volume-weighted average price (VWAP). The second function computes the gap at each roll date as the difference between the new contract's opening price and the previous contract's closing price, and accumulates these gaps over time. With matchEnd=True, the cumulative gaps are shifted so that the most recent prices are left unchanged (a backward roll).
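The roll-gap adjustment can be checked on a toy example. The sketch below is a vectorized variant rather than the code above. It assumes one row per bar, a unique index, and simplified column names (`Instrument`, `Open`, `Close`) in place of the Bloomberg-style fields. The two contracts and their prices are invented.

```python
import pandas as pd

# Invented daily data: contract "ESH" is replaced by "ESM" on 2024-01-04.
df = pd.DataFrame(
    {
        "Instrument": ["ESH", "ESH", "ESM", "ESM"],
        "Open": [100.0, 101.0, 104.0, 105.0],
        "Close": [101.0, 102.0, 105.0, 106.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# A roll happens on the first row of each new contract (except the very first row).
is_roll = df["Instrument"].ne(df["Instrument"].shift())
is_roll.iloc[0] = False

# Gap = new contract's open minus the previous row's close, accumulated over rolls.
gaps = (df["Open"] - df["Close"].shift()).where(is_roll, 0.0).cumsum()
gaps -= gaps.iloc[-1]  # backward roll: the latest prices stay unchanged
adjusted = df["Close"] - gaps

print(adjusted.tolist())  # [103.0, 104.0, 105.0, 106.0]
```

The old contract's closes are shifted up by the 2.0-point roll gap. The adjusted series therefore has no artificial jump at the roll, and the latest prices still match the traded contract.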
References
- López de Prado, Marcos (2018). Advances in Financial Machine Learning. Hoboken, NJ: Wiley. ISBN 978-1119482086.
- López de Prado, Marcos (2018). "Advances in Financial Machine Learning: Lecture 3/10". SSRN Electronic Journal. doi:10.2139/ssrn.3257419. SSRN 3257419.
- Borges, Tomé Almeida; Neves, Rui (2021). Financial Data Resampling for Machine Learning Based Trading: Application to Cryptocurrency Markets. SpringerBriefs in Applied Sciences and Technology. Springer. doi:10.1007/978-3-030-68379-5. ISBN 978-3-030-68378-8.

