Draft:Forming Bars

Review waiting, please be patient.

This may take 2–3 weeks or more, since drafts are reviewed in no specific order. There are 639 pending submissions waiting for review.

If the submission is accepted, then this page will be moved into the article space.
If the submission is declined, then the reason will be posted here.
In the meantime, you can continue to improve this submission by editing normally.

Where to get help

If you need help editing or submitting your draft, please ask us a question at the AfC Help Desk or get live help from experienced editors. These venues are only for help with editing and the submission process, not to get reviews.
If you need feedback on your draft, or if the review is taking a lot of time, you can try asking for help on the talk page of a relevant WikiProject. Some WikiProjects are more active than others so a speedy reply is not guaranteed.

How to improve a draft

Wikipedia:Contributing to Wikipedia – a basic overview on how to edit Wikipedia.
Help:Wikitext – how to use the markup
Help:Referencing for beginners – how to include references
Wikipedia:Article development – how to develop your article
Wikipedia:Writing better articles – how to improve your article
Wikipedia:Verifiability – make sure your article includes reliable third-party sources

You can also browse Wikipedia:Featured articles and Wikipedia:Good articles to find examples of Wikipedia's best writing on topics similar to your proposed article.

Improving your odds of a speedy review

To improve your odds of a faster review, tag your draft with relevant WikiProject tags using the button below. This will let reviewers know a new draft has been submitted in their area of interest. For instance, if you wrote about a female astronomer, you would want to add the Biography, Astronomy, and Women scientists tags.

Submitted 24 days ago by ~2025-36346-97 · Last edited 5 days ago by Vodnir

Submission declined on 21 November 2025 by Pythoncoder (talk).

Your draft shows signs of having been generated by a large language model, such as ChatGPT. Wikipedia guidelines prohibit the use of LLMs to write articles from scratch. In addition, LLM-generated articles usually have multiple quality issues, to include:

Promotional tone, editorializing and other words to watch
Vague, generic, and speculative statements extrapolated from similar subjects
Essay-like writing
Hallucinations (plausible-sounding, but false information) and non-existent references
Close paraphrasing

Please address these issues. The best way is usually to read reliable sources and summarize them, instead of using a large language model. See our help page on large language models.

If you would like to continue working on the submission, click on the "Edit" tab at the top of the window.
If you have not resolved the issues listed above, your draft will be declined again and potentially deleted.
If you need extra help, please ask us a question at the AfC Help Desk or get live help from experienced editors.
Please do not remove reviewer comments or this notice until the submission is accepted.

This draft has been resubmitted and is currently awaiting re-review.

Forming bars is a set of techniques in quantitative finance for constructing price series ("bars") by sampling on trading activity rather than at fixed points in calendar time. A bar is generated when a specified event threshold is reached, such as a number of trades, a traded volume, or a traded amount of money.^[1] These event-based bars aim to reduce the heteroscedasticity and serial correlation often observed in returns sampled at regular time intervals, and to produce data that are more suitable for statistical analysis and machine learning.^[1]

Motivation

Traditional financial time series sample prices at fixed time intervals (for example, one minute or one day). Because trading activity and information arrival are highly uneven over time, these "time bars" can oversample quiet periods and undersample active periods, leading to clusters of volatility and other undesirable statistical properties.^[2]

Event-based bars treat the sampling scheme as a process subordinated to trading activity. Instead of taking one observation every h seconds, the sampling clock is advanced only when a particular quantity of trading "information" has been exchanged, such as a number of trades, contracts, or dollars. López de Prado argues that such constructions deliver sequences of returns that are closer to homoscedastic, display weaker serial dependence, and have more consistent information content per observation than time bars.^[1]^[2]

Empirical illustrations on E-mini S&P 500 futures, for example, show that the exponentially weighted average number of tick bars per day can vary strongly over time and that the number of volume bars per day often moves inversely with the price level, while dollar bars tend to exhibit relatively stable sampling frequencies.^[2]

Construction

Event-based bars are usually defined on transaction-level data indexed by trades $t=1,\dots ,N$ . Let $p_{t}$ be the transaction price and $v_{t}$ the traded volume of trade $t$ . Bars are defined by a sequence of stopping times $\tau _{0}<\tau _{1}<\tau _{2}<\dots$ , with $\tau _{0}=0$ , where $\tau _{i}$ is the index of the last trade in bar $i$ . Within bar $i$ , the usual open–high–low–close (OHLC) values and cumulative volume are computed from the trades with indices $t=\tau _{i-1}+1,\dots ,\tau _{i}$ .^[1]

For time bars the bar boundaries are equally spaced in calendar time, with a fixed interval $\theta _{\mathrm {time} }$ between consecutive bar ends, regardless of how many trades fall in each interval. For event-based bars, the stopping times are defined by thresholds on trading activity:

For tick bars a bar closes after a fixed number $\theta _{\mathrm {T} }$ of trades,

$$\tau _{i}=\tau _{i-1}+\theta _{\mathrm {T} },\qquad i=1,2,\dots$$

For volume bars a bar closes at the first trade at which cumulative volume since the previous bar reaches a threshold $\theta _{\mathrm {V} }$ ,

$$\tau _{i}=\min \left\{\tau >\tau _{i-1}:\sum _{t=\tau _{i-1}+1}^{\tau }v_{t}\geq \theta _{\mathrm {V} }\right\}$$

For dollar bars a bar closes at the first trade at which cumulative traded monetary value reaches a threshold $\theta _{\mathrm {D} }$ . Writing $x_{t}=p_{t}v_{t}$ for the dollar value of trade $t$ ,

$$\tau _{i}=\min \left\{\tau >\tau _{i-1}:\sum _{t=\tau _{i-1}+1}^{\tau }x_{t}\geq \theta _{\mathrm {D} }\right\}$$

Imbalance-based bars generalize these constructions by using signed order flow. Let $b_{t}\in \{+1,-1\}$ be the trade direction (for example, using the tick rule) and define signed volume $q_{t}=b_{t}v_{t}$ . The tick imbalance over bar $i$ is

$$B_{i}=\sum _{t=\tau _{i-1}+1}^{\tau _{i}}b_{t}$$

and the volume imbalance is

$$Q_{i}=\sum _{t=\tau _{i-1}+1}^{\tau _{i}}q_{t}.$$

In imbalance bar schemes the bar is closed when the absolute imbalance reaches a time-varying threshold based on its historical expectation, for example

$${\bigl |}B_{i}{\bigr |}\geq \theta _{I,i}=\alpha \,\mathbb {E} \!\left[{\bigl |}B_{i}{\bigr |}\right]$$

for tick imbalance bars, with analogous conditions using $Q_{i}$ for volume imbalance bars.^[2] The parameter $\alpha >0$ controls how aggressively the sampling responds to order-flow imbalances.
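The tick rule and the closing condition for tick imbalance bars can be sketched in a few lines of Python. This is a simplified illustration, not the formulation of the cited sources: the function names, the assumption that the first trade is a buy, and the use of an exponentially weighted average of past bars' absolute imbalances as a stand-in for the expectation are all choices made here for the example.

```python
def tick_rule(prices):
    """Classify each trade as +1 (buy) or -1 (sell) from price changes.

    A trade inherits the previous sign when the price is unchanged.
    Assumption: the first trade is labeled +1.
    """
    signs = []
    prev_price = None
    prev_sign = 1
    for price in prices:
        if prev_price is not None and price > prev_price:
            prev_sign = 1
        elif prev_price is not None and price < prev_price:
            prev_sign = -1
        signs.append(prev_sign)
        prev_price = price
    return signs


def tick_imbalance_bars(prices, alpha=1.0, initial_expectation=2.0, decay=0.5):
    """Return the 0-based index of the last trade in each tick imbalance bar.

    A bar closes when |B| >= alpha * E[|B|]. Here E[|B|] is approximated
    (as an assumption) by an exponentially weighted average of the
    absolute imbalances of earlier bars, seeded with initial_expectation.
    """
    bar_ends = []
    expected_abs = initial_expectation
    imbalance = 0
    for t, b in enumerate(tick_rule(prices)):
        imbalance += b
        if abs(imbalance) >= alpha * expected_abs:
            bar_ends.append(t)
            # update the running estimate of E[|B|], then start a new bar
            expected_abs = decay * abs(imbalance) + (1 - decay) * expected_abs
            imbalance = 0
    return bar_ends


# Hypothetical price sequence for illustration
prices = [100.0, 100.1, 99.9, 100.2, 100.5, 100.4, 100.6, 100.8, 101.0, 100.7]
signs = tick_rule(prices)
bar_ends = tick_imbalance_bars(prices)
```

With these inputs the signs are `[1, 1, -1, 1, 1, -1, 1, 1, 1, -1]` and bars close at indices 1 and 7; the remaining trades do not accumulate enough imbalance to close a third bar.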

Types of bars

Several bar types have been described in the literature. They differ in the event that triggers the formation of a new bar.

Tick bars

Tick bars sample the market every time a fixed number of transactions has occurred (for example, every 1,000 trades).^[1] Each bar summarizes all trades between two tick thresholds and records fields such as opening price, highest price, lowest price, closing price, and total volume. Compared with time bars, tick bars track trading activity more closely, although the number of contracts or shares per bar can vary widely.

Volume bars

A volume bar is a bar where a new observation is created each time a fixed number of shares or contracts is traded.^[1] For example, one volume bar might be created for every 10,000 shares traded. Because each bar contains (approximately) the same total trading volume, volume bars can mitigate fluctuations in trading volume that affect time bars, and they establish a more direct link between price changes and traded quantity.

Dollar bars

Dollar bars (or value bars) are defined by a fixed currency value of traded volume, such as a nominal value of US$100,000 per bar.^[1] Because the threshold is expressed in currency units, the number of bars becomes less sensitive to changes in price levels or shares outstanding, and the sampling frequency of dollar bar sequences tends to be more stable over time.^[2]
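A back-of-the-envelope calculation shows why a currency threshold can stabilize the bar count. The sketch below uses made-up numbers and assumes that daily dollar turnover stays constant while the price doubles and the number of shares traded halves; the function name and thresholds are hypothetical.

```python
def bars_per_day(daily_shares, price, volume_threshold, dollar_threshold):
    """Approximate number of complete volume bars and dollar bars in a day."""
    volume_bars = daily_shares // volume_threshold
    dollar_bars = (daily_shares * price) // dollar_threshold
    return volume_bars, dollar_bars


# Same dollar turnover (50 million) at two different price levels
before = bars_per_day(1_000_000, 50, 10_000, 500_000)
after = bars_per_day(500_000, 100, 10_000, 500_000)
```

Under this assumption the volume bar count falls from 100 to 50 while the dollar bar count stays at 100.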

Imbalance and run bars

In addition to standard tick, volume, and dollar bars, several imbalance-based and run-based bar types have been proposed. In these constructions, a bar is formed when a measure of imbalance in the directed order flow, based on trading direction (buy or sell) and size, deviates significantly from its expected value.^[1] Examples include tick imbalance bars, volume imbalance bars, and dollar imbalance bars, as well as run bars that react to sequences of trades on the same side of the market. These bars aim to allocate more observations to periods of greater order flow imbalance, which may be associated with informed trading or significant price fluctuations.^[2]

Illustrative example

A simple numerical example illustrates how event-based bars differ from time bars. Consider the ten trades in the following table, with uneven time spacing and varying trade sizes:

Trade	Time (hh:mm:ss)	Price	Volume
1	09:30:01	100.0	50
2	09:30:03	100.1	75
3	09:30:10	99.9	200
4	09:31:00	100.2	150
5	09:32:30	100.5	300
6	09:33:00	100.4	100
7	09:34:00	100.6	250
8	09:36:00	100.8	400
9	09:40:00	101.0	500
10	09:45:00	100.7	600

If prices were sampled in 5-minute time bars starting at 09:30, with each bar including its start time but not its end time, the first bar (09:30–09:35) would contain trades 1–7, the second (09:35–09:40) trade 8, the third (09:40–09:45) trade 9, and the fourth (09:45–09:50) trade 10. The number of trades per bar ranges from one to seven, and the traded volume from 400 to 1,125 shares.

By contrast, if 500 shares is chosen as the volume threshold, the same trades would form volume bars as follows:

Volume bar	Trades included	Total volume	Open	High	Low	Close
1	1–5	775	100.0	100.5	99.9	100.5
2	6–8	750	100.4	100.8	100.4	100.8
3	9	500	101.0	101.0	101.0	101.0
4	10	600	100.7	100.7	100.7	100.7

In this example the bar boundaries are determined purely by cumulative trading volume. Each bar contains at least 500 shares, and the totals exceed the threshold wherever the closing trade overshoots it, because trades are not split across bars. The durations and numbers of trades in each bar differ. In larger real-world datasets, plotting the number of bars per day for time bars and volume bars typically shows that event-based constructions concentrate more observations in active periods and fewer in quiet periods.^[2]
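The two groupings in this example can be reproduced with a short script. It is a minimal sketch using only the Python standard library; the helper names are made up for illustration, and the time bars are assumed to include their start time but not their end time.

```python
from collections import defaultdict

# The ten trades from the table: (time "hh:mm:ss", price, volume)
TRADES = [
    ("09:30:01", 100.0, 50), ("09:30:03", 100.1, 75),
    ("09:30:10", 99.9, 200), ("09:31:00", 100.2, 150),
    ("09:32:30", 100.5, 300), ("09:33:00", 100.4, 100),
    ("09:34:00", 100.6, 250), ("09:36:00", 100.8, 400),
    ("09:40:00", 101.0, 500), ("09:45:00", 100.7, 600),
]


def seconds(hhmmss):
    """Convert an "hh:mm:ss" string to seconds since midnight."""
    h, m, s = (int(x) for x in hhmmss.split(":"))
    return 3600 * h + 60 * m + s


def time_bar_volumes(trades, start="09:30:00", width=300):
    """Total volume per time bar, keyed by bar number (0 is the first bar)."""
    totals = defaultdict(int)
    for ts, _price, vol in trades:
        totals[(seconds(ts) - seconds(start)) // width] += vol
    return dict(totals)


def volume_bar_volumes(trades, threshold=500):
    """Total volume of each completed volume bar."""
    totals, cum = [], 0
    for _ts, _price, vol in trades:
        cum += vol
        if cum >= threshold:
            totals.append(cum)
            cum = 0
    return totals


time_totals = time_bar_volumes(TRADES)
volume_totals = volume_bar_volumes(TRADES)
```

The time bars hold 1,125, 400, 500, and 600 shares, while the volume bars hold 775, 750, 500, and 600 shares, matching the table above.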

Illustrative pseudocode

The following pseudocode, written in Python syntax, shows one way to implement volume bars. It is intended to illustrate the underlying idea rather than prescribe a particular implementation:

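def volume_bars(trades, threshold):
    """
    trades: iterable of (timestamp, price, volume)
    threshold: volume per bar, e.g. 10_000
    returns: list of (close_timestamp, open, high, low, close, volume) tuples
    """
    bars = []
    cum_vol = 0
    open_ = high = low = None  # None means no bar is currently open

    for ts, price, vol in trades:
        if open_ is None:
            # start a new bar at the first trade after the previous close
            open_ = high = low = price
        cum_vol += vol
        high = max(high, price)
        low = min(low, price)

        if cum_vol >= threshold:
            # the current trade closes the bar; trades are never split,
            # so cum_vol may exceed the threshold
            bars.append((ts, open_, high, low, price, cum_vol))
            cum_vol = 0
            open_ = high = low = None  # reset for the next bar

    # trades after the last completed bar are discarded
    return bars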

Conceptually similar algorithms can be written for tick bars (using the number of trades instead of volume), dollar bars (using price × volume), and for imbalance-based bars where the stopping condition depends on signed order flow.^[1]
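One way to make the shared structure concrete is a single function parameterized by the activity measure. The sketch below is illustrative only: the name `event_bars` and the `kind` argument are hypothetical, imbalance-based bars are not covered, and a trailing partial bar is dropped.

```python
def event_bars(trades, threshold, kind="volume"):
    """Build tick, volume, or dollar bars from (timestamp, price, volume) trades.

    Returns a list of (close_timestamp, open, high, low, close, volume) tuples.
    """
    # How much each trade adds to the quantity compared with the threshold
    increments = {
        "tick": lambda price, vol: 1,
        "volume": lambda price, vol: vol,
        "dollar": lambda price, vol: price * vol,
    }
    increment = increments[kind]

    bars = []
    activity = 0
    bar_volume = 0
    open_ = high = low = None
    for ts, price, vol in trades:
        if open_ is None:
            open_ = high = low = price  # first trade of a new bar
        high = max(high, price)
        low = min(low, price)
        activity += increment(price, vol)
        bar_volume += vol
        if activity >= threshold:
            bars.append((ts, open_, high, low, price, bar_volume))
            activity = 0
            bar_volume = 0
            open_ = high = low = None
    return bars


# Hypothetical trades for illustration
trades = [(1, 10.0, 100), (2, 11.0, 200), (3, 9.0, 50), (4, 10.0, 300)]
tick = event_bars(trades, 2, kind="tick")
volume = event_bars(trades, 300, kind="volume")
dollar = event_bars(trades, 3500, kind="dollar")
```

With these inputs the tick and volume schemes both close bars at trades 2 and 4, while the dollar scheme closes a single bar at trade 3 and leaves trade 4 in an unfinished bar.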

Applications

Event-based bars are discussed in quantitative finance and financial machine learning as a preprocessing step in building trading and risk management models.^[1] They can be used to construct features for algorithmic trading strategies, to feed supervised learning models for return prediction, or to define event times in backtesting frameworks.

Subsequent research has explored other resampling schemes and compared event-based sampling with classic time-based sampling in training trading strategies and predictive models, for example in cryptocurrency markets and other high-frequency data sets.^[3]

In the same context, López de Prado introduces the ETF trick, a technique for dealing with multi-product or futures series by constructing a synthetic price process that tracks the value of a one-unit investment in an underlying exchange-traded fund or replicating basket.^[1] This approach can be used to handle contract rolls, spread inversions, and other discontinuities in futures and spread data before sampling, so that bar formation and event-based sampling operate on a continuous synthetic value series rather than on individual contracts.^[1]

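Roll adjustment pseudocode

The following Python-style pseudocode illustrates a roll adjustment for a single rolled futures series: it computes the price gap at each roll and subtracts the cumulative gaps from the price fields to obtain a continuous series. This gap subtraction addresses contract rolls only; it is a simpler procedure than the ETF trick described above and does not by itself construct the value of an investment in a basket.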

import pandas as pd


def getRolledSeries(pathIn, key):
    # Load the bars stored under `key` (for example 'bars/ES_10k')
    series = pd.read_hdf(pathIn, key=key)
    series['Time'] = pd.to_datetime(series['Time'],
                                    format='%Y%m%d%H%M%S%f')
    series = series.set_index('Time')
    gaps = rollGaps(series)
    # Subtract the cumulative roll gaps from each price field
    for fld in ['Close', 'VWAP']:
        series[fld] -= gaps
    return series
# -----------------------------------------------------------------
def rollGaps(series, dictio=None, matchEnd=True):
    # Compute gaps at each roll, between previous close and next open
    if dictio is None:
        # Column names follow the source data and may need adapting
        dictio = {'Instrument': 'FUT_CUR_GEN_TICKER',
                  'Open': 'PX_OPEN',
                  'Close': 'PX_LAST'}
    # First bar of each contract marks a roll date
    rollDates = series[dictio['Instrument']].drop_duplicates(
        keep='first'
    ).index
    gaps = series[dictio['Close']] * 0
    positions = list(series.index)
    # Positions of the bars immediately before each roll
    prior = [positions.index(i) - 1 for i in rollDates]
    gaps.loc[rollDates[1:]] = (
        series[dictio['Open']].loc[rollDates[1:]]
        - series[dictio['Close']].iloc[prior[1:]].values
    )
    gaps = gaps.cumsum()
    if matchEnd:
        gaps -= gaps.iloc[-1]  # roll backward
    return gaps

The first function reads a futures time series (for example, E-mini S&P 500 contracts) and subtracts the cumulative roll gaps from fields such as the closing price and volume-weighted average price (VWAP). The second function computes the cumulative gaps at each roll date from the difference between the next contract’s opening price and the previous contract’s closing price.
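The gap arithmetic can be traced by hand on a toy series. The following sketch re-expresses the same idea with plain Python lists instead of pandas; the function name, the tickers, and the prices are made up for illustration, and a roll is assumed to occur wherever the ticker differs from the previous bar.

```python
def roll_gaps(tickers, opens, closes, match_end=True):
    """Cumulative roll gaps for a rolled futures series.

    At each bar where the ticker changes, the gap is that bar's open minus
    the previous bar's close. Gaps are accumulated and, if match_end is
    true, shifted so that the last adjusted price equals the raw price.
    """
    gaps = [0.0] * len(tickers)
    for i in range(1, len(tickers)):
        if tickers[i] != tickers[i - 1]:
            gaps[i] = opens[i] - closes[i - 1]

    cumulative = []
    total = 0.0
    for g in gaps:
        total += g
        cumulative.append(total)

    if match_end and cumulative:
        last = cumulative[-1]
        cumulative = [c - last for c in cumulative]  # roll backward
    return cumulative


# Hypothetical data: the contract rolls between the second and third bar,
# and the new contract opens 3 points above the old contract's last close.
tickers = ["ESH", "ESH", "ESM", "ESM"]
opens = [100.0, 101.0, 105.0, 106.0]
closes = [101.0, 102.0, 106.0, 107.0]

gaps = roll_gaps(tickers, opens, closes)
adjusted = [c - g for c, g in zip(closes, gaps)]
```

Here the cumulative gaps are `[-3.0, -3.0, 0.0, 0.0]`, so the adjusted closes become 104, 105, 106, and 107: the 3-point jump at the roll is removed and the most recent prices are left unchanged.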

References

[1] López de Prado, Marcos (2018). Advances in Financial Machine Learning. Hoboken, NJ: Wiley. ISBN 978-1119482086.
[2] López de Prado, Marcos (2018). "Advances in Financial Machine Learning: Lecture 3/10". SSRN Electronic Journal. doi:10.2139/ssrn.3257419. SSRN 3257419.
[3] Borges, Tomé Almeida; Neves, Rui (2021). Financial Data Resampling for Machine Learning Based Trading: Application to Cryptocurrency Markets. SpringerBriefs in Applied Sciences and Technology. Springer. doi:10.1007/978-3-030-68379-5. ISBN 978-3-030-68378-8.
