Recommendation
JD.com's self-developed billion-scale time series model TimeHF advances the state of the art in supply chain sales forecasting.
Core content:
1. JD.com's self-developed TimeHF model improves sales forecasting accuracy by more than 10%
2. The first RLHF solution for large time series models significantly reduces forecast uncertainty
3. The technical exploration of JD.com's supply chain algorithm team and the construction of a large-scale, high-quality dataset
JD.com's supply chain algorithm team has launched TimeHF, the first self-developed billion-scale time series model for sales forecasting. Reinforcement learning from human feedback (RLHF) was applied to sales forecasting for the first time, improving forecast accuracy by 10% and reducing uncertainty on the demand side. In comparisons against a range of time series models on internal and external datasets, TimeHF performed strongly in both model size and accuracy, exceeding industry benchmarks. Its forecasts currently drive automated replenishment for 20,000 products, with accuracy substantially improved over the previous online system.
This article introduces the team's technical exploration. On April 19, the JD Retail Group supply chain team will share "TimeHF: Industrial Innovation of Supply Chain Time Series Big Model", covering how to build high-quality, diversified large-scale time series datasets and RLHF solutions for large time series models. Please scan the QR code at the end of the article to participate.
Time series forecasting is a core technology in supply chain management, energy scheduling, financial risk control, and related fields. As data scale and business complexity grow, traditional methods (such as ARIMA and Prophet) and early deep learning models (LSTM, TCN) face the following challenges:
• Insufficient capture of complex patterns: traditional models rely on linear assumptions or local time series modeling, making it difficult to capture long-term dependencies, multivariable coupling, and other complex spatiotemporal correlations.
• Weak zero-shot generalization: most supervised models must be retrained for each scenario and cannot adapt to cross-domain transfer (such as sales forecasting for different product categories).
In recent years, adapting large language models (LLMs) to time series forecasting (e.g., GPT4TS and TimesFM) has become a hot topic, but the field has yet to see breakthroughs comparable to those of large text models. The main reasons include:
• Low dataset quality: existing public time series datasets generally suffer from limited volume, overly regular patterns, and poor scalability. Small models can compress their regularities effectively, so scaling laws are hard to demonstrate and models perform only moderately in complex scenarios.
• Lack of an effective RLHF solution: RLHF played an important role in the development of LLMs, and compared with SFT it achieves better results with less data; however, the commonly used RLHF frameworks are not suitable for large time series models.
In response, the supply chain algorithm team explored training large time series models from scratch, completed the industry's first billion-scale pure time series model, and achieved SOTA results on multiple public datasets.
In terms of data, we introduced a large-scale, high-quality, complex dataset of 1.5 billion samples and proposed a training set construction paradigm covering time series segmentation, data matching, and synthetic dataset construction.
In terms of the model, we proposed a general PCTLM model, which cuts the data into patches, improves the patch projection process to capture cross-patch information, and trains at scale with a grouped attention mechanism using temporal position encoding.
In terms of vertical optimization, we proposed the first RLHF solution for time series prediction models. Since the commonly used RLHF frameworks are not suitable for large time series models, we developed TPO, a reinforcement learning framework designed for pure large time series models.
2.1 Dataset Construction
Strong base models are typically backed by large volumes of pre-training data, but public datasets in the time series field fall far short of that level. To address this, we mixed JD's own sales time series data, public datasets, and synthetic data, and integrated the base data through quality filtering, deduplication, diversity sorting, and data matching. The final pre-training dataset contains roughly 1.5 billion observation samples, the largest dataset in the time series field to date.
• JD.com dataset: most of the data comes from JD.com's sales records. We collected three years of basic sales data across categories such as food and clothing, aggregated along different dimensions. Multi-category, multi-dimensional data expands the richness and heterogeneity of the data. Approximately 1.2 billion samples.
• Public datasets: we use the training collections of the Monash Time Series Database and the TSLib database, expanding the training samples by splitting at random time points. Approximately 20 million samples.
• Synthetic dataset: we run simple model predictions over individual series from the JD.com and public datasets to obtain synthetic data, and additionally generate synthetic series by customizing trend, seasonal, and noise components (see the sketch after this list). Approximately 400 million samples.
• Data labeling: we label each sample in the non-public datasets with time series labels such as series length, average daily sales, and percentage of zero-sales days, statistics that characterize the series from multiple angles.
• Quality filtering: using the time series labels, we score each series and eliminate data whose series is too short or whose sales are too discrete, improving the overall quality of the dataset.
• Deduplication: we randomly group the data, cluster the series within each group, and randomly retain the first N samples in each cluster.
• Diversity sorting: based on the time series labels, we reorder the overall data so that each batch contains series with as many different characteristics as possible.
• Data ratio: we set different mixing ratios. By data type, synthetic data accounts for 20%, public datasets for 4%, and JD data for 76%; by data dimension, aggregated series account for 30% and ordinary-dimension series for 70%. We also ran ratio experiments along other dimensions.
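A minimal sketch (not JD's internal code) of two steps described above: generating a synthetic series from customized trend, seasonal, and noise components, and quality-filtering series via the labels mentioned in the text. The component forms and filter thresholds are illustrative assumptions.

```python
import numpy as np

def synthesize_series(n_days: int = 365, seed: int = 0) -> np.ndarray:
    """Compose a synthetic sales-like series as trend + seasonality + noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)
    trend = 0.05 * t                            # slow linear growth (assumed form)
    weekly = 3.0 * np.sin(2 * np.pi * t / 7)    # weekly seasonality
    yearly = 5.0 * np.sin(2 * np.pi * t / 365)  # yearly seasonality
    noise = rng.normal(0.0, 1.0, n_days)        # Gaussian noise
    return np.clip(10 + trend + weekly + yearly + noise, 0, None)

def series_labels(x: np.ndarray) -> dict:
    """Label a series with the statistics named in the text."""
    return {
        "length": len(x),
        "avg_daily_sales": float(x.mean()),
        "zero_sales_pct": float((x == 0).mean()),
    }

def passes_quality_filter(labels: dict,
                          min_len: int = 90,
                          max_zero_pct: float = 0.8) -> bool:
    """Drop series that are too short or too sparse (illustrative thresholds)."""
    return labels["length"] >= min_len and labels["zero_sales_pct"] <= max_zero_pct

x = synthesize_series()
print(series_labels(x), passes_quality_filter(series_labels(x)))
```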
2.2 Model Design: the PCTLM Model
PCTLM (Patch Convolutional Timeseries Large Model) models time series with an overlapping patch-based approach and a masked encoder architecture. We divide the input into patches and project them into vector representations. To this end, we design a network based on convolutional layers that captures cross-patch information through its channels and convolution kernels. In addition, we introduce a grouped attention mechanism with temporal position encoding for time series prediction.
The core Transformer module is an encoder-only architecture that combines improvements from recent state-of-the-art large language model (LLM) architectures: we apply rotary position embedding (RoPE) to the input and adopt grouped-query attention (GQA) to reduce the cost of attention computation.
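A minimal sketch of the patch-and-project front end described above, under stated assumptions (this is not the released PCTLM code): overlapping patches are extracted, projected per patch, and a Conv1d over the patch axis mixes information across neighboring patches before the encoder. The patch length, stride, and model width are placeholders, and RoPE and grouped-query attention are omitted for brevity.

```python
import torch
import torch.nn as nn

class PatchConvEmbedding(nn.Module):
    def __init__(self, patch_len: int = 16, stride: int = 8, d_model: int = 256):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        self.proj = nn.Linear(patch_len, d_model)  # per-patch projection
        # Convolution over the patch sequence: channels and the kernel
        # mix information across neighboring patches.
        self.cross_patch = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) univariate series
        patches = x.unfold(-1, self.patch_len, self.stride)  # (B, n_patches, patch_len)
        h = self.proj(patches)                               # (B, n_patches, d_model)
        h = self.cross_patch(h.transpose(1, 2)).transpose(1, 2)
        return h  # token sequence fed to the encoder-only Transformer

emb = PatchConvEmbedding()
tokens = emb(torch.randn(4, 96))  # context window of 96 steps
print(tokens.shape)               # torch.Size([4, 11, 256])
```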
2.3 Training Scheme: RLHF
In large text models, RL has been widely used for fine-tuning, teaching models to produce more preferred outputs. But the inputs and outputs of large time series models are continuous values, fundamentally different from text. Moreover, most pure time series models are trained with losses such as mean squared error (MSE) or quantile loss; they often produce only multi-step point forecasts (so TD errors cannot be computed) and cannot output the probability of a predicted value (so policy probabilities and KL divergences cannot be calculated). As a result, common reinforcement learning frameworks such as PPO and RLOO cannot be applied directly to large time series models (except probabilistic forecasting models). Drawing on the reinforcement learning frameworks of large text models, we therefore proposed TPO (Timeseries Policy Optimization), a reinforcement learning framework suited to pure large time series models.
Input: we augment the original time series inputs of the RLHF dataset with pairs of good and bad predictions. The fine-tuning preference is to make the model's prediction closer to the good one.
Prediction probability component: since most pure large time series models output deterministic predictions without probabilities, KL divergence and policy probability losses cannot be computed in RL. We built a common component applicable to all time series models (not only large ones): we assume the model's predictions, as well as the good and bad predictions, follow a normal distribution N(μ, 1), with the mean taken directly from the model's output, so that prediction probabilities can be produced cheaply.
Advantage function: since our large time series model outputs multi-step predictions directly, we avoid TD-error methods in the RL stage and instead borrow ideas from REINFORCE. The gain of the advantage function comes from the difference with a baseline reward: overall, when the RL policy deviates little from the original SFT model and the prediction moves closer to the good prediction, the gain over the baseline reward is larger, and so is the advantage.
Time series loss: in the original RL objective for text models, a pre-training loss is added mainly to preserve performance on standard NLP tasks so the model does not "forget its roots". For large time series models, MSE is one of the best indicators of forecasting performance, so we add an MSE term to strengthen forecasting ability during fine-tuning. When constructing this MSE, we consider not only the gap between the prediction and the good prediction but also the distance from the bad prediction. Experiments show this significantly mitigates overfitting during fine-tuning.
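A minimal sketch of a TPO-style objective assembled from the components above, under stated assumptions: predictions parameterize N(pred, 1) so log-probabilities and a closed-form KL to the frozen SFT policy are computable; the advantage is REINFORCE-style (reward minus a baseline, no TD error); and an MSE term pulls toward the good trace and away from the bad one. The reward shape, coefficients, and margin term are illustrative, not taken from the paper.

```python
import math
import torch

def gaussian_logprob(y: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
    """log N(y; mu, 1), summed over the multi-step forecast horizon."""
    return (-0.5 * (y - mu) ** 2 - 0.5 * math.log(2 * math.pi)).sum(-1)

def tpo_loss(pred, sft_pred, good, bad, baseline_reward,
             kl_coef=0.1, mse_coef=1.0, margin=1.0):
    # Reward: negative squared distance to the good prediction (closer = better).
    reward = -((pred - good) ** 2).mean(-1)
    advantage = reward - baseline_reward  # REINFORCE-style gain over a baseline

    # Policy term: raise the likelihood of the good trace under N(pred, 1).
    policy_loss = -(advantage.detach() * gaussian_logprob(good, pred)).mean()

    # KL(N(pred,1) || N(sft,1)) = 0.5 * (pred - sft)^2 keeps the RL policy
    # close to the frozen SFT model.
    kl = (0.5 * (pred - sft_pred.detach()) ** 2).mean()

    # Time series loss: approach the good trace, keep at least `margin`
    # squared distance from the bad trace (assumed margin form).
    mse_good = ((pred - good) ** 2).mean()
    mse_bad = torch.relu(margin - ((pred - bad) ** 2).mean())
    return policy_loss + kl_coef * kl + mse_coef * (mse_good + mse_bad)

# Toy usage: batch of 4 series, 24-step horizon.
pred = torch.randn(4, 24, requires_grad=True)
sft_pred, good, bad = torch.randn(4, 24), torch.randn(4, 24), torch.randn(4, 24)
loss = tpo_loss(pred, sft_pred, good, bad, baseline_reward=torch.zeros(4))
loss.backward()
print(loss.item())
```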
On public datasets, we compared against GPT4TS, the best-performing fine-tuned large time series model to date, and five leading full-shot deep learning methods (PatchTST, Autoformer, iTransformer, DLinear, Informer). The PCTLM model after SFT+TPO achieves SOTA results on most public datasets.
(In the results table, the model in bold is the best; the evaluation metric is MAE, smaller is better.)
This work proposes a new training process for large time series models (PCTLM + SFT + TPO), in which PCTLM is a general-purpose base model; its largest size is the first billion-scale pure time series model to date. Its zero-shot performance exceeds both GPT4TS, the most advanced fine-tuned large model in the industry, and fully supervised prediction models across a variety of time series datasets. In addition, we proposed the first RLHF solution for time series prediction models, developing TPO, a reinforcement learning framework suited to pure large time series models that outperforms the most advanced RLHF frameworks (PPO, RLOO, etc.) in both performance and effect in time series scenarios. Thanks to these results, we have deployed the self-developed large time series model in JD's supply chain system, where its forecasts now drive automatic replenishment for 20,000 SKUs with accuracy substantially improved over the previous online system.
For more details, see the paper "TimeHF: Billion-Scale Time Series Models Guided by Human Feedback": https://arxiv.org/abs/2501.15942
Summit Invitation
On April 19, 2025, the 75th DataFunSummit, "Changes and Constants in Data Science in the Era of Big Models", will be held online in the DataFun community. The Data Science Summit has been held for five consecutive years and is one of the few professional conferences in China focused on data science topics and data science practitioners.
The [Data Science and Supply Chain Optimization] forum, co-planned and organized by Dr. Qi Yongzhi, head of the supply chain team at JD Retail Group, will be held on the afternoon of April 19. Shi Zhengxin, an inventory algorithm expert at JD Retail, will share "TimeHF: Industrial Innovation of Supply Chain Time Series Big Model", detailing how to build high-quality, diverse large-scale time series datasets and RLHF solutions for large time series models, covering data augmentation, data balancing and diversity sorting, timeseries policy optimization (TPO), and other techniques.