In the first forecasting lab, we trained an LSTM on a single revenue series.
It worked, but the problem itself was simplified.
Real planning models rarely forecast one business unit in isolation.
Different parts of the business grow at different rates, have different seasonality patterns, and respond differently to market conditions.
For this lab, we wanted to move one step closer to that reality.
Instead of forecasting a single revenue stream, we created five business units:
- Enterprise
- SMB
- APAC
- EMEA
- Public Sector
Each business unit was given its own growth profile, seasonality pattern, and revenue scale.
The goal was to see whether a single model could learn from all five business units at the same time while still understanding that each one behaves differently.
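To make the setup concrete, here is a minimal sketch of how five synthetic series like these could be generated. The base revenues, growth rates, and seasonal peaks below are illustrative assumptions, not the values used in the lab. Each series is compound growth multiplied by a cosine seasonal term and a little Gaussian noise.

```python
import math
import random

# Hypothetical profiles: (monthly base revenue, annual growth, seasonal amplitude, peak month).
# Illustrative assumptions only, not the lab's actual parameters.
BU_PROFILES = {
    "Enterprise":    (10_000_000, 0.08, 0.15, 12),
    "SMB":           (3_000_000,  0.12, 0.10, 6),
    "APAC":          (4_000_000,  0.20, 0.12, 11),
    "EMEA":          (5_000_000,  0.05, 0.08, 12),
    "Public Sector": (1_500_000,  0.04, 0.20, 9),
}

def make_series(base, growth, amplitude, peak_month, n_months, rng, noise=0.03):
    """Return n_months of synthetic revenue: compound growth x seasonality x noise."""
    series = []
    for t in range(n_months):
        month = t % 12 + 1  # assume the series starts in January
        trend = base * (1 + growth) ** (t / 12)
        season = 1 + amplitude * math.cos(2 * math.pi * (month - peak_month) / 12)
        series.append(trend * season * (1 + rng.gauss(0, noise)))
    return series

def make_dataset(n_months=60, seed=42):
    """Build one series per business unit with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return {bu: make_series(*profile, n_months=n_months, rng=rng)
            for bu, profile in BU_PROFILES.items()}

data = make_dataset()
```

Because each unit has its own scale and peak month, a pooled model sees visibly different shapes, which is the problem the next section deals with.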
The First Problem
When multiple business units are combined into one model, an immediate problem appears.
Enterprise revenue is much larger than Public Sector revenue.
APAC grows faster than EMEA.
SMB has a different seasonal pattern than Enterprise.
If all the data is simply pooled, the model may struggle to tell which patterns belong to which business unit.
To address that, we introduced a business unit identifier.
Each business unit received a numeric ID that was passed into the model as a static feature.
Instead of treating every revenue series as identical, the model learns that different business units have different characteristics.
That was one of the main reasons for moving away from the simple LSTM used in the previous lab.
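A minimal sketch of the identifier idea, assuming each ID is looked up in an embedding table. That is the usual treatment of categorical static inputs in TFT-style models, since a raw integer would imply that APAC sits "between" SMB and EMEA. The table here is randomly initialised and all helper names are hypothetical; in a trained model the vectors would be learned.

```python
import random

BUSINESS_UNITS = ["Enterprise", "SMB", "APAC", "EMEA", "Public Sector"]

# Stable integer ID per business unit (0..4).
BU_TO_ID = {name: idx for idx, name in enumerate(BUSINESS_UNITS)}

def make_embedding_table(n_ids, dim, seed=0):
    """Hypothetical embedding table: one dense vector per ID, randomly initialised here."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_ids)]

EMBEDDINGS = make_embedding_table(len(BUSINESS_UNITS), dim=4)

def static_context(bu_name):
    """Look up the static vector for one business unit."""
    return EMBEDDINGS[BU_TO_ID[bu_name]]

def attach_static(history, bu_name):
    """Append the same static vector to every historical time step (one simple way to inject it)."""
    ctx = static_context(bu_name)
    return [[value] + ctx for value in history]

rows = attach_static([1.0, 1.1, 1.2], "APAC")
```

Concatenation is the simplest option; the full TFT instead turns static inputs into context vectors that, among other things, initialise the LSTM encoder's state.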
Looking Beyond Historical Revenue
Many forecasting examples focus entirely on historical values.
Revenue goes in.
Forecast comes out.
For this experiment, we wanted the model to see information that would already be known before the forecast period begins.
Examples include:
- Month
- Quarter
- Whether the month falls in Q4
These values are available in advance and do not require prediction.
The model receives those future calendar features separately from the historical revenue data.
That makes the forecasting process a little more realistic because planners usually know future calendar periods even if they do not know future revenue.
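These calendar features are cheap to compute for any future month. A minimal sketch, assuming months are numbered 1 to 12 and the forecast starts the month after the last observed period; the function names are hypothetical.

```python
def calendar_features(month):
    """Features known in advance for a calendar month (1-12): month, quarter, and a Q4 flag."""
    quarter = (month - 1) // 3 + 1
    return {"month": month, "quarter": quarter, "is_q4": int(quarter == 4)}

def future_calendar(last_year, last_month, horizon=3):
    """Calendar features for the `horizon` months following (last_year, last_month)."""
    rows = []
    year, month = last_year, last_month
    for _ in range(horizon):
        month += 1
        if month > 12:  # roll over into the next year
            month, year = 1, year + 1
        rows.append({"year": year, **calendar_features(month)})
    return rows

next_quarter = future_calendar(2024, 11)
```

With November 2024 as the last actual, the three future rows are December 2024 (flagged as Q4), January 2025, and February 2025.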
Building the Forecast Windows
The model uses:
- 12 months of history
- 3 months of forecast horizon
For every forecast, the model receives one year of historical information and predicts the next quarter.
This felt like a reasonable balance.
There is enough history to learn seasonal patterns and enough forecast horizon to support a rolling planning process.
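The windowing step can be sketched as a sliding window over each business unit's series. This minimal version assumes a stride of one month and hypothetical helper names. Building windows per unit, and tagging each one with its ID, keeps any single window from mixing two business units.

```python
def make_windows(series, history=12, horizon=3):
    """Split one series into (past, future) pairs with a one-month sliding window."""
    windows = []
    for start in range(len(series) - history - horizon + 1):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon]
        windows.append((past, future))
    return windows

def make_dataset_windows(data_by_bu, bu_to_id, history=12, horizon=3):
    """Window each business unit separately and tag every sample with its static ID."""
    samples = []
    for bu, series in data_by_bu.items():
        for past, future in make_windows(series, history, horizon):
            samples.append((bu_to_id[bu], past, future))
    return samples

series = list(range(24))  # stand-in for 24 months of revenue
windows = make_windows(series)
```

A 24-month series yields 10 windows: the first uses months 0 to 11 to predict 12 to 14, the last uses months 9 to 20 to predict 21 to 23.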
Why We Tried a TFT-Style Architecture
The main experiment in this lab was replacing the simple LSTM structure from Lab 01 with a simplified Temporal Fusion Transformer design.
The model still contains an LSTM encoder.
But several new components were added around it.
Static business unit information is injected into the encoder.
Future calendar features are processed separately.
An attention layer allows future forecast periods to reference information from the historical sequence.
That was the part I was most interested in testing.
Rather than forcing all information through the final LSTM state, the model can revisit historical patterns through attention when generating forecasts.
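The attention step can be illustrated with plain scaled dot-product attention. This is a simplified single-head sketch with no learned projections or masking; the full TFT uses a multi-head "interpretable" variant with learned weights. Each future step's query scores every historical state, and the softmax weights decide how much each past month contributes to that step's context vector. The toy states below are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys, values):
    """Scaled dot-product attention.

    queries: one vector per future forecast step
    keys/values: one vector per historical step (e.g. LSTM encoder outputs)
    Returns one context vector per query and the attention weights.
    """
    d = len(keys[0])
    contexts, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        ctx = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights

history_states = [[math.sin(t + j) for j in range(4)] for t in range(12)]  # 12 months, dim 4
future_queries = [[math.cos(h + j) for j in range(4)] for h in range(3)]   # 3 forecast steps
contexts, weights = attend(future_queries, history_states, history_states)
```

The returned weights are part of the appeal for planning work: for each forecast month you can inspect which historical months the model leaned on.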
A Small Change That Made Sense
Another change from Lab 01 was the loss function.
The first model used Mean Squared Error.
This lab uses Huber Loss.
The reason was practical.
Forecast datasets often contain occasional spikes, unusual periods, or reporting noise.
Huber Loss is a middle ground between MAE and MSE: it behaves like MSE for small errors and like MAE for large ones, so it is less influenced by large individual errors than MSE is.
For forecasting work, that seemed worth testing.
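For reference, Huber Loss with threshold delta is quadratic for errors up to delta and linear beyond it. Below is a minimal sketch in plain Python, compared with MSE on a batch containing one large miss. The delta the lab used is not stated, so delta=1.0 is an assumption. In PyTorch the built-in equivalent is `torch.nn.HuberLoss(delta=1.0)`.

```python
def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*e^2 for |e| <= delta, delta*(|e| - 0.5*delta) beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [0.0, 0.0, 0.0, 0.0]
predicted = [0.5, 0.5, 0.5, 10.0]  # three small errors and one spike
huber_loss = huber(actual, predicted)
mse_loss = mse(actual, predicted)
```

Here the single spike dominates MSE (about 25.2) but adds only linearly to Huber (about 2.47). Because delta is measured in target units, it only behaves as intended on scaled revenue; on raw dollar values nearly every error exceeds 1.0 and the loss effectively becomes MAE.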
Evaluating the Forecast
After training, the model was evaluated using standard forecasting metrics:
- MAE
- RMSE
- MAPE
- R²
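All four metrics are simple to compute directly. A minimal sketch, assuming actuals and forecasts are aligned lists in the same (unscaled) revenue units; MAPE additionally assumes no actual value is zero. The function name is hypothetical.

```python
import math

def forecast_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%) and R^2 for aligned actual/forecast lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined for zero actuals; revenue is assumed strictly positive here.
    mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

metrics = forecast_metrics([100, 200, 300, 400], [110, 190, 330, 400])
```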
The code also calculates MAPE separately for:
- Month +1
- Month +2
- Month +3
This was useful because forecast quality often changes as the horizon increases.
A model that performs well one month ahead may behave very differently three months ahead.
Looking at the forecast horizon separately gives a better picture of model behavior.
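Splitting MAPE by horizon only requires grouping errors by forecast step. A minimal sketch, assuming each row holds the three forecast months in order (Month +1, +2, +3); the toy numbers are made up to show error growing with the horizon.

```python
def mape_by_horizon(actuals, forecasts):
    """MAPE (%) per forecast step; each row is [month+1, month+2, month+3]."""
    horizon = len(actuals[0])
    result = {}
    for h in range(horizon):
        pct_errors = [abs(a[h] - f[h]) / abs(a[h]) for a, f in zip(actuals, forecasts)]
        result[f"Month +{h + 1}"] = 100 * sum(pct_errors) / len(pct_errors)
    return result

actuals = [[100, 100, 100], [200, 200, 200]]
forecasts = [[101, 105, 110], [198, 190, 180]]
by_step = mape_by_horizon(actuals, forecasts)
```

In this toy case the error rises from 1% at Month +1 to 5% at Month +2 and 10% at Month +3, a pattern a single pooled MAPE of about 5.3% would hide.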
Forecasting the Next Quarter
The final step was generating forecasts for each business unit.
The model takes the latest 12 months of history for each business unit, combines it with future calendar information, and predicts the next three months of revenue.
The output includes:
- Monthly forecast values
- Quarterly totals
- Forecast comparison by business unit
That starts to look more like a planning deliverable than a machine learning experiment.
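Turning the model output into that deliverable is mostly bookkeeping. A minimal sketch, assuming forecasts have already been converted back to revenue units and arrive as three monthly values per business unit; the figures (in $M), month labels, and function name are made up for illustration.

```python
def summarize_forecast(forecasts_by_bu, month_labels):
    """Build one row per business unit with monthly values and a quarterly total."""
    rows = []
    for bu, monthly in forecasts_by_bu.items():
        row = {"business_unit": bu}
        row.update(dict(zip(month_labels, monthly)))
        row["quarter_total"] = sum(monthly)
        rows.append(row)
    # Largest quarterly forecast first, for side-by-side comparison
    rows.sort(key=lambda r: r["quarter_total"], reverse=True)
    return rows

labels = ["2025-01", "2025-02", "2025-03"]
forecasts = {
    "SMB": [3.1, 3.0, 3.2],
    "Enterprise": [10.2, 10.5, 11.0],
    "Public Sector": [1.4, 1.6, 2.1],
}
rows = summarize_forecast(forecasts, labels)
for r in rows:
    print(f"{r['business_unit']:<14}"
          + "".join(f"{r[m]:>9.1f}" for m in labels)
          + f"{r['quarter_total']:>10.1f}")
```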
What I Found Interesting
The most interesting part of this lab was not the transformer architecture.
It was the combination of different information types.
The model was learning from:
- Historical revenue
- Business unit identity
- Future calendar information
Those are all things planners already use when building forecasts.
The difference is that the model learns the relationships directly from the data.
Where This Could Go Next
This lab still uses synthetic revenue data.
The next logical step would be introducing business drivers.
Examples could include:
- Headcount
- Pipeline
- Bookings
- Marketing spend
- Customer counts
Those drivers could be passed into the future feature set alongside the calendar information.
That would move the model closer to how rolling forecasts are built in real planning environments.
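Mechanically, adding drivers just widens each future row. A minimal sketch, assuming the driver values are planned figures already known for the forecast months (otherwise they would need a forecast of their own); the driver names, numbers, and function name are hypothetical.

```python
def add_drivers(calendar_rows, driver_plan):
    """Merge planned driver values into the known-in-advance calendar rows.

    driver_plan maps a driver name to one planned value per future month.
    """
    for driver, values in driver_plan.items():
        if len(values) != len(calendar_rows):
            raise ValueError(f"{driver}: expected {len(calendar_rows)} values, got {len(values)}")
    merged = []
    for i, cal in enumerate(calendar_rows):
        row = dict(cal)  # copy so the calendar rows are not mutated
        for driver, values in driver_plan.items():
            row[driver] = values[i]
        merged.append(row)
    return merged

calendar_rows = [
    {"month": 1, "quarter": 1, "is_q4": 0},
    {"month": 2, "quarter": 1, "is_q4": 0},
    {"month": 3, "quarter": 1, "is_q4": 0},
]
driver_plan = {"headcount": [120, 124, 130], "marketing_spend": [0.8, 0.9, 1.1]}
future_rows = add_drivers(calendar_rows, driver_plan)
```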
For now, the goal was simpler.
Move beyond a single time series.
Introduce business context.
Add future information.
Then see how the forecast changes.
That made this lab much more interesting than simply replacing one neural network with another.