Wind Power Forecasting

Wind Power Forecasting
Overview
This project tackles the challenge of short-term wind power prediction using a hybrid approach that combines deep learning and ensemble methods. The pipeline implements two forecasting strategies: (1) Direct LSTM that predicts power output from raw inputs, and (2) Indirect LSTM→Random Forest, where LSTM predicts meteorological features used by a Random Forest to estimate power. The solution includes robust data preprocessing, model training, visual analytics, and performance benchmarking.
Motivation
Accurate wind power forecasting is critical for grid stability and energy planning. This project simulates a real-world scenario of turbine-level forecasting using publicly available data and explores whether hybrid learning models can improve accuracy, robustness, or interpretability over pure deep learning models.
Technical Approach
- Dataset: Hourly wind turbine data (
Active Power
,Wind Speed
,Wind Direction
,Power Curve
) - Preprocessing:
- Resampling to hourly intervals
- Handling missing values with forward/backward filling
- Scaling with
MinMaxScaler
- Two Forecasting Pipelines:
- Direct: LSTM → Power
- Indirect: LSTM → Features → Random Forest → Power
- Modeling:
- LSTM: 2-layer Keras Sequential model with early stopping & learning curves
- RF: 50-tree scikit-learn model using predicted LSTM features
- Evaluation Metrics: MAE, RMSE, R², IA, SDE, MAPE
- Visualization:
- Actual vs. Predicted curves
- Feature distributions and ACF/PACF
- Correlation heatmaps and learning curves
Key Features / Contributions
- End-to-end pipeline for sequential time series forecasting
- Fully modular: config-driven experimentation across approaches
- Resilient evaluation framework with inverse scaling and data windowing
- Automated saving of models, plots, metrics for both pipelines
- Rich set of reusable utilities for scaling, splitting, and sequence generation
Results & Findings
Approach | MAE | RMSE | R² | IA | SDE | MAPE |
---|---|---|---|---|---|---|
Direct LSTM | 231.81 | 375.24 | 0.9261 | 0.9805 | 374.72 | 288.02 |
Indirect (LSTM → RF) | 429.12 | 720.31 | 0.7278 | 0.9170 | 711.10 | 1017.82 |
- Direct LSTM significantly outperforms the hybrid pipeline in all metrics, especially error-based ones (RMSE, MAPE)
- The hybrid method's reduced performance suggests feature drift between predicted features and ground truth inputs for RF
Output Examples: Learning Curves
Direct LSTM | Indirect LSTM |
---|---|
![]() | ![]() |
Output Examples:: Actual vs Predicted (Test Set)
Direct LSTM | Indirect RF |
---|---|
![]() | ![]() |
📌 Insight: The Direct LSTM approach outperforms the Indirect method in all evaluation metrics on the test set, particularly in RMSE and MAPE.
Reflection
This project deepened my understanding of temporal forecasting, especially the effects of model architecture on downstream performance. It also offered hands-on practice with pipeline design, modularization, and interpretable evaluation in time-series modeling. Potential next steps include testing exogenous variables, adding weather forecast integration, or transitioning to probabilistic models.