AI Fair Value
HALO uses an ensemble machine learning model to estimate the true probability of binary market outcomes, identifying mispriced markets that can be traded for an expected edge.
Model Architecture
Our fair value prediction system combines multiple ML models in an ensemble approach:
XGBoost Component
A gradient boosting model that analyzes:
- Historical price movements
- Volume patterns
- Market microstructure
- Time-based features
- Technical indicators
XGBoost_prediction = f(technical_indicators, price_features, volume_features)
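As a rough sketch (not HALO's production code), this term can be produced by a standard gradient-boosting classifier; the hyperparameters and placeholder data below are illustrative assumptions.

```python
# Minimal sketch of the XGBoost component; feature columns stand in for the
# technical, price, volume, and time-based features described above.
import numpy as np
import xgboost as xgb

X_train = np.random.rand(1000, 25)            # placeholder feature matrix
y_train = np.random.randint(0, 2, size=1000)  # 1 if the market resolved YES

xgb_model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    objective="binary:logistic",  # output a probability for the YES outcome
)
xgb_model.fit(X_train, y_train)

# Probability estimate used as the XGBoost term of the ensemble.
xgboost_prediction = xgb_model.predict_proba(X_train[:1])[:, 1]
```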
LSTM Component
A long short-term memory network that captures:
- Temporal dependencies
- Sequential patterns
- Long-term trends
- Market momentum
- Time series patterns
LSTM_prediction = g(price_sequence, volume_sequence, temporal_features)
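A minimal PyTorch sketch of such a network; the window length, layer sizes, and feature count are illustrative assumptions, not HALO's actual configuration.

```python
# Minimal sketch of the LSTM component (illustrative sizes only).
import torch
import torch.nn as nn

class LSTMFairValue(nn.Module):
    def __init__(self, n_features: int = 8, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, sequence: torch.Tensor) -> torch.Tensor:
        # sequence: (batch, time_steps, n_features) of price/volume observations
        output, _ = self.lstm(sequence)
        last_hidden = output[:, -1, :]                 # state after the final step
        return torch.sigmoid(self.head(last_hidden))   # probability of YES

model = LSTMFairValue()
window = torch.randn(1, 120, 8)   # e.g. 120 recent observations, 8 features each
lstm_prediction = model(window)   # tensor of shape (1, 1)
```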
PPO Component
Proximal Policy Optimization for reinforcement learning:
- Optimal trading policy learning
- Risk-adjusted decision making
- Adaptive strategy based on market feedback
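For illustration only, a PPO policy could be trained with an off-the-shelf implementation such as stable-baselines3 against a market environment; the toy environment and random reward below are placeholders, not HALO's simulator or reward function.

```python
# Sketch of the PPO component on a hypothetical toy environment.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

class ToyMarketEnv(gym.Env):
    """Observation: current feature vector. Actions: 0=hold, 1=buy YES, 2=buy NO."""
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_features,),
                                            dtype=np.float32)
        self.action_space = spaces.Discrete(3)
        self._t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._t += 1
        reward = float(np.random.randn()) * 0.01     # placeholder P&L signal
        terminated = self._t >= 100
        return self.observation_space.sample(), reward, terminated, False, {}

agent = PPO("MlpPolicy", ToyMarketEnv(), verbose=0)
agent.learn(total_timesteps=2_000)                   # learn a trading policy
action, _ = agent.predict(ToyMarketEnv().reset()[0])
```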
LLM Component
Large language model for sentiment and context analysis:
- Market sentiment signals
- News and social media context
- Narrative understanding
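As a hedged sketch of how a sentiment feature might be obtained, the snippet below calls a chat-completion API and parses a numeric score; the provider, model name, prompt, and the assumption of a numeric reply are all illustrative, since the source does not specify which LLM HALO uses.

```python
# Sketch of an LLM sentiment feature (provider, model, and prompt are assumptions).
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def llm_sentiment_score(market_question: str, headlines: list[str]) -> float:
    """Ask the model for a -1..1 sentiment score toward the YES outcome."""
    prompt = (
        f"Market question: {market_question}\n"
        "Recent headlines:\n" + "\n".join(f"- {h}" for h in headlines) +
        "\nReply with a single number between -1 (strongly NO) and 1 (strongly YES)."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model follows the instruction and replies with a bare number.
    return float(response.choices[0].message.content.strip())

score = llm_sentiment_score("Will X happen by June?", ["Headline A", "Headline B"])
```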
Ensemble Combination
Final predictions combine all models with learned weights:
fair_value = α₁ * XGBoost + α₂ * LSTM + α₃ * PPO + α₄ * LLM
Where weights (α₁, α₂, α₃, α₄) are dynamically adjusted based on market conditions and model performance.
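One simple way the dynamic weighting could work is to renormalize the α's from each model's recent performance; the softmax scheme below is an illustrative assumption, not HALO's exact rule.

```python
# Sketch of the ensemble combination with performance-based weights.
import numpy as np

def combine(predictions: np.ndarray, recent_scores: np.ndarray) -> float:
    """predictions: [XGBoost, LSTM, PPO, LLM] probabilities in [0, 1].
    recent_scores: recent per-model performance (higher is better)."""
    weights = np.exp(recent_scores) / np.exp(recent_scores).sum()  # α₁..α₄, sum to 1
    return float(np.dot(weights, predictions))

fair_value = combine(np.array([0.62, 0.58, 0.65, 0.60]),
                     np.array([1.2, 0.9, 1.1, 0.7]))
```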
Feature Engineering
The model analyzes 45 features across multiple categories:
Technical Indicators (25 features)
- Price momentum indicators (RSI, MACD, moving averages)
- Volatility measures (Bollinger Bands, ATR)
- Trend indicators
- Oscillators
- Price action patterns
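For concreteness, a few of these indicators can be computed with pandas alone; the window lengths are conventional defaults (RSI-14, MACD 12/26, 20-period Bollinger Bands), not necessarily HALO's settings.

```python
# Sketch of a handful of technical-indicator features.
import pandas as pd

def technical_features(close: pd.Series) -> pd.DataFrame:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rsi = 100 - 100 / (1 + gain / loss)                  # RSI(14)
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    macd = ema_fast - ema_slow                           # MACD line
    sma_20 = close.rolling(20).mean()
    band_width = 2 * close.rolling(20).std()
    return pd.DataFrame({"rsi": rsi, "macd": macd,
                         "bb_upper": sma_20 + band_width,   # Bollinger upper band
                         "bb_lower": sma_20 - band_width})  # Bollinger lower band
```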
Sentiment Signals (4 features)
- Market sentiment from LLM analysis
- Social media sentiment
- News sentiment
- Overall market mood
Funding Rates (8 features)
- Current funding rates
- Historical funding rate trends
- Funding rate spreads
- Cross-market funding rate comparisons
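A small pandas sketch of how funding-rate features like these might be derived; the column names, venues, and lookback windows are hypothetical.

```python
# Sketch of funding-rate features (hypothetical column names).
import pandas as pd

def funding_features(funding: pd.DataFrame) -> pd.DataFrame:
    """funding: per-period rows with 'rate_venue_a' and 'rate_venue_b' columns."""
    rate = funding["rate_venue_a"]
    out = pd.DataFrame(index=funding.index)
    out["rate_current"] = rate
    out["rate_trend"] = rate.diff(8)                        # change over 8 periods
    out["rate_spread"] = rate - funding["rate_venue_b"]     # cross-market spread
    out["rate_zscore"] = (rate - rate.rolling(30).mean()) / rate.rolling(30).std()
    return out
```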
Order Book Depth Metrics (8 features)
- Bid-ask spread width
- Order book imbalance
- Liquidity depth at various price levels
- Market maker activity
- Order flow patterns
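A sketch of how the depth metrics above can be computed from an order book snapshot; the five-level depth cutoff is an illustrative choice.

```python
# Sketch of order-book depth features from (price, size) levels sorted best-first.
def order_book_features(bids: list[tuple[float, float]],
                        asks: list[tuple[float, float]]) -> dict:
    best_bid, best_ask = bids[0][0], asks[0][0]
    spread = best_ask - best_bid
    mid = (best_ask + best_bid) / 2
    bid_depth = sum(size for _, size in bids[:5])    # liquidity in the top 5 levels
    ask_depth = sum(size for _, size in asks[:5])
    imbalance = (bid_depth - ask_depth) / (bid_depth + ask_depth)
    return {"spread": spread, "spread_pct": spread / mid,
            "bid_depth": bid_depth, "ask_depth": ask_depth,
            "imbalance": imbalance}

features = order_book_features(bids=[(0.61, 500), (0.60, 800)],
                               asks=[(0.63, 400), (0.64, 900)])
```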
Feature Selection
Features are selected using mutual information and recursive feature elimination to maximize predictive power while avoiding overfitting. The resulting 45-feature set spans technical, sentiment, funding-rate, and order-book signals.
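The two steps named above map onto scikit-learn's mutual_info_classif and RFE; the estimator, candidate pool, and placeholder data below are illustrative.

```python
# Sketch of mutual-information ranking followed by recursive feature elimination.
import numpy as np
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X = np.random.rand(500, 80)                  # candidate features (placeholder data)
y = np.random.randint(0, 2, size=500)        # resolved binary outcomes

# 1) Rank candidates by mutual information with the outcome, keep the top 60.
mi = mutual_info_classif(X, y)
top_by_mi = np.argsort(mi)[::-1][:60]

# 2) Recursively eliminate features down to the final 45-feature set.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=45)
rfe.fit(X[:, top_by_mi], y)
selected = top_by_mi[rfe.support_]
```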
Training Process
Models are trained on:
- Training Set: 70% of historical data
- Validation Set: 15% for hyperparameter tuning
- Test Set: 15% for final evaluation
Training includes:
- Data preprocessing and normalization
- Feature engineering and selection
- Hyperparameter optimization (grid search)
- Cross-validation for robustness
- Ensemble weight learning
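A sketch of the split and grid search under the steps above, using a chronological 70/15/15 split and time-aware cross-validation; the grid values and placeholder data are illustrative.

```python
# Sketch of the training split and hyperparameter search.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

X = np.random.rand(2000, 45)                 # placeholder feature matrix
y = np.random.randint(0, 2, size=2000)

# Chronological 70 / 15 / 15 split (train / validation / test).
n = len(X)
train_end, val_end = int(0.70 * n), int(0.85 * n)
X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]    # hyperparameter/weight tuning
X_test, y_test = X[val_end:], y[val_end:]                    # final evaluation only

# Grid search with time-aware cross-validation on the training set.
grid = GridSearchCV(
    xgb.XGBClassifier(objective="binary:logistic"),
    param_grid={"max_depth": [3, 4, 6], "learning_rate": [0.03, 0.1]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_brier_score",
)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
```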
Model Performance
Our ensemble model achieves:
- Accuracy: 65-70% on binary outcomes
- Brier Score: 0.18-0.22 (lower is better)
- ROC AUC: 0.72-0.75
- Sharpe Ratio: 2.1-2.5 on trading signals
These results significantly outperform naive baselines such as the raw market price and simple moving-average forecasts, providing a genuine edge in prediction markets.
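For reference, the metrics above can be computed for any set of predictions with scikit-learn; the toy arrays below are placeholder values, not HALO's results.

```python
# Sketch of the evaluation metrics on placeholder predictions.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # resolved outcomes
p_pred = np.array([0.71, 0.32, 0.66, 0.55, 0.41, 0.28, 0.80, 0.45])

accuracy = accuracy_score(y_true, (p_pred >= 0.5).astype(int))
brier = brier_score_loss(y_true, p_pred)                       # lower is better
auc = roc_auc_score(y_true, p_pred)
print(f"accuracy={accuracy:.2f}  brier={brier:.3f}  auc={auc:.2f}")
```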
Confidence Scoring
Each prediction includes a confidence score:
confidence = 1 - (model_uncertainty / max_uncertainty)
Confidence scores are used to:
- Filter low-quality trading opportunities
- Size positions appropriately
- Manage risk exposure
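One way to realize the confidence formula above is to use disagreement among the ensemble members as the uncertainty term; that proxy and the max_uncertainty bound below are illustrative assumptions, not HALO's stated method.

```python
# Sketch of the confidence score using ensemble disagreement as uncertainty.
import numpy as np

def confidence(member_predictions: np.ndarray, max_uncertainty: float = 0.5) -> float:
    """member_predictions: per-model probabilities for the same market."""
    model_uncertainty = float(np.std(member_predictions))      # disagreement proxy
    return 1.0 - min(model_uncertainty / max_uncertainty, 1.0)

conf = confidence(np.array([0.62, 0.58, 0.65, 0.60]))          # ≈ 0.95
```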
Model Updates
The models are retrained:
- Weekly: Full retraining on latest data
- Daily: Incremental updates with new observations
- Real-time: Online learning for rapid adaptation
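As a sketch of the incremental path, a model that exposes partial_fit can fold new observations in without a full retrain; scikit-learn's SGDClassifier is a stand-in here, since the source does not name the online learner.

```python
# Sketch of an incremental update step with a partial_fit-capable model.
import numpy as np
from sklearn.linear_model import SGDClassifier

online_model = SGDClassifier(loss="log_loss")      # probabilistic linear model
classes = np.array([0, 1])

def incremental_update(X_new: np.ndarray, y_new: np.ndarray) -> None:
    """Fold the latest resolved markets into the model without full retraining."""
    online_model.partial_fit(X_new, y_new, classes=classes)

incremental_update(np.random.rand(32, 45), np.random.randint(0, 2, size=32))
```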
Continuous Improvement
As more data becomes available, the models are retrained on the larger history, and we track performance metrics over time to confirm they remain effective.
Limitations
While powerful, the models have limitations:
- Black Swan Events: Unpredictable market shocks
- Data Quality: Dependent on accurate market data
- Overfitting Risk: Mitigated by regular out-of-sample validation, but never fully eliminated
- Market Regime Changes: Models adapt but may lag
Transparency
We provide:
- Model performance metrics on the dashboard
- Feature importance rankings
- Prediction explanations
- Historical accuracy tracking
Next Steps
- Trading Strategy - How we use these predictions
- Risk Management - Managing model uncertainty
