AI Fair Value
HALO uses an ensemble machine learning model to estimate the true probability of binary market outcomes, identifying mispriced markets that can be traded for an expected edge.
Model Architecture
Our fair value prediction system combines multiple ML models in an ensemble approach:
XGBoost Component
A gradient boosting model that analyzes:
- Historical price movements
- Volume patterns
- Market microstructure
- Time-based features
- Technical indicators
XGBoost_prediction = f(technical_indicators, price_features, volume_features)
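As a rough sketch (not HALO's production code), this term can be produced by a standard gradient-boosting classifier; the hyperparameters and placeholder data below are illustrative assumptions.

```python
# Minimal sketch of the XGBoost component; feature columns stand in for the
# technical, price, volume, and time-based features described above.
import numpy as np
import xgboost as xgb

X_train = np.random.rand(1000, 25)            # placeholder feature matrix
y_train = np.random.randint(0, 2, size=1000)  # 1 if the market resolved YES

xgb_model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    objective="binary:logistic",  # output a probability for the YES outcome
)
xgb_model.fit(X_train, y_train)

# Probability estimate used as the XGBoost term of the ensemble.
xgboost_prediction = xgb_model.predict_proba(X_train[:1])[:, 1]
```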
LSTM Component
A long short-term memory network that captures:
- Temporal dependencies
- Sequential patterns
- Long-term trends
- Market momentum
- Time series patterns
LSTM_prediction = g(price_sequence, volume_sequence, temporal_features)
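A minimal PyTorch sketch of such a network; the window length, layer sizes, and feature count are illustrative assumptions, not HALO's actual configuration.

```python
# Minimal sketch of the LSTM component (illustrative sizes only).
import torch
import torch.nn as nn

class LSTMFairValue(nn.Module):
    def __init__(self, n_features: int = 8, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, sequence: torch.Tensor) -> torch.Tensor:
        # sequence: (batch, time_steps, n_features) of price/volume observations
        output, _ = self.lstm(sequence)
        last_hidden = output[:, -1, :]                 # state after the final step
        return torch.sigmoid(self.head(last_hidden))   # probability of YES

model = LSTMFairValue()
window = torch.randn(1, 120, 8)   # e.g. 120 recent observations, 8 features each
lstm_prediction = model(window)   # tensor of shape (1, 1)
```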
PPO Component
Proximal Policy Optimization for reinforcement learning:
- Optimal trading policy learning
- Risk-adjusted decision making
- Adaptive strategy based on market feedback
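For illustration only, a PPO policy could be trained with an off-the-shelf implementation such as stable-baselines3 against a market environment; the toy environment and random reward below are placeholders, not HALO's simulator or reward function.

```python
# Sketch of the PPO component on a hypothetical toy environment.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

class ToyMarketEnv(gym.Env):
    """Observation: current feature vector. Actions: 0=hold, 1=buy YES, 2=buy NO."""
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_features,),
                                            dtype=np.float32)
        self.action_space = spaces.Discrete(3)
        self._t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._t += 1
        reward = float(np.random.randn()) * 0.01     # placeholder P&L signal
        terminated = self._t >= 100
        return self.observation_space.sample(), reward, terminated, False, {}

agent = PPO("MlpPolicy", ToyMarketEnv(), verbose=0)
agent.learn(total_timesteps=2_000)                   # learn a trading policy
action, _ = agent.predict(ToyMarketEnv().reset()[0])
```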
LLM Component
Large language model for sentiment and context analysis:
- Market sentiment signals
- News and social media context
- Narrative understanding
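As a hedged sketch of how a sentiment feature might be obtained, the snippet below calls a chat-completion API and parses a numeric score; the provider, model name, prompt, and the assumption of a numeric reply are all illustrative, since the source does not specify which LLM HALO uses.

```python
# Sketch of an LLM sentiment feature (provider, model, and prompt are assumptions).
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def llm_sentiment_score(market_question: str, headlines: list[str]) -> float:
    """Ask the model for a -1..1 sentiment score toward the YES outcome."""
    prompt = (
        f"Market question: {market_question}\n"
        "Recent headlines:\n" + "\n".join(f"- {h}" for h in headlines) +
        "\nReply with a single number between -1 (strongly NO) and 1 (strongly YES)."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model follows the instruction and replies with a bare number.
    return float(response.choices[0].message.content.strip())

score = llm_sentiment_score("Will X happen by June?", ["Headline A", "Headline B"])
```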
Ensemble Combination
Final predictions combine all models with learned weights:
fair_value = α₁ * XGBoost + α₂ * LSTM + α₃ * PPO + α₄ * LLM
Where weights (α₁, α₂, α₃, α₄) are dynamically adjusted based on market conditions and model performance.
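One simple way the dynamic weighting could work is to renormalize the α's from each model's recent performance; the softmax scheme below is an illustrative assumption, not HALO's exact rule.

```python
# Sketch of the ensemble combination with performance-based weights.
import numpy as np

def combine(predictions: np.ndarray, recent_scores: np.ndarray) -> float:
    """predictions: [XGBoost, LSTM, PPO, LLM] probabilities in [0, 1].
    recent_scores: recent per-model performance (higher is better)."""
    weights = np.exp(recent_scores) / np.exp(recent_scores).sum()  # α₁..α₄, sum to 1
    return float(np.dot(weights, predictions))

fair_value = combine(np.array([0.62, 0.58, 0.65, 0.60]),
                     np.array([1.2, 0.9, 1.1, 0.7]))
```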
Feature Engineering
The model analyzes 45 features across multiple categories:
Technical Indicators (25 features)
- Price momentum indicators (RSI, MACD, moving averages)
- Volatility measures (Bollinger Bands, ATR)
- Trend indicators
- Oscillators
- Price action patterns
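For concreteness, a few of these indicators can be computed with pandas alone; the window lengths are conventional defaults (RSI-14, MACD 12/26, 20-period Bollinger Bands), not necessarily HALO's settings.

```python
# Sketch of a handful of technical-indicator features.
import pandas as pd

def technical_features(close: pd.Series) -> pd.DataFrame:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rsi = 100 - 100 / (1 + gain / loss)                  # RSI(14)
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    macd = ema_fast - ema_slow                           # MACD line
    sma_20 = close.rolling(20).mean()
    band_width = 2 * close.rolling(20).std()
    return pd.DataFrame({"rsi": rsi, "macd": macd,
                         "bb_upper": sma_20 + band_width,   # Bollinger upper band
                         "bb_lower": sma_20 - band_width})  # Bollinger lower band
```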
Sentiment Signals (4 features)
- Market sentiment from LLM analysis
- Social media sentiment
- News sentiment
- Overall market mood
Funding Rates (8 features)
- Current funding rates
- Historical funding rate trends
- Funding rate spreads
- Cross-market funding rate comparisons
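A small pandas sketch of how funding-rate features like these might be derived; the column names, venues, and lookback windows are hypothetical.

```python
# Sketch of funding-rate features (hypothetical column names).
import pandas as pd

def funding_features(funding: pd.DataFrame) -> pd.DataFrame:
    """funding: per-period rows with 'rate_venue_a' and 'rate_venue_b' columns."""
    rate = funding["rate_venue_a"]
    out = pd.DataFrame(index=funding.index)
    out["rate_current"] = rate
    out["rate_trend"] = rate.diff(8)                        # change over 8 periods
    out["rate_spread"] = rate - funding["rate_venue_b"]     # cross-market spread
    out["rate_zscore"] = (rate - rate.rolling(30).mean()) / rate.rolling(30).std()
    return out
```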
Order Book Depth Metrics (8 features)
- Bid-ask spread width
- Order book imbalance
- Liquidity depth at various price levels
- Market maker activity
- Order flow patterns
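A sketch of how the depth metrics above can be computed from an order book snapshot; the five-level depth cutoff is an illustrative choice.

```python
# Sketch of order-book depth features from (price, size) levels sorted best-first.
def order_book_features(bids: list[tuple[float, float]],
                        asks: list[tuple[float, float]]) -> dict:
    best_bid, best_ask = bids[0][0], asks[0][0]
    spread = best_ask - best_bid
    mid = (best_ask + best_bid) / 2
    bid_depth = sum(size for _, size in bids[:5])    # liquidity in the top 5 levels
    ask_depth = sum(size for _, size in asks[:5])
    imbalance = (bid_depth - ask_depth) / (bid_depth + ask_depth)
    return {"spread": spread, "spread_pct": spread / mid,
            "bid_depth": bid_depth, "ask_depth": ask_depth,
            "imbalance": imbalance}

features = order_book_features(bids=[(0.61, 500), (0.60, 800)],
                               asks=[(0.63, 400), (0.64, 900)])
```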
Feature Selection
Features are selected using mutual information and recursive feature elimination to maximize predictive power while avoiding overfitting. The resulting 45-feature set spans technical, sentiment, funding-rate, and order-book signals.
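The two steps named above map onto scikit-learn's mutual_info_classif and RFE; the estimator, candidate pool, and placeholder data below are illustrative.

```python
# Sketch of mutual-information ranking followed by recursive feature elimination.
import numpy as np
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X = np.random.rand(500, 80)                  # candidate features (placeholder data)
y = np.random.randint(0, 2, size=500)        # resolved binary outcomes

# 1) Rank candidates by mutual information with the outcome, keep the top 60.
mi = mutual_info_classif(X, y)
top_by_mi = np.argsort(mi)[::-1][:60]

# 2) Recursively eliminate features down to the final 45-feature set.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=45)
rfe.fit(X[:, top_by_mi], y)
selected = top_by_mi[rfe.support_]
```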
Training Process
Models are trained on:
- Training Set: 70% of historical data
- Validation Set: 15% for hyperparameter tuning
- Test Set: 15% for final evaluation
Training includes:
- Data preprocessing and normalization
- Feature engineering and selection
- Hyperparameter optimization (grid search)
- Cross-validation for robustness
- Ensemble weight learning
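A sketch of the split and grid search under the steps above, using a chronological 70/15/15 split and time-aware cross-validation; the grid values and placeholder data are illustrative.

```python
# Sketch of the training split and hyperparameter search.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

X = np.random.rand(2000, 45)                 # placeholder feature matrix
y = np.random.randint(0, 2, size=2000)

# Chronological 70 / 15 / 15 split (train / validation / test).
n = len(X)
train_end, val_end = int(0.70 * n), int(0.85 * n)
X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]    # hyperparameter/weight tuning
X_test, y_test = X[val_end:], y[val_end:]                    # final evaluation only

# Grid search with time-aware cross-validation on the training set.
grid = GridSearchCV(
    xgb.XGBClassifier(objective="binary:logistic"),
    param_grid={"max_depth": [3, 4, 6], "learning_rate": [0.03, 0.1]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_brier_score",
)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
```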
Model Performance
Our ensemble model achieves:
- Accuracy: 65-70% on binary outcomes
- Brier Score: 0.18-0.22 (lower is better)
- ROC AUC: 0.72-0.75
- Sharpe Ratio: 2.1-2.5 on trading signals
These results significantly outperform naive baselines such as the raw market price and simple moving-average forecasts, providing a genuine edge in prediction markets.
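For reference, the metrics above can be computed for any set of predictions with scikit-learn; the toy arrays below are placeholder values, not HALO's results.

```python
# Sketch of the evaluation metrics on placeholder predictions.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # resolved outcomes
p_pred = np.array([0.71, 0.32, 0.66, 0.55, 0.41, 0.28, 0.80, 0.45])

accuracy = accuracy_score(y_true, (p_pred >= 0.5).astype(int))
brier = brier_score_loss(y_true, p_pred)                       # lower is better
auc = roc_auc_score(y_true, p_pred)
print(f"accuracy={accuracy:.2f}  brier={brier:.3f}  auc={auc:.2f}")
```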
Confidence Scoring
Each prediction includes a confidence score:
confidence = 1 - (model_uncertainty / max_uncertainty)
Confidence scores are used to:
- Filter low-quality trading opportunities
- Size positions appropriately
- Manage risk exposure
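One way to realize the confidence formula above is to use disagreement among the ensemble members as the uncertainty term; that proxy and the max_uncertainty bound below are illustrative assumptions, not HALO's stated method.

```python
# Sketch of the confidence score using ensemble disagreement as uncertainty.
import numpy as np

def confidence(member_predictions: np.ndarray, max_uncertainty: float = 0.5) -> float:
    """member_predictions: per-model probabilities for the same market."""
    model_uncertainty = float(np.std(member_predictions))      # disagreement proxy
    return 1.0 - min(model_uncertainty / max_uncertainty, 1.0)

conf = confidence(np.array([0.62, 0.58, 0.65, 0.60]))          # ≈ 0.95
```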
Model Updates
The models are retrained:
- Weekly: Full retraining on latest data
- Daily: Incremental updates with new observations
- Real-time: Online learning for rapid adaptation
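As a sketch of the incremental path, a model that exposes partial_fit can fold new observations in without a full retrain; scikit-learn's SGDClassifier is a stand-in here, since the source does not name the online learner.

```python
# Sketch of an incremental update step with a partial_fit-capable model.
import numpy as np
from sklearn.linear_model import SGDClassifier

online_model = SGDClassifier(loss="log_loss")      # probabilistic linear model
classes = np.array([0, 1])

def incremental_update(X_new: np.ndarray, y_new: np.ndarray) -> None:
    """Fold the latest resolved markets into the model without full retraining."""
    online_model.partial_fit(X_new, y_new, classes=classes)

incremental_update(np.random.rand(32, 45), np.random.randint(0, 2, size=32))
```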
Continuous Improvement
As more data becomes available, the models are retrained on the larger history, and we track performance metrics over time to confirm they remain effective.
Limitations
While powerful, the models have limitations:
- Black Swan Events: Unpredictable market shocks
- Data Quality: Dependent on accurate market data
- Overfitting Risk: Mitigated by regular out-of-sample validation, but never fully eliminated
- Market Regime Changes: Models adapt but may lag
Transparency
We provide:
- Model performance metrics on the dashboard
- Feature importance rankings
- Prediction explanations
- Historical accuracy tracking
Next Steps
- Trading Strategy - How we use these predictions
- Risk Management - Managing model uncertainty
