An Estimation Model for an S&P 500 ETF
Group Name: Challengers
Group Members:
• Mingshuo Chen, [email protected]
• Amelia Sun, [email protected]
• Ziren Wang, [email protected]
• Quanyi Miao, [email protected]
• Zexin Zheng, [email protected]
Background:
The financial market has always offered tremendous profit to those able to predict it precisely. As investors in financial instruments, we wish to hedge risks, especially the risk brought by changes in an ETF's price. We have previously studied many approaches to hedging this kind of risk, such as technical analysis, the present value model, and the monetary model. However, all of these models are flawed. We therefore wish to build a statistical model that forecasts the price of an ETF more accurately. We will use a jump-diffusion SDE (involving Lévy processes) to pursue as precise an outcome as possible and try to benefit from the forecast. Stochastic differential equations are essential tools for modeling the random fluctuations of asset prices: they describe how a system evolves when subjected to random, unpredictable events. Our model can explain market spikes, fat tails, random jumps, etc. better than the Black-Scholes (B-S) model.
The innovation in our model is the addition of a Lévy process. Lévy processes are stochastic processes that account for jumps in asset prices; we add one to model sudden, unpredictable events such as market crashes or significant news. This allows our model to anticipate such events more accurately, so that judgments can be made before an event arrives: if the news is bad, large losses can be avoided; if it is good, positions can be adjusted to increase profits.
Assumptions for the Volatility Estimation Model:
Our model combines an SDE, a Lévy process, and financial instruments. We make the following assumptions:
1. Independent Increments: price increments over one time interval are independent of those in previous intervals.
2. Absence of Arbitrage: there are no risk-free opportunities for profit within the market.
3. Normally Distributed Returns: the continuous (diffusion) component of the ETF's returns is normally distributed; jumps are handled separately by the Lévy component.
4. Continuous Time: prices evolve continuously in time, even though observed quotes arrive at discrete intervals.
5. Stationarity: the statistical properties of the process do not change over time.
6. Absence of Market Frictions: we assume frictionless markets with no transaction costs, taxes, or other frictions.
7. Event Predictability: while we aim to predict events using the Lévy process, it is important to acknowledge the limitations of event prediction and the possibility of unforeseeable events.
Outline of the Model:
We want to predict future price trends and volatility with this model. We simulate the ETF price path by discretizing the time interval and iteratively applying the equations below to compute the price at each time step.
We will estimate the parameters (drift, volatility, jump intensity, jump sizes) from historical data; validating the model against real market data is essential to ensure its predictive accuracy and relevance to the specific ETF being analyzed.
By collecting historical data, we observe which types of stocks rise and which fall after major events. By using the Lévy process to anticipate important events that may occur in the future, judgments can be made before a predicted major event arrives about which types of stocks to buy and which to sell.
We will collect the relevant financial data for our model from authoritative sources such as Bloomberg and Yahoo Finance.
The Basic Mathematical Model:
This model aims to estimate long-term volatility for risk management of the chosen ETF.
Consider the following SDE, extended with a Lévy process, for the ETF's price S(t) over a long-term interval:
dS(t) = μ(S(t))dt + σS(t)dW(t) + dJ(t)
Where:
• S(t) is the S&P 500 ETF's price at time t
• μ(S(t)) represents the average return μ
• σ represents the constant volatility of the diffusion term
• dW(t) represents a Brownian motion increment for continuous diffusion
• dJ(t) represents the increment of the Lévy jump process J(t), incorporating jumps in the S&P 500 ETF's price
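The SDE above can be discretized with an Euler-Maruyama scheme. The sketch below is a minimal illustration rather than the calibrated model: every parameter value is invented, and jump sizes are drawn from a normal distribution as a simple placeholder for the Variance Gamma jumps described later.

```python
import numpy as np

def simulate_jump_diffusion(s0, mu, sigma, lam, jump_mean, jump_std,
                            T, n_steps, seed=0):
    """Euler-Maruyama path for dS = mu*S dt + sigma*S dW + S dJ, where J is
    a compound Poisson process (normal jump sizes as a simple placeholder)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.empty(n_steps + 1)
    s[0] = s0
    for i in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal()   # Brownian increment
        n_jumps = rng.poisson(lam * dt)            # number of jumps this step
        jump = rng.normal(jump_mean, jump_std, n_jumps).sum()
        s[i + 1] = s[i] * (1.0 + mu * dt + sigma * dw + jump)
    return s

# Illustrative (not calibrated) parameters: one year of daily steps
path = simulate_jump_diffusion(s0=450.0, mu=0.08, sigma=0.18,
                               lam=5.0, jump_mean=-0.01, jump_std=0.03,
                               T=1.0, n_steps=252)
print(path[0], path[-1])
```

The loop applies one drift, one diffusion, and one compound-Poisson jump contribution per step, which is the structure the rest of this section fills in with estimated parameters.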
Components Explanation:
1) Stochastic Differential Equation (SDE):
dS(t) = μS(t)dt + σS(t)dW(t)
2) Brownian Motion:
Simulate the Brownian motion part dW(t) of the model, capturing the ETF price's continuous, random fluctuations based on historical data. Use the ETF's historical price data to estimate parameters such as the volatility σ and the returns, then use these parameters in the model.
Daily Return = (S(t) − S(t−1)) / S(t−1). Determine the time step dt (dt = 1 if using daily data) and simulate the BM increment as dW(t) = √dt × Z, so the diffusion term is σ × √dt × Z, where Z is a random sample from the standard normal distribution.
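A minimal numpy sketch of this estimation step; the price series here is synthetic stand-in data, since real closes would come from Yahoo Finance or Bloomberg.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic daily closes standing in for real data from Yahoo Finance etc.
prices = 450.0 * np.exp(np.cumsum(rng.normal(0.0003, 0.011, 252)))

# Daily Return = (S(t) - S(t-1)) / S(t-1)
returns = np.diff(prices) / prices[:-1]
sigma_hat = returns.std(ddof=1)            # volatility estimate from history
dt = 1.0                                   # daily data -> dt = 1
Z = rng.standard_normal(252)
dW = np.sqrt(dt) * Z                       # Brownian increments
diffusion = sigma_hat * dW                 # the sigma*dW(t) term of the SDE
print(sigma_hat)
```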
3) Levy Jump Process:
When incorporating a Lévy process into the stochastic differential equation (SDE) for financial modeling, the Lévy process represents jumps in the asset price that occur due to significant, unexpected events. These events can include large news announcements, economic reports, geopolitical developments, or any other market-moving news.
J(t) = ∑_{i=1}^{N(t)} Y_i
• N(t) follows a Poisson distribution with intensity λ (the jump arrival rate)
• Y_i are independent and identically distributed random variables following a Variance Gamma (VG) distribution
- Variables Estimation:
• News Impact Factor Y_i: use the pdf of the VG distribution to model the factor. The VG distribution is characterized by three parameters: θ (drift), σ² (variance), and ν (shape parameter). In the standard parameterization, its density is
f(y | θ, σ², ν) = (2 exp(θy/σ²)) / (σ√(2π) ν^(1/ν) Γ(1/ν)) × ( y² / (2σ²/ν + θ²) )^(1/(2ν) − 1/4) × K_(1/ν − 1/2)( (1/σ²) √( y² (2σ²/ν + θ²) ) )
Where:
• Γ(·) represents the gamma function
• K_(1/ν − 1/2)(·) is the modified Bessel function of the second kind
We assume that Y_i is drawn from the VG distribution with these three parameters; the parameters can be estimated from historical data via maximum likelihood estimation (MLE). When incorporating Y_i into our model, we use the VG pdf to calculate the probability of a specific news impact factor y.
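VG variates can also be generated without evaluating the density, using the standard representation of a VG variable as Brownian motion evaluated at a gamma-distributed time. The sketch below assumes unit time and illustrative parameter values.

```python
import numpy as np

def sample_vg(n, theta, sigma, nu, seed=2):
    """Variance-Gamma samples via gamma time change:
    Y = theta*G + sigma*sqrt(G)*Z, with G ~ Gamma(shape=1/nu, scale=nu),
    so that E[G] = 1 over unit time."""
    rng = np.random.default_rng(seed)
    g = rng.gamma(shape=1.0 / nu, scale=nu, size=n)  # gamma subordinator
    z = rng.standard_normal(n)
    return theta * g + sigma * np.sqrt(g) * z

# Illustrative parameters: slight downward drift, 2% scale, shape 0.3
jumps = sample_vg(100_000, theta=-0.002, sigma=0.02, nu=0.3)
print(jumps.mean())  # close to theta, since E[Y] = theta * E[G] = theta
```

This gamma-subordination route is often more convenient for simulation than working with the Bessel-function density directly.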
• Frequency of Significant News Events: use a Poisson process to model the frequency of significant news events within a specific time window. Let N(t) be the number of significant news events that occur up to time t, and assume that N(t) follows a Poisson process with parameter λ, which represents the average number of news events per unit of time (a week/month). The pmf of the distribution is:
P(N(t) = k) = (λt)^k e^(−λt) / k!
Where:
• k is the number of events
• λ is the intensity parameter (average number of events per week or month)
Then calculate the expected number of events E[N(T)] using the Poisson process, given by E[N(T)] = λT, where T represents the length of the time window (in weeks or months). Then use E[N(T)] as a parameter in the jump component of the SDE representing the asset price dynamics.
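The relation E[N(T)] = λT can be checked by simulation; the values λ = 2.5 events per week and a four-week window below are illustrative, not estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.5              # assumed: average significant news events per week
T = 4                  # length of the window in weeks
counts = rng.poisson(lam * T, size=200_000)  # simulate N(T) many times

print(counts.mean(), lam * T)  # sample mean is close to E[N(T)] = lam*T = 10
```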
• Timing of News Events: we use a time-dependent intensity function λ(t) in a Poisson process. This intensity function captures the clustering of news events during market hours and is defined as follows:
λ(t) = λ0 × f(t)
Where:
• λ0 is the baseline intensity (average number of news events per unit of time)
• f(t) is a function representing time-based patterns, such as higher news clustering during market hours and lower activity during non-trading hours. We use a Gaussian function, f(t) = exp(−(t − m)² / (2s²)), where m represents the mean time of market hours (e.g., 9:30 AM for the start of the US stock market) and s represents the standard deviation controlling the width of the clustering around market hours.
To incorporate the model into a simulation, we use the Poisson distribution to calculate the number of news events N(t) up to any given time t based on λ(t), generate random event times within the time window using the Poisson process (when news events occur), and then model each event's impact Y_i using the model shown above.
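Event times from a Poisson process with time-dependent intensity λ(t) = λ0·f(t) can be sampled with the standard thinning (acceptance-rejection) algorithm, since λ(t) ≤ λ0 everywhere. A sketch with illustrative parameters, measuring time in hours of a single day:

```python
import numpy as np

def intensity(t, lam0, m, s):
    """lambda(t) = lam0 * exp(-(t - m)^2 / (2 s^2)): news cluster near time m."""
    return lam0 * np.exp(-((t - m) ** 2) / (2.0 * s ** 2))

def simulate_event_times(lam0, m, s, t_end, seed=4):
    """Thinning (Lewis-Shedler): propose candidates from a rate-lam0 Poisson
    process and accept each with probability lambda(t)/lam0 <= 1."""
    rng = np.random.default_rng(seed)
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam0)   # next candidate arrival
        if t > t_end:
            break
        if rng.uniform() < intensity(t, lam0, m, s) / lam0:
            times.append(t)
    return np.array(times)

# Illustrative: hours of one day, clustering around the 9:30 AM open
events = simulate_event_times(lam0=3.0, m=9.5, s=1.5, t_end=24.0)
print(len(events), np.round(events, 2))
```

The accepted times cluster near m, reproducing the market-hours pattern described above.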
• Sentiment Analysis: sentiment analysis of news articles or social media posts related to the ETF can provide a quantitative measure of market sentiment. Positive or negative sentiment can be used to adjust the jump sizes. (This part may require NLP and machine learning, so we set it aside initially.)
• Sentiment Analysis Alternatives: since subjective analysis of whether a news story is good or bad requires more advanced learning, we came up with a simpler method for analyzing the impact of individual events on index funds. We first collect data on a collection of stocks resembling a market portfolio and observe them. We then perform a simple categorization of event types, such as 'war events', 'interest rate hikes', 'coups', etc. Each event type can be associated with a characteristic percentage increase or decrease in the ETF portfolio. This way, when we observe the ETF moving within a given range, we can categorize the event and trigger the Lévy jump process on the fund, further observing the impact of various kinds of news on the direction of the index fund.
■ Example: if the Palestinian-Israeli conflict lowered the ETF portfolio price by 60%, then the next time the price falls by 59%-61%, we can categorize the current situation as a war event and launch a Lévy jump.
• Historical Price Volatility: consider the historical volatility of the ETF price as a factor influencing the jump sizes; higher historical volatility may indicate a market that is more reactive to news events. First, calculate historical volatility as the standard deviation of daily returns, denoted σ_hist. Then scale the jump sizes based on historical volatility: in a more volatile market, news events might have a proportionally larger impact. Let Y_i represent the original impact of the i-th news event and Y_i' the scaled impact based on historical volatility; the formula is
Y_i' = Y_i × (1 + σ_hist / σ_market)
Apply Markov Chains to model regime shifts in volatility:
• Define Volatility Regime States:
Let V(t) represent the volatility at time t, and let V_low and V_high be the thresholds for the low- and high-volatility states, respectively. Then define a discrete state variable S(t) representing the volatility regime at time t: S(t) = 1 for low volatility and S(t) = 2 for high volatility. Thus, if V(t) < V_low, set S(t) = 1, and if V(t) ≥ V_high, set S(t) = 2.
• Transition Probability Matrix:
Consider three market scenarios: Bull (B), Bear (R), and Sideways (S). The transition probability matrix P is a 3×3 matrix representing the probabilities of transitioning between these states. Define P(B→B), P(B→R), P(B→S), P(R→B), P(R→R), P(R→S), P(S→B), P(S→R), and P(S→S), where P(X→Y) denotes the probability of moving from state X at time t to state Y at time t+1. The probabilities in each row sum to 1, representing the likelihood of transitioning to one of the three states. Shown as a matrix P:

P = [ P(B→B)  P(B→R)  P(B→S)
      P(R→B)  P(R→R)  P(R→S)
      P(S→B)  P(S→R)  P(S→S) ]
Update the State Variable using the Markov Chain:
Use the update rule for a discrete-time Markov chain:
π(t+1) = π(t) × P
where π(t) = [π(B,t), π(R,t), π(S,t)] is the row vector of probabilities of being in the Bull, Bear, and Sideways states at time t, and π(t+1) is the corresponding vector at time t+1. Written out componentwise, the update π(t+1) = π(t) × P reads:
π(B,t+1) = π(B,t)P(B→B) + π(R,t)P(R→B) + π(S,t)P(S→B)
π(R,t+1) = π(B,t)P(B→R) + π(R,t)P(R→R) + π(S,t)P(S→R)
π(S,t+1) = π(B,t)P(B→S) + π(R,t)P(R→S) + π(S,t)P(S→S)
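A small numeric sketch of the regime update; the transition probabilities below are invented purely for illustration. Iterating the update drives the state distribution toward the chain's stationary distribution.

```python
import numpy as np

# Assumed transition matrix: rows = current state, columns = next state,
# in the order Bull (B), Bear (R), Sideways (S); each row sums to 1.
P = np.array([[0.70, 0.10, 0.20],   # from Bull
              [0.15, 0.60, 0.25],   # from Bear
              [0.30, 0.20, 0.50]])  # from Sideways

pi = np.array([1.0, 0.0, 0.0])      # start certain of a Bull market

for _ in range(200):                # iterate pi(t+1) = pi(t) P
    pi = pi @ P

print(np.round(pi, 4))              # approximate stationary distribution
```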
Estimation and Calibration:
Utilize Bayesian methods, which provide a robust framework for estimating complex models. Bayesian inference allows us to incorporate prior beliefs about the parameters, which is especially valuable in situations with limited data.
Construct a likelihood function that represents the probability of observing the historical data
given the model parameters. For financial data following Brownian motion, the likelihood function
often involves the normal distribution. The formula is:
L(R_t | μ, σ) = (1 / √(2πσ²)) exp(−(R_t − μ)² / (2σ²))
Calibration: Consider long-term trends, macroeconomic factors, and other relevant information
influencing the ETF. Calibration involves finding parameter values that minimize the difference
between model predictions and historical observations.
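For the Gaussian likelihood above, the maximum likelihood estimates have a closed form (sample mean and standard deviation), which gives a quick calibration baseline before moving to Bayesian methods. The returns below are synthetic stand-ins for real market data.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic daily returns standing in for real Bloomberg/Yahoo Finance data
returns = rng.normal(0.0004, 0.012, 1000)

def neg_log_likelihood(mu, sigma, data):
    """-log of prod_t (1/sqrt(2 pi sigma^2)) exp(-(R_t - mu)^2 / (2 sigma^2))."""
    n = len(data)
    return (0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
            + np.sum((data - mu) ** 2) / (2.0 * sigma ** 2))

# For the Gaussian likelihood the MLE is available in closed form:
mu_hat = returns.mean()
sigma_hat = returns.std(ddof=0)
nll_at_mle = neg_log_likelihood(mu_hat, sigma_hat, returns)
print(mu_hat, sigma_hat, nll_at_mle)
```

Any perturbation of (μ̂, σ̂) raises the negative log-likelihood, which is the property numerical calibration exploits for models without closed-form estimates.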
Validation and Back-testing:
• Validate our model's accuracy by comparing predicted values with out-of-sample data. Use statistical tests and goodness-of-fit measures to assess how well the model captures the historical behavior of the ETF.
- Predict out-of-sample values, then use error metrics (MSE, RMSE, MAE)
■ MSE = (1/n) ∑_{i=1}^{n} (ŷ_i − y_i)²
■ MAE = (1/n) ∑_{i=1}^{n} |ŷ_i − y_i|
- Calculate the coefficient of determination (R2 and adjusted R2), which represents the
proportion of the variance in the dependent variable that is predictable from the
independent variable(s).
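These validation metrics can be computed directly; the five price pairs below are invented solely to exercise the function.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MSE, RMSE, MAE and R^2 for out-of-sample predictions."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, mae, r2

# Invented out-of-sample prices and model predictions, purely illustrative
y_true = np.array([450.0, 452.1, 451.3, 455.0, 457.2])
y_pred = np.array([450.5, 451.8, 452.0, 454.1, 456.8])
mse, rmse, mae, r2 = evaluate(y_true, y_pred)
print(mse, rmse, mae, r2)
```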
• Implement rigorous back-testing procedures to evaluate the model's performance over
different market conditions. Assess the model's ability to predict long-term trends accurately.
Final Result:
We use our collection of techniques (including Lévy processes) to examine how different types of events, occurring at different frequencies, can affect an ETF.
Our model can classify events and their percentage impact on ETF trends, and predict future market movements based on events. The initial idea of our model is based on Brownian motion; we then add the Lévy process as a new term to the original model to capture the jumps and see the impact of good and bad news.
We want to use jump-diffusion SDE models (involving Lévy processes) to achieve the most accurate results possible, achieve gains through prediction, and, by anticipating events that may occur in the future, avoid uncontrollable losses or increase revenue.