High-dimensional Minimum Variance Portfolio Estimation
Based on High-frequency Data
June 1, 2019
Abstract
This paper studies the estimation of high-dimensional minimum variance portfolio
(MVP) based on the high frequency returns which can exhibit heteroscedasticity and
possibly be contaminated by microstructure noise. Under certain sparsity assumptions
on the precision matrix, we propose estimators of the MVP and prove that our portfolios asymptotically achieve the minimum variance in a sharp sense. In addition, we
introduce consistent estimators of the minimum variance, which provide reference targets. Simulation and empirical studies demonstrate the favorable performance of the
proposed portfolios.
Key Words: Minimum variance portfolio; High dimension; High frequency; CLIME
estimator; Precision matrix.
JEL Codes: C13, C55, C58, G11
1 Introduction
1.1 Background
Since the ground-breaking work of Markowitz (1952), the mean-variance portfolio has caught
significant attention from both academics and practitioners. To implement such a strategy in
practice, the accuracy in estimating both the expected returns and the covariance structure
of returns is vital. It has been well documented that the estimation of the expected returns
is more difficult than the estimation of covariances (Merton (1980)), and the impact on
portfolio performance caused by the estimation error in the expected returns is larger than
that caused by the error in covariance estimation. These difficulties pose serious challenges
for the practical implementation of the Markowitz portfolio optimization. Theoretically,
mean-variance optimization has been criticized from the portfolio standpoint due to invalid
preferences from the axioms of choice, inconsistent dynamic behavior, etc.
∗Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.
tcai@wharton.upenn.edu.
†Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706. huj@stat.wisc.edu.
‡Department of ISOM and Department of Finance, Hong Kong University of Science and Technology,
Clear Water Bay, Kowloon, Hong Kong. yyli@ust.hk.
§Department of ISOM, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon,
Hong Kong. xhzheng@ust.hk.
Electronic copy available at: https://ssrn.com/abstract=3105815
The minimum variance portfolio (MVP) has received growing attention over the past few
years (see, e.g., DeMiguel et al. (2009a) and the references therein). It avoids the difficulties
in estimating the expected returns and is on the efficient frontier. In addition, the MVP is
found to perform well on real data. Empirical studies in Haugen and Baker (1991), Chan
et al. (1999), Schwartz (2000), Jagannathan and Ma (2003) and Clarke et al. (2006) have
found that the MVP can enjoy both lower risk and higher return compared with some
benchmark portfolios. These features make the MVP an attractive investment strategy in
practice.
The MVP is more natural in the context of high-frequency data. Over short time horizons, the mean return is usually completely dominated by the volatility. Consequently, as a
prudent common practice, when the time horizon of interest is short, the expected returns
are often assumed to be zero (see, e.g., Part II of Christoffersen (2012) and the references
therein). Fan et al. (2012a) make this assumption when considering the management of
portfolios that are rebalanced daily or every few days. When the expected returns are
zero, the mean-variance optimization reduces to the risk minimization problem, in which
one seeks the MVP.
In addition, there are benefits of using high-frequency data. On the one hand, the large
number of observations can potentially facilitate a better understanding of the covariance
structure of returns. Developments in this direction in the high-dimensional setting include
Wang and Zou (2010); Tao et al. (2011); Zheng and Li (2011); Tao et al. (2013); Kim et al.
(2016); Aït-Sahalia and Xiu (2017); Xia and Zheng (2018); Dai et al. (2019); Pelger (2019),
among others. On the other hand, high-frequency data allow short-horizon rebalancing
and hence the portfolios can adjust quickly to time variability of volatilities/co-volatilities.
However, high-frequency data do come with significant challenges in analysis. Complications
arise due to heteroscedasticity and microstructure noise, among others.
We consider in this paper the estimation of high-dimensional MVP using high-frequency
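data. To be more specific, given p assets whose returns $X = (X_1, \ldots, X_p)^\top$ have covariance
matrix $\Sigma$, we aim to find:
$$\arg\min_{w} \; w^\top \Sigma w \quad \text{subject to} \quad w^\top \mathbf{1} = 1, \tag{1.1}$$
where $w = (w_1, \ldots, w_p)^\top$ represents the weights put on different assets, and $\mathbf{1} = (1, \ldots, 1)^\top$
is the p-dimensional vector with all entries being 1. The optimal solution is given by
$$w_{\text{opt}} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}, \tag{1.2}$$
which yields the minimum risk
$$R_{\min} = w_{\text{opt}}^\top \Sigma\, w_{\text{opt}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}. \tag{1.3}$$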
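The closed forms (1.2) and (1.3) can be sketched numerically as follows. This is a minimal illustration assuming NumPy; the function name `mvp_weights` and the 3-asset covariance matrix are hypothetical and not taken from the paper.

```python
import numpy as np

def mvp_weights(sigma):
    """Closed-form minimum variance portfolio for covariance sigma, as in (1.2)-(1.3)."""
    ones = np.ones(sigma.shape[0])
    # Solve sigma x = 1 instead of forming the inverse explicitly.
    x = np.linalg.solve(sigma, ones)
    denom = ones @ x                # 1' Sigma^{-1} 1
    return x / denom, 1.0 / denom   # (w_opt, R_min)

# Hypothetical 3-asset covariance matrix, for illustration only.
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_opt, r_min = mvp_weights(sigma)
```

The weights sum to one by construction, and `w_opt @ sigma @ w_opt` reproduces `r_min`.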
More generally, one may be interested in the following optimization problem: for a given
vector $\beta = (\beta_1, \ldots, \beta_p)^\top$ and constant $c$,
$$\arg\min_{w} \; \tilde{w}^\top \Sigma \tilde{w} \quad \text{subject to} \quad w^\top \mathbf{1} = c, \quad \text{where } \tilde{w} = (\beta_1 w_1, \ldots, \beta_p w_p), \tag{1.4}$$
or its equivalent formulation:
$$\arg\min_{\tilde{w}} \; \tilde{w}^\top \Sigma \tilde{w} \quad \text{subject to} \quad \tilde{w}^\top \beta^{-1} = c, \quad \text{where } \beta^{-1} := (1/\beta_1, \ldots, 1/\beta_p)^\top. \tag{1.5}$$
Such a setting applies, for example, in leveraged investment. We remark that the optimization problem (1.5) can be reduced to (1.1) by noticing that if $\tilde{w}$ solves (1.5), then
$\check{w} := (\tilde{w}_1/\beta_1, \ldots, \tilde{w}_p/\beta_p)^\top / c$ solves (1.1) with $\Sigma$ replaced by $\tilde{\Sigma} = \operatorname{diag}(\beta_1, \ldots, \beta_p)\, \Sigma \operatorname{diag}(\beta_1, \ldots, \beta_p)$, and
vice versa. For this reason, the two optimization problems (1.1) and (1.4) or (1.5) can be
transformed into each other. In the rest of the paper, we focus on the problem (1.1).
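The reduction from (1.5) to (1.1) can be checked numerically. The sketch below assumes NumPy; the helper names `solve_mvp` and `solve_scaled_problem` are hypothetical. It solves (1.1) for the rescaled covariance matrix and maps the solution back through $\tilde{w}_i = c\,\beta_i \check{w}_i$.

```python
import numpy as np

def solve_mvp(sigma):
    """Solution of (1.1): Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    x = np.linalg.solve(sigma, np.ones(sigma.shape[0]))
    return x / x.sum()

def solve_scaled_problem(sigma, beta, c):
    """Solve (1.5) by reducing it to (1.1) with diag(beta) Sigma diag(beta)."""
    sigma_tilde = np.diag(beta) @ sigma @ np.diag(beta)
    w_check = solve_mvp(sigma_tilde)   # solves (1.1) for the rescaled matrix
    return c * beta * w_check          # invert w_check_i = w_tilde_i / (beta_i * c)
```

The result can be compared with the direct Lagrangian solution of (1.5), $c\,\Sigma^{-1}\beta^{-1} / \big((\beta^{-1})^\top \Sigma^{-1} \beta^{-1}\big)$, which it should match up to rounding.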
A main challenge of solving the optimization problem (1.1) comes from high-dimensionality
because modern portfolios often involve a large number of assets. See, for example, Zheng
and Li (2011), Fan et al. (2012a), Fan et al. (2012b), Ao et al. (2018), Xia and Zheng (2018),
and Dai et al. (2019) on issues about and progress made on vast portfolio management.
On the other hand, the estimation of the minimum risk defined in (1.3) is also a problem
of interest. This provides a reference target for the estimated minimum variance portfolios.
In practice, because the true covariance matrix is unknown, the sample covariance matrix $S$ is usually used as a proxy, and the resulting “plug-in” portfolio, $w_p = S^{-1}\mathbf{1} / (\mathbf{1}^\top S^{-1}\mathbf{1})$,
has been widely adopted. How well does such a portfolio perform? This question has been
considered in Basak et al. (2009). The following simulation result visualizes their first finding (Proposition 1 therein). Figure 1 shows the risk of the plug-in portfolio based on 100
replications. One can see that the actual risk $R(w_p) = w_p^\top \Sigma w_p$ of the plug-in portfolio
can be devastatingly higher than the theoretical minimum risk. On the other hand, the
perceived risk $\hat{R}_p = w_p^\top S w_p$ can be even lower than the theoretical minimum risk. Such
contradictory phenomena lead to two questions: (1) Can we consistently estimate the true
minimum risk? and (2) More importantly, can we find a portfolio with a risk close to the
true minimum risk?
[Figure 1 plot, titled “Comparison of risks”: risk (vertical axis, roughly 1.0e−05 to 2.0e−05) against replication (horizontal axis, 0 to 100), with three series: perceived risk, actual risk, and minimum risk.]
Figure 1. Comparison of actual and perceived risks of the plug-in portfolio. The portfolios
are constructed based on returns simulated from i.i.d. multivariate normal distribution with
mean zero and covariance matrix Σ calibrated from real data; see Section 4 for details.
The number of assets and observations are 80 and 252, respectively. The comparison is
replicated 100 times.
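An experiment in the spirit of Figure 1 can be sketched as follows. This assumes NumPy and uses a hypothetical one-factor covariance matrix in place of the calibrated $\Sigma$ from Section 4, so the numbers will differ from the figure; only the qualitative pattern (actual risk above, perceived risk below the minimum risk) is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, n_rep = 80, 252, 20

# Hypothetical one-factor covariance matrix (the paper calibrates Sigma from real data).
loadings = rng.uniform(0.5, 1.5, size=p)
sigma = 1e-4 * (np.outer(loadings, loadings) + np.diag(rng.uniform(0.5, 2.0, size=p)))

ones = np.ones(p)
r_min = 1.0 / (ones @ np.linalg.solve(sigma, ones))   # minimum risk (1.3)
chol = np.linalg.cholesky(sigma)

actual, perceived = [], []
for _ in range(n_rep):
    x = rng.standard_normal((n, p)) @ chol.T   # i.i.d. N(0, Sigma) returns
    s = np.cov(x, rowvar=False)                # sample covariance matrix S
    w = np.linalg.solve(s, ones)
    w /= w.sum()                               # plug-in portfolio w_p
    actual.append(w @ sigma @ w)               # actual risk w_p' Sigma w_p
    perceived.append(w @ s @ w)                # perceived risk w_p' S w_p

actual_ratio = np.mean(actual) / r_min
perceived_ratio = np.mean(perceived) / r_min
```

Every entry of `actual` is at least `r_min` because `r_min` is the minimum of the quadratic form over all fully invested portfolios; the perceived risk falling below `r_min` is a finite-sample effect of optimizing against $S$.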
Because of such issues with the plug-in portfolio, alternative methods have been proposed. Jagannathan and Ma (2003) argue that imposing a no-short-sale constraint helps.
More generally, Fan et al. (2012b) study the MVP under the following gross-exposure constraint:
$$\arg\min_{w} \; w^\top \Sigma w \quad \text{subject to} \quad w^\top \mathbf{1} = 1 \ \text{ and } \ \|w\|_1 \le \lambda, \tag{1.6}$$
where $\|w\|_1 = \sum_{i=1}^{p} |w_i|$ and $\lambda$ is a chosen constant. They derive the following bound on
the risk of the estimated portfolio. If $\hat{\Sigma}$ is an estimator of $\Sigma$, then the solution to (1.6) with $\Sigma$
replaced by $\hat{\Sigma}$, denoted by $\hat{w}_{\text{opt}}$, satisfies
$$|R(\hat{w}_{\text{opt}}) - R_{\min}| \le \lambda^2 \cdot \|\hat{\Sigma} - \Sigma\|_\infty, \tag{1.7}$$
where for any weight vector $w$, $R(w) = w^\top \Sigma w$ stands for the risk measured by the variance
of the portfolio return, and for any matrix $A = (a_{ij})$, $\|A\|_\infty := \max_{ij} |a_{ij}|$. In particular,
$\|\hat{\Sigma} - \Sigma\|_\infty$ is the maximum element-wise estimation error in using $\hat{\Sigma}$ to estimate $\Sigma$. Fan et al.
(2012a) consider the high-frequency setting, where they use the two-scale realized covariance
matrix (Zhang et al. (2005)) to estimate the integrated covariance matrix (see Section 2.1
below for related background), and establish concentration inequalities for the element-wise
estimation error. These concentration inequalities imply that even if the number of assets p
grows faster than the number of observations n, one still has $\|\hat{\Sigma} - \Sigma\|_\infty \to 0$ as $n \to \infty$;
see equation (18) in Fan et al. (2012a) for the precise statement. In particular, bound (1.7)
guarantees that under the gross-exposure constraint, the difference between the risk associated
with $\hat{w}_{\text{opt}}$ and the minimum risk is asymptotically negligible.
The difference between the risk of an estimated portfolio and the minimum risk going to
zero, however, may not be sufficient to guarantee (near) optimality. In fact, under rather general assumptions (which do not exclude factor models), the minimum risk $R_{\min} = 1/(\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$
may go to zero as the number of assets $p \to \infty$; see Ding et al. (2018) for a thorough discussion. If indeed the minimum risk goes to 0 as $p \to \infty$, then the difference $|R(\hat{w}_{\text{opt}}) - R_{\min}|$
going to 0 is not enough to guarantee (near) optimality. Based on the above consideration,
we seek an asset allocation $\hat{w}$ which satisfies a stronger sense of consistency in that
the ratio between the risk of the estimated portfolio and the minimum risk goes to one, i.e.,
$$\frac{R(\hat{w})}{R_{\min}} \xrightarrow{\;p\;} 1 \quad \text{as } p \to \infty, \tag{1.8}$$
where $\xrightarrow{\;p\;}$ stands for convergence in probability.
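To see concretely why a vanishing difference is weaker than (1.8), consider a hypothetical example that is not taken from the paper: $\Sigma = \sigma^2 I$ with $p$ even, and a portfolio $w_{\text{half}}$ that puts weight $2/p$ on each of the first $p/2$ assets and zero on the rest.

$$R_{\min} = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} = \frac{\sigma^2}{p}, \qquad R(w_{\text{half}}) = \sigma^2 \cdot \frac{p}{2} \cdot \frac{4}{p^2} = \frac{2\sigma^2}{p},$$

$$R(w_{\text{half}}) - R_{\min} = \frac{\sigma^2}{p} \to 0 \quad \text{as } p \to \infty, \qquad \text{while} \qquad \frac{R(w_{\text{half}})}{R_{\min}} = 2 \ \text{ for every } p.$$

The difference criterion would accept $w_{\text{half}}$ as asymptotically optimal even though its risk is always twice the minimum; the ratio criterion (1.8) rules it out.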
1.2 Main contributions of the paper
Our contributions mainly lie in the following aspects.
We propose estimators of minimum variance portfolio that can accommodate stochastic
volatility and market microstructure noise, which are intrinsic to high-frequency returns.
Under some sparsity assumptions on the inverse of the covariance matrix (also known as the
precision matrix), our estimated portfolios enjoy the desired convergence (1.8).
We also introduce consistent estimators of the minimum risk. One such estimator does
not depend on the sparsity assumption and also enjoys a CLT.
1.3 Organization of the paper
The paper is organized as follows. In Section 2, we present our estimators of the MVP
and show that their risks converge to the minimum risk in the sense of (1.8). We have
an estimator that incorporates stochastic volatility (CLIME-SV) and one that incorporates
stochastic volatility and microstructure noise (CLIME-SVMN). A consistent estimator of the
minimum risk is proposed in Section 2.4, for which we also establish the CLT. An extension of
our method to utilize factors is developed in Section 3. Section 4 presents simulation results
to illustrate the performance of both portfolio and minimum risk estimations. Empirical
study results based on S&P 100 Index constituents are reported in Section 5. We conclude
our paper with a brief summary in Section 6. All proofs are given in the Appendix.
2 Estimation Methods and Asymptotic Properties
2.1 High-frequency data model
We assume that the latent p-dimensional log-price process $(X_t)$ follows a diffusion model:
$$dX_t = \mu_t\, dt + \Theta_t\, dW_t, \quad \text{for } t \ge 0, \tag{2.1}$$
where $(\mu_t) = (\mu_t^1, \ldots, \mu_t^p)^\top$ is the drift process, $(\Theta_t) = (\theta_t^{ij})_{1 \le i,j \le p}$ is a $p \times p$ matrix-valued
process called the spot co-volatility process, and $(W_t)$ is a p-dimensional Brownian motion.
Both $(\mu_t)$ and $(\Theta_t)$ are stochastic, càdlàg, and may depend on $(W_t)$, all defined on a
common filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0})$.
Let
$$\Sigma_t = \Theta_t \Theta_t^\top := (\sigma_t^{ij})$$
be the spot covariance matrix process. The ex-post integrated covariance (ICV) matrix over
an interval, say [0, 1], is
$$\Sigma_{\text{ICV}} = \Sigma_{\text{ICV},1} = (\sigma^{ij}) := \int_0^1 \Sigma_t\, dt.$$
Denote its inverse by $\Omega_{\text{ICV}} := \Sigma_{\text{ICV}}^{-1}$. The ex-post minimum risk, $R_{\min}$, is obtained by
replacing the $\Sigma$ in (1.3) with $\Sigma_{\text{ICV}}$.
Let us emphasize that in general, $\Sigma_{\text{ICV}}$ is a random variable which is only measurable
with respect to $\mathcal{F}_1$, and so is $R_{\min}$. It is therefore in principle impossible to construct a portfolio that
is measurable with respect to $\mathcal{F}_0$ and achieves the minimum risk $R_{\min}$. Practical implementation of the
minimum variance portfolio relies on making forecasts of $\Sigma_{\text{ICV}}$ based on historical data.
The simplest approach is to assume that $\Sigma_{\text{ICV},t} \approx \Sigma_{\text{ICV},t+1}$ (see footnote 1), where $\Sigma_{\text{ICV},t}$ stands for the
ICV matrix in period $[t-1, t]$. Under such an assumption, if we can construct a portfolio $w$
based on the observations during $[t-1, t]$ (and hence only measurable with respect to $\mathcal{F}_t$) that can
approximately minimize the ex-post risk $w^\top \Sigma_{\text{ICV},t}\, w$, then if we hold the portfolio during
the next period $[t, t+1]$, the actual risk $w^\top \Sigma_{\text{ICV},t+1}\, w$ is still approximately minimized. In
this article we adopt such a strategy.
2.2 High-frequency case with no microstructure noise
We first consider the case when there is no microstructure noise; in other words, one observes
the true log-prices $(X_t^i)$.
Our approach to estimating the minimum variance portfolio relies on the constrained $\ell_1$-minimization
for inverse matrix estimation (CLIME) proposed in Cai et al. (2011). The
original CLIME method is developed under the i.i.d. observation setting. Specifically, suppose one has n i.i.d. observations from a population with covariance matrix $\Sigma$. Let $\Omega := \Sigma^{-1}$
be the corresponding precision matrix. The CLIME estimator of $\Omega$ is defined as
$$\hat{\Omega}_{\text{CLIME}} := \arg\min_{\Omega'} \|\Omega'\|_1 \quad \text{subject to} \quad \|\hat{\Sigma}\Omega' - I\|_\infty \le \lambda, \tag{2.2}$$
where $\hat{\Sigma}$ is the sample covariance matrix, $I$ is the identity matrix, and for any matrix
$A = (a_{ij})$, $\|A\|_1 := \sum_{i,j} |a_{ij}|$ and, as before, $\|A\|_\infty = \max_{ij} |a_{ij}|$. Here $\lambda$ is a tuning
parameter, usually chosen via cross-validation.
Footnote 1: This is related to the phenomenon that the volatility process is often found to be nearly unit root, in
which case the one-step ahead prediction is approximately the current value. The assumption is also used
in Fan et al. (2012a) and Dai et al. (2019), among others.
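The program (2.2) can be solved as a set of linear programs. The sketch below assumes NumPy and SciPy (`scipy.optimize.linprog` with the HiGHS solver) and follows the column-by-column decomposition and symmetrization step described in Cai et al. (2011), which the text above does not spell out; the function name `clime` is hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def clime(sigma_hat, lam):
    """Sketch of CLIME: one linear program per column, then symmetrization."""
    p = sigma_hat.shape[0]
    omega = np.zeros((p, p))
    # Write a column as beta = u - v with u, v >= 0, so ||beta||_1 = sum(u + v).
    cost = np.ones(2 * p)
    a_ub = np.block([[sigma_hat, -sigma_hat],
                     [-sigma_hat, sigma_hat]])
    for j in range(p):
        e_j = np.zeros(p)
        e_j[j] = 1.0
        # |sigma_hat @ beta - e_j| <= lam, written as two one-sided constraints.
        b_ub = np.concatenate([lam + e_j, lam - e_j])
        res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(f"LP for column {j} failed: {res.message}")
        omega[:, j] = res.x[:p] - res.x[p:]
    # For each pair (i, j), keep the entry of smaller magnitude.
    return np.where(np.abs(omega) <= np.abs(omega.T), omega, omega.T)
```

The same routine applies unchanged when `sigma_hat` is replaced by another covariance estimate, which is how the high-frequency variants in this section use it. The choice of `lam` (for example by cross-validation) is left outside the sketch.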
The CLIME method is designed for the following uniformity class of precision matrices.
For any $0 \le q < 1$, $s_0 = s_0(p) < \infty$ and $M = M(p) < \infty$, let
$$\mathcal{U}(q, s_0, M) = \Big\{ \Omega = (\Omega_{ij})_{p \times p} : \Omega \text{ positive definite}, \ \|\Omega\|_{L_1} \le M, \ \max_{1 \le i \le p} \sum_{j=1}^{p} |\Omega_{ij}|^q \le s_0 \Big\}, \tag{2.3}$$
where $\|\Omega\|_{L_1} := \max_{1 \le j \le p} \sum_{i=1}^{p} |\Omega_{ij}|$.
When the observations are i.i.d. sub-Gaussian and the underlying $\Omega$
belongs to $\mathcal{U}(q, s_0(p), M(p))$, Cai et al. (2011) establish consistency of $\hat{\Omega}_{\text{CLIME}}$ when $q$, $s_0(p)$
and $M(p)$ satisfy $M^{2-2q} s_0 (\log p / n)^{(1-q)/2} \to 0$; see Theorem 1(a) therein.
In the high-frequency setting, due to stochastic volatility, the leverage effect, etc., the returns
are not i.i.d., so the results in Cai et al. (2011) do not apply. Before we discuss how
to tackle this difficulty, we remark that the sparsity assumption $\Omega = \Sigma^{-1} \in \mathcal{U}(q, s_0, M)$
appears to be reasonable in financial applications. For example, if the returns are assumed
to follow a (conditional) multivariate normal distribution with covariance matrix $\Sigma$, then
the (i, j)th element of $\Omega$ being 0 is equivalent to the returns of the ith and jth assets
being conditionally independent given the other asset returns. For stocks in different sectors,
many pairs might be conditionally independent or only weakly dependent.
Now we discuss how to adapt CLIME to the high-frequency setting. Our goal is to
estimate $\Sigma_{\text{ICV}}$, or, more precisely, its inverse $\Omega_{\text{ICV}} := \Sigma_{\text{ICV}}^{-1}$. In order to apply CLIME, we
need to decide which $\hat{\Sigma}$ to use in (2.2).
When the true log-prices are observed, one of the most commonly used estimators
for $\Sigma_{\text{ICV}}$ is the realized covariance (RCV) matrix. Specifically, for each asset i, suppose
the observations at stage n are $(X^i_{t^{i,n}_\ell})$, where $0 = t^{i,n}_0 < t^{i,n}_1 < \cdots < t^{i,n}_{N_i} = 1$ are the
observation times. The n characterizes the observation frequency, and $N_i \to \infty$ as $n \to \infty$.
The synchronous observation case corresponds to
$$t^{i,n}_\ell \equiv t^n_\ell \quad \text{for all } i = 1, \ldots, p, \tag{2.4}$$
which, in the simplest equidistant setting, reduces to
$$t^{i,n}_\ell = t^n_\ell = \ell/n, \quad \ell = 0, 1, \ldots, n. \tag{2.5}$$
In the synchronous observation case (2.4), for $\ell = 1, \ldots, n$, let
$$\Delta X_\ell := X_{t^n_\ell} - X_{t^n_{\ell-1}}$$
be the log-return vector over the time interval $[t^n_{\ell-1}, t^n_\ell]$. The RCV matrix is defined as
$$\hat{\Sigma}_{\text{RCV}} = \sum_{\ell=1}^{n} \Delta X_\ell (\Delta X_\ell)^\top. \tag{2.6}$$
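In code, (2.6) is a sum of outer products of the log-return vectors. A minimal sketch, assuming NumPy and a hypothetical array `log_prices` of shape (n + 1, p) holding synchronous log-prices at times $t^n_0, \ldots, t^n_n$:

```python
import numpy as np

def realized_covariance(log_prices):
    """RCV matrix (2.6) from synchronous log-prices of shape (n + 1, p)."""
    returns = np.diff(log_prices, axis=0)   # rows are Delta X_l, l = 1, ..., n
    return returns.T @ returns              # sum over l of Delta X_l (Delta X_l)'

# Tiny hand-checkable example with n = 2 intervals and p = 2 assets.
log_prices = np.array([[0.0, 0.0],
                       [1.0, 2.0],
                       [3.0, 3.0]])
rcv = realized_covariance(log_prices)
```

Here the two return vectors are (1, 2) and (2, 1), so the diagonal entries are 1 + 4 = 5 and the off-diagonal entries are 2 + 2 = 4.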
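We are now ready to define the constrained $\ell_1$-minimization for inverse matrix estimation
with stochastic volatility (CLIME-SV), $\hat{\Omega}_{\text{CLIME-SV}}$:
$$\hat{\Omega}_{\text{CLIME-SV}} := \arg\min_{\Omega'} \|\Omega'\|_1 \quad \text{subject to} \quad \|\hat{\Sigma}_{\text{RCV}}\Omega' - I\|_\infty \le \lambda. \tag{2.7}$$
The resulting MVP estimator is given by
$$\hat{w}_{\text{CLIME-SV}} = \frac{\hat{\Omega}_{\text{CLIME-SV}} \mathbf{1}}{\mathbf{1}^\top \hat{\Omega}_{\text{CLIME-SV}} \mathbf{1}}, \tag{2.8}$$
which is associated with a risk of
$$R_{\text{CLIME-SV}} = \hat{w}_{\text{CLIME-SV}}^\top \Sigma_{\text{ICV}}\, \hat{w}_{\text{CLIME-SV}} = \frac{(\hat{\Omega}_{\text{CLIME-SV}} \mathbf{1})^\top \Sigma_{\text{ICV}} (\hat{\Omega}_{\text{CLIME-SV}} \mathbf{1})}{(\mathbf{1}^\top \hat{\Omega}_{\text{CLIME-SV}} \mathbf{1})^2}. \tag{2.9}$$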
Next we discuss the theoretical properties of this portfolio estimator. We make the
following assumptions.
Assumption A:
(A.i) There exists $\delta \in (0, 1)$ such that for all p, $\delta \le \lambda_{\min}(\Sigma_{\text{ICV}}) \le \lambda_{\max}(\Sigma_{\text{ICV}}) < 1/\delta$
almost surely, where for any symmetric matrix A, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ stand
for its smallest and largest eigenvalues, respectively.
(A.ii) The drift process is such that $|\mu^i_t| \le C_\mu$ for some constant $C_\mu < \infty$ for all
i = 1, . . . , p and t ∈ [0, 1] almost surely.
(A.iii) There exists a constant $C_\sigma$ such that $|\sigma^{ij}_t| \le C_\sigma < \infty$ for all i = 1, . . . , p and t ∈
[0, 1] almost surely.
(A.iv) The observation times $t^n_\ell$ under the synchronous setting (2.4) satisfy that there
exists a constant $C_\Delta$ such that
$$\sup_n \max_{1 \le \ell \le n} n\,|t^n_\ell - t^n_{\ell-1}| \le C_\Delta < \infty. \tag{2.10}$$
(A.v) p and n satisfy $\log p / n \to 0$.
Remark 1. Assumptions (A.ii) and (A.iii) are standard in the high frequency literature, one
assuming bounded drift, the other assuming bounded volatility. One may relax the bounded
volatility assumption by using techniques similar to Lemma 3 of Fan et al. (2012a). Assumption (A.iv) is also widely adopted, although one may need to apply synchronization
techniques such as the previous tick (Zhang (2011)) and refresh time (Barndorff-Nielsen
et al. (2011)) in order to synchronize the observations. Assumption (A.v) allows the
number of assets, p, to grow faster than the observation frequency n.
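Theorem 1. Suppose that $(X_t)$ satisfies (2.1) and the underlying precision matrix $\Omega_{\text{ICV}} = \Sigma_{\text{ICV}}^{-1} \in \mathcal{U}(q, s_0, M)$. Under Assumptions (A.i)-(A.v), with $\lambda = \eta M \sqrt{\log p / n}$ for some
$\eta > 0$, there exist constants $C_1, C_2 > 0$ such that
$$P\left( \frac{R_{\text{CLIME-SV}}}{R_{\min}} - 1 = O\Big( M^{2-2q} s_0 (\log p / n)^{(1-q)/2} \Big) \right) \ge 1 - \frac{C_1}{p^{C_2 \eta^2 - 2}},$$
where $R_{\text{CLIME-SV}}$ is defined in (2.9) and $R_{\min} = 1/(\mathbf{1}^\top \Omega_{\text{ICV}} \mathbf{1})$.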
Remark 2. Theorem 1 guarantees that as long as $M^{2-2q} s_0 (\log p / n)^{(1-q)/2} \to 0$, the
risk of our estimated MVP is consistent in the sense of (1.8). The estimated portfolio is
therefore applicable to the ultra-high-dimensional setting where the number of assets can be
much larger than the number of observations.
Remark 3. The case where observations are i.i.d. is a special case of the high frequency
setting that we adopt here. In fact, to generate i.i.d. returns under our setting, one just
needs to take constant drift and volatility processes. Our setting is therefore more general
than the i.i.d. observation setting, and all our results readily apply to that case.
2.3 High-frequency case with microstructure noise
In general, the observed prices are believed to be contaminated by microstructure noise. In
other words, instead of observing the true log-prices $(X^i_{t^{i,n}_\ell})$, for each asset i, the observations
at stage n are
$$Y^i_{t^{i,n}_\ell} = X^i_{t^{i,n}_\ell} + \varepsilon^i_\ell, \tag{2.11}$$
where the $\varepsilon^i_\ell$’s represent microstructure noise. In this case, if one simply plugs $(Y^i_{t^{i,n}_\ell})$ into the
formula of RCV in (2.6), the resulting estimator is not consistent even when the dimension p
is fixed. Consistent estimators in the univariate case include the two-scales realized volatility
(TSRV, Zhang et al. (2005)), multi-scale realized volatility (MSRV, Zhang (2006)), pre-averaging estimator (PAV, Jacod et al. (2009), Podolskij and Vetter (2009) and Jacod et al.
(2019)), realized kernels (RK, Barndorff-Nielsen et al. (2008)), quasi-maximum likelihood
estimator (QMLE, Xiu (2010)), estimated-price realized volatility (ERV, Li et al. (2016))
and the unified volatility estimator (UV, Li et al. (2018)). All these estimators, however,
are not consistent in the high-dimensional setting.
In this article we choose to work with the PAV estimator. To reduce non-essential technical complications, we work with the equidistant time setting (2.5) (such as data sampled
every minute). Asynchronicity can be dealt with by using existing data synchronization
techniques. Compared with microstructure noise, asynchronicity is less of an issue; see, for
example, Section 2.4 in Xia and Zheng (2018).
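To implement the PAV estimator, we fix a constant $\theta > 0$ and let $k_n = [\theta n^{1/2}]$ be the
window length over which the averaging takes place. Define
$$\bar{Y}^n_k = \frac{\sum_{i=k_n/2}^{k_n-1} Y_{t_{k+i}} - \sum_{i=0}^{k_n/2-1} Y_{t_{k+i}}}{k_n}.$$
The PAV with weight function $g(x) = x \wedge (1 - x)$ for $x \in (0, 1)$ is defined as
$$\hat{\Sigma}_{\text{PAV}} = \frac{12}{\theta \sqrt{n}} \sum_{k=0}^{n-k_n+1} \bar{Y}^n_k \cdot (\bar{Y}^n_k)^\top - \frac{6}{\theta^2 n} \operatorname{diag}\Big( \sum_{k=1}^{n} (\Delta Y^i_{t_k})^2 \Big)_{i=1,\ldots,p}. \tag{2.12}$$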
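A direct transcription of (2.12) under the equidistant setting (2.5) might look as follows. This is a sketch assuming NumPy; the function name `pav_covariance` is hypothetical, and forcing $k_n$ to be even (so that $k_n/2$ is an integer) is an assumption of the sketch rather than something stated in the text.

```python
import numpy as np

def pav_covariance(y, theta=1.0):
    """Pre-averaging estimator (2.12) from noisy log-prices y of shape (n + 1, p)."""
    n, p = y.shape[0] - 1, y.shape[1]
    k_n = int(np.floor(theta * np.sqrt(n)))
    k_n -= k_n % 2                      # assumption: make k_n even so k_n / 2 is an integer
    if k_n < 2:
        raise ValueError("window length k_n must be at least 2")
    half = k_n // 2
    total = np.zeros((p, p))
    for k in range(n - k_n + 2):        # k = 0, ..., n - k_n + 1
        # Difference of the two half-window sums, divided by k_n.
        y_bar = (y[k + half:k + k_n].sum(axis=0) - y[k:k + half].sum(axis=0)) / k_n
        total += np.outer(y_bar, y_bar)
    dy = np.diff(y, axis=0)
    bias = np.diag((dy ** 2).sum(axis=0))   # diag(sum_k (Delta Y^i_{t_k})^2)
    return 12.0 / (theta * np.sqrt(n)) * total - 6.0 / (theta ** 2 * n) * bias
```

The second term only corrects the diagonal, so the output is symmetric but, as with other bias-corrected estimators, is not guaranteed to be positive semi-definite in finite samples.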
We now define the constrained $\ell_1$-minimization for inverse matrix estimation with stochastic volatility and microstructure noise (CLIME-SVMN), $\hat{\Omega}_{\text{CLIME-SVMN}}$, as
$$\hat{\Omega}_{\text{CLIME-SVMN}} := \arg\min_{\Omega'} \|\Omega'\|_1 \quad \text{subject to} \quad \|\hat{\Sigma}_{\text{PAV}}\Omega' - I\|_\infty \le \lambda. \tag{2.13}$$
Correspondingly, we define the estimated MVP, $\hat{w}_{\text{CLIME-SVMN}}$, by replacing $\hat{\Omega}_{\text{CLIME-SV}}$
in (2.8) with $\hat{\Omega}_{\text{CLIME-SVMN}}$. Its associated risk, $R_{\text{CLIME-SVMN}}$, is given by replacing
$\hat{w}_{\text{CLIME-SV}}$ and $\hat{\Omega}_{\text{CLIME-SV}}$ in (2.9) with $\hat{w}_{\text{CLIME-SVMN}}$ and $\hat{\Omega}_{\text{CLIME-SVMN}}$, respectively.
Now we state the assumptions that we need in order to establish the statistical properties
of the portfolio $\hat{w}_{\text{CLIME-SVMN}}$.
Assumption B:
(B.i) The microstructure noise $(\varepsilon^i_\ell)$ is strictly stationary and independent with mean
zero and also independent of $(X_t)$.
(B.ii) For any $\gamma \in \mathbb{R}$,
$$E(\exp(\gamma \varepsilon^i_\ell)) \le \exp(C\gamma^2),$$
where C is a fixed constant. Suppose also that there exists $C_\varepsilon > 0$ such that
$\operatorname{Var}(\varepsilon^i_\ell) \le C_\varepsilon$ for all i and ℓ.
(B.iii) p and n satisfy $\log p / \sqrt{n} \to 0$.
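Theorem 2. Suppose that $(X_t)$ satisfies (2.1) and $\Omega_{\text{ICV}} = \Sigma_{\text{ICV}}^{-1} \in \mathcal{U}(q, s_0, M)$. Under
Assumptions (A.i)-(A.iv) and (B.i)-(B.iii), with $\lambda = \eta M \sqrt{\log p} / n^{1/4}$ for some $\eta > 0$, there
exist constants $C_3, C_4 > 0$ such that
$$P\left( \frac{R_{\text{CLIME-SVMN}}}{R_{\min}} - 1 = O\Big( M^{2-2q} s_0 (\log p)^{(1-q)/2} / n^{(1-q)/4} \Big) \right) \ge 1 - \frac{C_3}{p^{C_4 \eta^2 - 2}}.$$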
Remark 4. Theorem 2 states that in the noisy case, if $M^{2-2q} s_0 (\log p)^{(1-q)/2} / n^{(1-q)/4} \to 0$,
then the risk of our estimated MVP is consistent in the sense of (1.8).
Remark 5. The reduction in the rate from $(\sqrt{\log p} / \sqrt{n})^{1-q}$ in Theorem 1 to $(\sqrt{\log p} / n^{1/4})^{1-q}$
in the current theorem is an inevitable consequence of noise. In fact, the optimal rate in
estimating the integrated volatility in the noisy case is $O(n^{1/4})$ (Gloter and Jacod (2001)),
compared with the rate of $O(n^{1/2})$ in the noiseless case.
2.4 Estimating the minimum risk
So far we have seen that under certain sparsity assumptions on the precision matrix, we can
construct a portfolio with a risk close to the minimum risk in the sense of (1.8). Now we
turn to consistent estimation of the minimum risk.
2.4.1 Assuming sparsity: using CLIME based estimator
The CLIME based estimators we considered in Sections 2.2 and 2.3 also lead to estimators of
the minimum risk $R_{\min}$. Recall that $R_{\min} = 1/(\mathbf{1}^\top \Omega_{\text{ICV}} \mathbf{1})$. Our estimator, when working
under the high-frequency setting without noise, is
$$\hat{R}_{\text{CLIME-SV}} = \frac{1}{\mathbf{1}^\top \hat{\Omega}_{\text{CLIME-SV}} \mathbf{1}}, \tag{2.14}$$
where, recall, $\hat{\Omega}_{\text{CLIME-SV}}$ is defined in (2.7). Similarly, if microstructure noise is
present, then the minimum risk estimator is defined as
$$\hat{R}_{\text{CLIME-SVMN}} = \frac{1}{\mathbf{1}^\top \hat{\Omega}_{\text{CLIME-SVMN}} \mathbf{1}}, \tag{2.15}$$
where $\hat{\Omega}_{\text{CLIME-SVMN}}$ is defined in (2.13).
We have the following results for these estimators.
Theorem 3. (i) Under the assumptions in Theorem 1, the minimum risk estimator $\hat{R}_{\text{CLIME-SV}}$
defined in (2.14) satisfies
$$P\left( \frac{\hat{R}_{\text{CLIME-SV}}}{R_{\min}} - 1 = O\Big( M^{2-2q} s_0 (\log p / n)^{(1-q)/2} \Big) \right) \ge 1 - \frac{C_1}{p^{C_2 \eta^2 - 2}}.$$
(ii) Under the assumptions in Theorem 2, the minimum risk estimator $\hat{R}_{\text{CLIME-SVMN}}$
defined in (2.15) satisfies
$$P\left( \frac{\hat{R}_{\text{CLIME-SVMN}}}{R_{\min}} - 1 = O\Big( M^{2-2q} s_0 (\log p)^{(1-q)/2} / n^{(1-q)/4} \Big) \right) \ge 1 - \frac{C_3}{p^{C_4 \eta^2 - 2}}.$$
2.4.2 Without sparsity assumption: low-frequency i.i.d. returns
CLIME based estimators work under mild sparsity assumptions on the precision matrix.
Now we propose another estimator of the minimum risk, which does not rely on such sparsity
assumptions, although on the other hand, it assumes that the observations are i.i.d. normally
distributed. This estimator is hence more suitable for the low frequency setting and can be
used to estimate the minimum risk over a long time period.
More specifically, suppose that we observe n i.i.d. returns $X_1, \ldots, X_n$ (possibly at low
frequency). Let $S$ be the sample covariance matrix, and let $w_p = S^{-1}\mathbf{1} / (\mathbf{1}^\top S^{-1}\mathbf{1})$ be the “plug-in” portfolio. The corresponding perceived risk is $\hat{R}_p = w_p^\top S w_p$. We have the following result
on the relationship between $\hat{R}_p$ and the minimum risk $R_{\min}$, based on which a consistent
estimator of the minimum risk is constructed.
Proposition 1. Suppose that the returns $X_1, \ldots, X_n \sim_{\text{i.i.d.}} N(\mu, \Sigma)$. Suppose further that
both n and p → ∞ in such a way that $\rho_n := p/n \to \rho \in (0, 1)$. Then
$$\frac{\hat{R}_p}{R_{\min}} - (1 - \rho_n) \xrightarrow{\;p\;} 0. \tag{2.16}$$
Therefore, if we define
$$\hat{R}_{\min} = \frac{1}{1 - \rho_n} \hat{R}_p, \tag{2.17}$$
then
$$\frac{\hat{R}_{\min}}{R_{\min}} \xrightarrow{\;p\;} 1. \tag{2.18}$$
Furthermore, we have
$$\sqrt{n - p} \left( \frac{\hat{R}_{\min}}{R_{\min}} - 1 \right) \Rightarrow N(0, 2). \tag{2.19}$$
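The estimator (2.17) is a one-line correction of the perceived risk. The sketch below assumes NumPy; the function name `min_risk_estimate` is hypothetical, and the interval it returns is not stated in the text: it is obtained here by inverting the normal approximation (2.19), so it should be read as an illustration of how the CLT could be used.

```python
import math
import numpy as np

def min_risk_estimate(s, n, z=1.96):
    """Bias-corrected minimum risk (2.17) with an approximate interval based on (2.19)."""
    p = s.shape[0]
    if not 0 < p < n:
        raise ValueError("requires 0 < p < n")
    ones = np.ones(p)
    r_perceived = 1.0 / (ones @ np.linalg.solve(s, ones))   # equals w_p' S w_p
    r_hat = r_perceived / (1.0 - p / n)                     # (2.17) with rho_n = p / n
    # (2.19): sqrt(n - p) * (r_hat / R_min - 1) is approximately N(0, 2).
    half_width = z * math.sqrt(2.0 / (n - p))
    lower = r_hat / (1.0 + half_width)
    upper = r_hat / (1.0 - half_width) if half_width < 1.0 else math.inf
    return r_hat, (lower, upper)
```

For example, with `s = np.eye(2)` and `n = 10`, the perceived risk is 0.5 and the corrected estimate is 0.5 / 0.8 = 0.625.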
The convergence (2.19) actually shows a “blessing” of dimensionality: the higher the dimension, the more accurate the estimation.
The convergence (2.16) explains why in Figure 1 the perceived risk is systematically
lower than the minimum risk. We also remark that the “plug-in” portfolio is not optimal.
In Basak et al. (2009), it is shown that the risk of the plug-in portfolio is on average a
higher-than-one multiple of the minimum risk; see Propositions 1 and 2 therein. Under the
condition that both p and n → ∞ and p/n → ρ ∈ (0, 1), their result can be strengthened:
the risk of the plug-in portfolio is, with probability approaching one, a larger-than-one multiple of the minimum risk. In fact, using the relationship (5) in Basak et al. (2009),
it is easy to show that
$$\frac{R(w_p)}{R_{\min}} \xrightarrow{\;p\;} \frac{1}{1 - \rho}, \tag{2.20}$$
where $R(w_p) = w_p^\top \Sigma w_p$ is the risk of the plug-in portfolio.
3 Estimation with Factors
In practice, stock returns often exhibit a factor structure. Prominent examples include
the CAPM (Sharpe (1964)) and the Fama-French three-factor and five-factor models (Fama and
French (1992, 2015)). Recently, there have also been studies on factor modeling for high-frequency returns; see Aït-Sahalia and Xiu (2017); Dai et al. (2019); Pelger (2019). In this
section, we extend our approach to accommodate such a structure.
Let $r_k = (r_{k,1}, \ldots, r_{k,p})^\top$, k = 1, . . . , n, be asset returns, which can be either high-frequency, such as 5-minute returns, or low-frequency, like daily or weekly returns. We assume
that $(r_k)$ admits a factor structure as follows:
$$r_k = \alpha + \Gamma f_k + z_k, \tag{3.1}$$
where $\alpha$ is a p × 1 unknown vector, $\Gamma$ is a p × m unknown coefficient matrix, $f_k = (f_{k,1}, \ldots, f_{k,m})^\top$ are factor returns, and $z_k$ is a p × 1 random vector with mean 0 and covariance
matrix $\Sigma_{k,0} = (\sigma^{k,0}_{ij})$. We assume that for each k, $f_k$ and $z_k$ are independent, and the pairs
$(r_k, f_k)$, k = 1, . . . , n, are mutually independent. Let $\Sigma_{r,k} = \operatorname{Cov}(r_k)$ and $\Sigma_{f,k} = \operatorname{Cov}(f_k)$.
Note that we allow both $\Sigma_{r,k}$ and $\Sigma_{f,k}$ to depend on the time index k, to accommodate
stochastic (co-)volatility. Because of this feature, it is impossible to estimate the individual $\Sigma_{r,k}$ or $\Sigma_{f,k}$, but it is possible to estimate their means, namely, $\Sigma_r = \frac{1}{n}\sum_{k=1}^{n} \Sigma_{r,k}$,
$\Sigma_f = \frac{1}{n}\sum_{k=1}^{n} \Sigma_{f,k}$, and $\Sigma_0 = \frac{1}{n}\sum_{k=1}^{n} \Sigma_{k,0}$, and the corresponding $\Omega_0 = \Sigma_0^{-1}$. The estimation procedure requires consistently estimating the means of $(r_k, f_k)$, which entails the
condition that the $E(f_k)$’s are the same for all k.
To estimate the model, we first calculate the least squares estimators of $\alpha$ and $\Gamma$, denoted
by $\hat{\alpha}$ and $\hat{\Gamma}$ respectively. The residuals are $e_k := r_k - \hat{\alpha} - \hat{\Gamma} f_k$. We then obtain a CLIME
estimator of $\Omega_0$, denoted by $\hat{\Omega}_0$, based on the residuals. Specifically,
$$\hat{\Omega}_0 := \arg\min_{\Omega'} \|\Omega'\|_1 \quad \text{subject to} \quad \|S_e \Omega' - I\|_\infty \le \lambda, \tag{3.2}$$
where $S_e$ is the sample covariance matrix of the residuals $(e_1, \ldots, e_n)$.
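The least squares step that produces the residuals and $S_e$ can be sketched as follows, assuming NumPy. The function name `factor_residuals` and the array layout (returns `r` of shape (n, p), factors `f` of shape (n, m)) are hypothetical choices for illustration; `np.cov` uses the n − 1 divisor, which is one common convention for the sample covariance.

```python
import numpy as np

def factor_residuals(r, f):
    """Least squares fit of r_k = alpha + Gamma f_k + z_k; r is (n, p), f is (n, m)."""
    n = r.shape[0]
    design = np.column_stack([np.ones(n), f])            # intercept plus factors
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)    # shape (m + 1, p)
    alpha_hat = coef[0]                                   # intercepts, shape (p,)
    gamma_hat = coef[1:].T                                # loadings, shape (p, m)
    resid = r - design @ coef                             # row k is e_k'
    s_e = np.cov(resid, rowvar=False)                     # sample covariance of residuals
    return alpha_hat, gamma_hat, resid, s_e
```

The matrix `s_e` is what enters the constraint of (3.2) in place of the sample covariance matrix.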
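Our goal is to estimate the precision matrix $\Omega_r := \Sigma_r^{-1}$. To do so, note that since $\Sigma_r = \Gamma \Sigma_f \Gamma^\top + \Sigma_0$, it follows that
$$\Omega_r = (\Gamma \Sigma_f \Gamma^\top + \Sigma_0)^{-1} = \Omega_0 - \Omega_0 \Gamma (\Sigma_f^{-1} + \Gamma^\top \Omega_0 \Gamma)^{-1} \Gamma^\top \Omega_0. \tag{3.3}$$
Therefore, we can define the constrained $\ell_1$-minimization for inverse matrix estimation adjusted for factors (CLIME-F), $\hat{\Omega}_{\text{CLIME-F}}$, as
$$\hat{\Omega}_{\text{CLIME-F}} = \hat{\Omega}_0 - \hat{\Omega}_0 \hat{\Gamma} (S_f^{-1} + \hat{\Gamma}^\top \hat{\Omega}_0 \hat{\Gamma})^{-1} \hat{\Gamma}^\top \hat{\Omega}_0, \tag{3.4}$$
where $S_f$ is the sample covariance matrix of $(f_1, \ldots, f_n)$. From here the MVP estimator is
$$\hat{w}_{\text{CLIME-F}} = \frac{\hat{\Omega}_{\text{CLIME-F}} \mathbf{1}}{\mathbf{1}^\top \hat{\Omega}_{\text{CLIME-F}} \mathbf{1}}. \tag{3.5}$$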
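The identity (3.3) is the Woodbury matrix identity, and (3.4) applies it with estimated inputs. A minimal sketch assuming NumPy, with hypothetical function names; only an m × m system is solved in addition to the given p × p precision matrix:

```python
import numpy as np

def clime_f_precision(omega0, gamma, s_f):
    """Assemble the precision matrix in (3.4) from Omega_0, Gamma and S_f."""
    og = omega0 @ gamma                            # Omega_0 Gamma, shape (p, m)
    middle = np.linalg.inv(s_f) + gamma.T @ og     # S_f^{-1} + Gamma' Omega_0 Gamma, (m, m)
    return omega0 - og @ np.linalg.solve(middle, gamma.T @ omega0)

def mvp_from_precision(omega):
    """Portfolio weights (3.5): Omega 1 / (1' Omega 1)."""
    x = omega @ np.ones(omega.shape[0])
    return x / x.sum()
```

When the true $\Omega_0$, $\Gamma$ and $\Sigma_f$ are passed in, the first function reproduces the inverse of $\Gamma \Sigma_f \Gamma^\top + \Sigma_0$ up to rounding, which gives a quick correctness check.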
To establish the statistical properties for this estimator, the following assumptions are
made.
Assumption C:
(C.i) Let the number of factors, m, be fixed, and $\log(p) = o(n)$. Moreover, there exist
$\eta > 0$ and $K > 0$ such that for all i = 1, . . . , m, j = 1, . . . , p, and k = 1, . . . , n,
$E(\exp(\eta f_{ki}^2)) \le K$, $E(\exp(\eta z_{kj}^2 / \sigma^{k,0}_{jj})) \le K$, an