Real estate investing is an interesting game to play. There are many ways to be a real estate investor, you may be interested solely in Cash Flows, or you may be looking for flips, or you might just be playing a game of appreciation. So, I want to talk a little bit about how to think about investment decisions in general. The way that we’re going to do that is through the use of a portfolio of real estate. We are going to pick a strategy, which is appreciation. It isn’t the only way to go, but appreciation makes a lot of sense given that you may want to take this model and apply it elsewhere to other asset classes. The goal is to find a portfolio of real estate that will offer appreciation, while maximizing our risk adjusted return. I’ll then add an additional condition, namely, that as working schmucks that have a 9 to 5 job, we want our investments working even if we aren’t. So we want to make sure that our portfolio has a negative correlation with the unemployment rate. The idea is that we want to cushion ourselves in the event that we lose our jobs. We will construct our portfolio by deciding what percentage of our bankroll should we invest in each zip code.

To do this, we pull down data on every zip code in the United States. Now, I know, that there is a lot of heterogeneity within a zip code in terms of its real estate. Certainly, town houses and condos will be priced differently than stand alone single-family residences. Also, pricing will depend on the size of property. A larger property will be more valuable, ceteris paribus. So there is some level of abstraction going on using the average price for a zip code. Nevertheless, we soldier on. In the end, what we care about is what is the rate of appreciation within the zip code, so the average should be just fine.
To do this we will construct a macro economic multi-factor model. We proceed by using the betas from the multifactor model and optimize over them in our portfolio.
We go zip code by zip code, and estimate a regression model of appreciation rates against the macroeconomic factors. Given that I am optimizing over 811 zip codes. There are more in the database that I have, except that I am missing data for some time periods due to the addition and removal of zip codes. So I only base this analysis on zip codes that I have at least 6 observations. Note, that I only have quarterly data for economic variables, so I only have the 811 as my universe of zip codes to optimize my portfolio over.
I’m only using a conventional OLS algorithm. You may want to utilize something more suited for time series like an ARIMA model, or even a VAR or VECM type model. I am regressing on the percent change in GDP, PCE (inflation), Unemployment Rate, and the Federal Funds Rate.
df = pd.read_sql_query(sql, engine)
for region in df['region_id'].unique():
try:
temp = df[df['region_id'] == region]
temp = temp.dropna()
if len(temp)<5:
continue
try:
model = sm.OLS(endog=temp['cap_rate'], exog=sm.add_constant(temp[['gdp','pce','urate','ffr']]))
except ValueError:
continue
result = model.fit()
I then stored these betas into a database, which I then use to calculate the optimized portfolio. So we are trying to make allocative decisions on investments in a certain area. I then built an optimization routine on top of these betas to select a weighting for my investment. What we are going to try to accomplish is to maximize the risk adjusted returns, i.e. the Sharpe Ratio.
def objective(weights, returns, cov_matrix):
return -(returns @ weights)/np.sqrt(weights.T @ cov_matrix @ weights)
def find_optimal2():
df = pd.read_sql_query(sql2, engine).dropna()
df['appreciation'] = np.log(df['home_value']).diff().dropna()
cov_matrix = df.pivot_table(index=['date'], columns=['region_id'], values=['appreciation']).T.dropna().T.cov()
returns = df.pivot_table(index=['date'], columns=['region_id'], values=['appreciation']).T.dropna().T.mean(axis=0)
init_weights = np.ones(len(returns)) / len(returns)
beta = pd.read_sql_query(sql, engine)
# regions = df.pivot_table(index=['date'], columns=['region_id'], values=['cap_rate']).T.dropna().T.columns.values
# print(returns.index.get_level_values('region_id'))
unemployment_betas = beta[beta['region_id'].isin(returns.index.get_level_values('region_id'))]['unemployment']
constraints = [
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}, # fully invested
{'type': 'ineq', 'fun': lambda w: -w.T @ unemployment_betas} # negative exposure to unemployment
]
# Bounds: No short selling (if desired)
bounds = [(0, 1) for _ in range(len(returns))]
result = minimize(
objective,
init_weights,
args=(returns, cov_matrix,),
method='SLSQP',
bounds=bounds,
constraints=constraints,
options={'maxiter': 100000}
)
print(result)
# print(np.argmax(result.x))
argument = np.argmax(result.x)
regionid = returns.index.get_level_values('region_id')[argument]
print(f'Appreciation: Allocate {result.x[argument] * 100:.2f}% to Region {regionid}')
for x,y in zip(returns.index.get_level_values('region_id'),result.x):
if y>0.01:
print(f"{x},{y}")
return 4*(returns @ result.x)
It turns out the unemployment constraint is non-binding. Go figure. I did this by running it once with and without the constraint. So it looks like real estate is negatively correlated to the unemployment rate already. So that’s neat enough. Since it is non-binding I will just report where we should invest and in what quantity to maximize our Sharpe Ratio. It turns out that we end up with a Sharpe Ratio of 2.25 and an annual rate of return of 5.76%. Not bad considering that we are completely ignoring rental income from this space as well. So without further ado, here’s how to spread out your real estate money to get the best Sharpe Ratio:
Jacksonville, NC | 27.75% |
New York City, NY | 20.79% |
Philadelphia, PA | 11.08% |
Chicago, IL | 10.79% |
West New York, NJ | 8.44% |
Abilene, TX | 8.03% |
Bayonne, NJ | 4.68% |
Manhattan, KS | 3.22% |
Long Beach, CA | 2.73% |
Beverly Hills, CA | 2.48% |
Let’s say that we want a positive correlation with the unemployment rate then. This will give us returns that work when unemployment rates spike, like say in a recession. Let’s see how our distribution changes. We do this by simply removing the minus sign from the constraint on the unemployment rate. In this case, my returns are better, slightly, I get an annual return of 5.31%, but on a risk adjusted basis, they are slightly worse at a Sharpe Ratio of 1.93. Now, my portfolio is positively correlated with the unemployment rate.
Abilene, TX | 26.82% |
New York, NY | 19.42% |
Jacksonville, NC | 16.64% |
Philadelphi, PA | 8.13% |
Chicago, IL | 7.87% |
Long Beach, CA | 6.47% |
Philadelphi, PA | 6.11% |
West New York, NJ | 2.69% |
New York, NY | 1.38% |
Union City, NJ | 1.33% |
Baltimore, MD | 1.30% |
Note that New York, NY is on our list twice here because there are multiple zip codes that made our list. Also of note that many of the same names are showing up here, Abilene, Jacksonville, Chicago, Philadelphia, Long Beach. These just seem to be good places to get some decent appreciation. But notice that we now have a different allocation which will respond positively to increases in the unemployment rate. Neat right?