Primer on Python for Finance

Daniel P. Palomar (2025). Portfolio Optimization: Theory and Application. Cambridge University Press.

Last update: February 06, 2026


Introduction

What is Python?

Python vs R

Let's not even get started :) Both are great! Learn both! You'll have to use whatever your boss/advisor/team needs. But don't forget to know a bit of C++ too :)

Installation

First, install a Python distribution, for example, from Anaconda (on macOS and Linux you can also use brew install python).

Then, install your favorite code editor or IDE; common choices include JupyterLab, Visual Studio Code, PyCharm, and Spyder.

To get started coding, launch your code editor or IDE. For example, with JupyterLab either click the app or simply type jupyter lab in a terminal (or cmd on Windows), and a browser window will pop up. A Jupyter notebook is an environment where you can write code and interactively evaluate its output, which is very convenient for exploratory analysis. Now you are ready to start using Python from within JupyterLab.

Libraries

To see the versions of Python and the installed libraries just type !pip list or !conda list on a Jupyter notebook and press SHIFT+ENTER. Alternatively, type pip list or conda list in a terminal window and press ENTER.

To see the version of a specific library use import library_name; print(library_name.__version__).
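
For instance, a quick check of the installed NumPy version (the exact number will depend on your installation):

import numpy
print(numpy.__version__)  # e.g., 2.1.0 (your version will likely differ)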

As time progresses, you will have to install different libraries from PyPI or Conda with the command pip install library_name or conda install library_name. Note that you can execute these commands from within a Jupyter notebook by prepending them with an exclamation mark (!).

After installing a library, it needs to be imported before it can be used with the command import library_name:

# we need to import it first and then we can use it:
import numpy  # to install do: pip install numpy
x = [1, 2, 3]
y = numpy.mean(x)
y
np.float64(2.0)

It is common to use shortcuts for the names of the imported libraries:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 100)
y = (x - np.pi) * (x - 1.618)
plt.plot(x, y, label="2nd degree polynomial")
plt.legend()
plt.show()

Good style

There are several Python style guides:

  • Style Guide for Python Code: The Python Enhancement Proposal (PEP) 8 is a widely accepted document that outlines good programming practices for Python. It was created by Guido van Rossum, Barry Warsaw, and Alyssa Coghlan, and it evolves over time as new conventions are identified and old ones become obsolete. The guide emphasizes that code is read much more often than it is written, and therefore, readability and consistency are crucial. It provides guidelines on various aspects of coding in Python, including naming conventions, indentation, and use of whitespace, among others.

  • Google style guide: Google also has its own style guide for Python, which includes a list of dos and don'ts for Python programs. It emphasizes the importance of using descriptive names for public APIs, making modules importable, and using the right style for module, function, method docstrings, and inline comments. It also recommends using tools like pylint for finding bugs and style problems in Python source code.

  • The Hitchhiker's Guide to Python: Another resource that also recommends following PEP 8. It highlights the importance of readability and provides some common Python idioms.

Remember, while these style guides provide useful guidelines, they are not absolute rules. They are intended to improve the readability and consistency of your code, but there may be instances where it makes sense to deviate from the guidelines. As PEP 8 itself says, "A Foolish Consistency is the Hobgoblin of Little Minds". When in doubt, use your best judgment and consider the readability and maintainability of your code.
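
As a small illustration of these conventions, here is a sketch following PEP 8 naming and spacing (not an official example from any of the guides):

# PEP 8 style: snake_case names, spaces around binary operators,
# no spaces around "=" in keyword arguments, and a short docstring
def annualized_return(total_return, n_years):
    """Annualize a total return obtained over n_years."""
    return (1 + total_return) ** (1 / n_years) - 1

annualized_return(total_return=0.5, n_years=3)  # approximately 0.1447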

Variables: lists, dicts, arrays, and data frames

In Python, we can easily assign a value to a variable or object with = (if the variable does not exist it will be created):

x = "Hello"
x
'Hello'

We can combine several elements with lists:

y = ["Hello", "everyone"]
y
['Hello', 'everyone']

Note that elements in a list need not have the same data type (we'll see the data types in a few minutes):

y = [1, "hello", 2., "everyone"]
y
[1, 'hello', 2.0, 'everyone']

A dictionary, or simply "dict", is a data structure that allows mappings between keywords and values. There are many ways to create a dict; the simplest one is just to use curly brackets {} as follows:

x = {'a': 1, 'b': 2, 'c': 3}
x
{'a': 1, 'b': 2, 'c': 3}

Another way is to explicitly specify the keywords and values:

x = dict(a = 1, b = 2, c = 3)
x
{'a': 1, 'b': 2, 'c': 3}

A dict can store different data types for different keywords:

x = dict(a = '1', b = 2, c = 3)
x
{'a': '1', 'b': 2, 'c': 3}

The usual way to query a value from a dict is to pass the desired keyword:

x['a']
'1'

We can also modify the contents of dicts or add new entries:

x = dict(a = 1, b = 2, c = 3)
x['a'] = 2
x
{'a': 2, 'b': 2, 'c': 3}
x['d'] = 10
x
{'a': 2, 'b': 2, 'c': 3, 'd': 10}
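
A few other handy operations on the same dict x (all of these are standard dict methods):

list(x.keys())    # ['a', 'b', 'c', 'd']
list(x.values())  # [2, 2, 3, 10]
x.get('e', 0)     # 0 (default value returned because the keyword 'e' is not present)
'a' in x          # True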

Sets in Python are collections of unordered unique elements. The main purposes of sets are to verify membership, to remove duplicate elements from a sequence, and to compute standard math operations on sets.

We can create sets from lists as follows:

x = set([1, 2, 3, 1])
y = set([2, 4, 5, 3])
x.intersection(y)
{2, 3}
x.difference(y)
{1}
x.symmetric_difference(y)
{1, 4, 5}
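
Membership tests, deduplication, and unions are equally simple (a small sketch continuing with the same x and y):

2 in x                # True (membership test)
set([1, 1, 2, 2, 3])  # {1, 2, 3} (duplicates removed)
x.union(y)            # {1, 2, 3, 4, 5}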

A useful command is ?variable. It gives you various information about the variable, e.g., type, dimensions, contents, etc. Note that this is a feature only in IPython and JupyterLab.

Another useful feature in Python is slicing. It is especially handy for long arrays or lists, for example, to inspect the first or last n elements:

x = np.arange(1000)
x[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[-10:]
array([990, 991, 992, 993, 994, 995, 996, 997, 998, 999])

It is important to keep in mind that in Python almost everything is done through functions or methods of all sorts, such as max(), min(), np.arange(), np.linspace(), and so on.

Data types

Operators in Python: arithmetic operators include +, -, *, /, ** for addition, subtraction, multiplication, division, and exponentiation. Binary comparison operators are <, <=, >, >=, ==, !=. Boolean operators are and, or, not; their bitwise counterparts are &, |, ~.
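
A short illustration of these operators; note in particular the difference between the boolean and, which acts on single booleans, and the bitwise &, which NumPy arrays use for elementwise logical operations (a small sketch):

import numpy as np

2 ** 3                # 8 (exponentiation)
7 / 2                 # 3.5 (true division)
3 > 2 and 1 != 2      # True (boolean operators on single booleans)
a = np.array([1, 2, 3])
(a > 1) & (a < 3)     # array([False,  True, False]) (elementwise, bitwise-style)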

Python has a wide variety of data types including scalars (integers, floats, complex numbers, booleans), strings, lists, tuples, dictionaries (dicts), sets, nd-arrays, and data frames.

Scalars and strings

Scalars are basically floats and integers, for example:

x = 1
type(x)
int
x = 1.1
type(x)
float

Can you think why Python gives the following answer to the computation 3.3 - (1.1 + 2.2)?

x = 3.3 - (1.1 + 2.2)
x
-4.440892098500626e-16

Try the same thing in R and MATLAB. What do you see? Is it different from what Python computed? What is the binary expansion of 0.333333...? How do computers represent anything? :)
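
(Hint: decimal fractions like 1.1 or 0.1 cannot be represented exactly in binary floating point, so tiny rounding errors appear.) In practice, floats are therefore compared with a tolerance; a small sketch using math.isclose:

import math

0.1 + 0.2 == 0.3                                     # False
math.isclose(0.1 + 0.2, 0.3)                         # True
math.isclose(3.3 - (1.1 + 2.2), 0.0, abs_tol=1e-12)  # True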

x = "Hello ELEC 3180"
x
'Hello ELEC 3180'

Modern String Formatting with F-Strings

F-strings (formatted string literals) were introduced in Python 3.6 and significantly improved in Python 3.12. They provide a concise and readable way to embed expressions inside string literals.

F-strings are prefixed with f and use curly braces {} to evaluate expressions:

# Basic f-string usage
ticker = "AAPL"
price = 150.25
print(f"Stock {ticker} is trading at ${price}")
Stock AAPL is trading at $150.25
# Expressions inside f-strings
shares = 100
print(f"Total value: ${price * shares:,.2f}")
Total value: $15,025.00
# Number formatting
value = 1234567.89
print(f"{value:,.2f}")        # 1,234,567.89 (comma separator, 2 decimals)
print(f"{value:>15,.2f}")     # Right-align in 15 characters
print(f"{value:<15,.2f}")     # Left-align in 15 characters
1,234,567.89
   1,234,567.89
1,234,567.89   
# Percentage formatting
return_rate = 0.0525
print(f"{return_rate:.2%}")   # 5.25%
print(f"{return_rate:.4%}")   # 5.2500%
5.25%
5.2500%
# Scientific notation
small_number = 0.000123
print(f"{small_number:.2e}")  # 1.23e-04
1.23e-04
# Padding and alignment
for ticker in ["AAPL", "GOOGL", "MSFT"]:
    print(f"{ticker:>6}")     # Right-align with 6 character width
  AAPL
 GOOGL
  MSFT
# Date formatting for financial reports
from datetime import datetime
trade_date = datetime(2024, 1, 15)
print(f"Trade Date: {trade_date:%Y-%m-%d}")  # 2024-01-15
print(f"Report Period: {trade_date:%B %Y}")  # January 2024
Trade Date: 2024-01-15
Report Period: January 2024

Lists

The most basic data structure in Python is a list. It is an ordered collection of variables of any type, defined with square brackets []. For example:

x = [1, 2, 3., "hello", True]

Unlike R, MATLAB, and Julia, Python is 0-index based, which means that x[0] is the actual first element of the list x. The length of a list may be obtained via the function len().

len(x)
5

To access the value in a given position of a list, use indexes:

x[1]  # 2nd element of the list x
2

Python allows for negative indexes to be given, e.g., x[-1] returns the last item of a list, x[-2] returns the second last item, and so on:

x[-1]
True
x[-2]
'hello'

Additionally, we can retrieve sublists of a list by using slices, e.g., x[1:3] returns a sublist containing the elements x[1] and x[2]; x[1:] returns a sublist containing all the elements to the right of (and including) x[1]; x[:3] returns a sublist containing all the elements to the left of x[3]. In general, x[a:b], for integers a,b, b > a, returns the ordered sublist from x[a] to x[b-1].
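
Continuing with the list x defined above, a few slicing examples:

x[1:3]  # [2, 3.0]
x[1:]   # [2, 3.0, 'hello', True]
x[:3]   # [1, 2, 3.0]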

List Comprehensions: In many cases we would like to retrieve a sublist such that the indexes possess a particular property. In Python jargon this is done with a "list comprehension". For instance, let's retrieve the sublist of elements which are in even positions of the original list x:

y = [x[i] for i in range(len(x)) if i % 2 == 0]
y
[1, 3.0, True]

We can concatenate lists by using the "+" operator:

[1, 2, 3] + [3, 2, 1]
[1, 2, 3, 3, 2, 1]

NumPy arrays

Python was not designed specifically for scientific computing; however, libraries such as NumPy, started by Travis Oliphant, extend the language's data structures to deal more easily with vectors, matrices, and the mathematical operations involved.

Note that in Python, 1d numpy arrays (or simply 1d-arrays) are not column vectors or row vectors; they do not have any orientation. If one desires a column vector, then that is actually an $n\times 1$ matrix.
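
A quick check of this (a small sketch):

import numpy as np

v = np.array([1, 2, 3])
v.shape                     # (3,)  -- a 1d-array, with no row/column orientation
v_col = v.reshape((-1, 1))
v_col.shape                 # (3, 1) -- an n x 1 matrix, i.e., a column vector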

It is also important to differentiate elementwise multiplication * from inner or dot product @ (also np.dot()):

x = np.array([1, 2])
y = np.array([10, 20])
z = x.reshape((len(x), 1))
x * y
array([10, 40])
x @ y
np.int64(50)
z @ np.transpose(z)
array([[1, 2],
       [2, 4]])
z @ z.T
array([[1, 2],
       [2, 4]])

The outer product between two arrays can be computed via the function np.outer:

x = np.array([1, 2])
np.outer(x, x)
array([[1, 2],
       [2, 4]])

The number of elements of a numpy array can be retrieved via len:

y = np.array([10, 20])
len(y)
2

Be careful when using len with arrays with more than one dimension! len always returns the "size" of the first dimension:

y = np.array([1, 2]).reshape((2, 1))
len(y)
2
len(np.transpose(y))
1
len(y.T)
1

Matrices

A matrix is a two-dimensional collection of several variables of the same type.

We can easily create a matrix with np.array:

np.random.seed(42)  # For reproducible results

# generate 5 x 4 numeric matrix
x = np.random.uniform(size=20).reshape((5, 4))
x
array([[0.37454012, 0.95071431, 0.73199394, 0.59865848],
       [0.15601864, 0.15599452, 0.05808361, 0.86617615],
       [0.60111501, 0.70807258, 0.02058449, 0.96990985],
       [0.83244264, 0.21233911, 0.18182497, 0.18340451],
       [0.30424224, 0.52475643, 0.43194502, 0.29122914]])
# we can get the dimensions or number of rows/columns
np.shape(x)
(5, 4)
x.shape
(5, 4)

Identify rows, columns or elements using subscripts:

x[:,3]  # 4th column of matrix (returned as a 1D-array)
array([0.59865848, 0.86617615, 0.96990985, 0.18340451, 0.29122914])
x[2,:]  # 3rd row of matrix (returned as a 1D-array)
array([0.60111501, 0.70807258, 0.02058449, 0.96990985])
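
Slices can also be combined to extract submatrices, and @ performs the usual matrix multiplication (a small sketch continuing with the same x):

x[1:3, 0:2]      # 2 x 2 submatrix (rows 2-3, columns 1-2)
(x @ x.T).shape  # (5, 5): product of the 5 x 4 matrix x with its transpose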

Pandas data frames

Pandas is a data analysis library, started by Wes McKinney, whose main data structure is the so-called DataFrame. The basic unit of Pandas data frames is the Series class. Basically, a data frame is a collection of column-stacked Series objects that share the same "index". In finance, that "index" usually corresponds to time data (seconds, minutes, hours, days, weeks, months, etc.).
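
For instance, a minimal sketch of a Series indexed by dates (the values here are made up for illustration):

import pandas as pd

dates = pd.date_range("2024-01-02", periods=3, freq="D")
prices = pd.Series([100.0, 101.5, 99.8], index=dates, name="price")
print(prices)
2024-01-02    100.0
2024-01-03    101.5
2024-01-04     99.8
Freq: D, Name: price, dtype: float64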

A Pandas data frame is more general than a numpy nd-array in the sense that we can attach labels to the columns and the columns can have different data types:

import pandas as pd
df = pd.DataFrame(
    {
        'float': [1., 2., 3.],
        'int': [1, 2, 3],
        'datetime': [pd.Timestamp('20180310'), pd.Timestamp('20190310'), pd.Timestamp('20200310')],
        'string': ['foo', 'bar', 'buzz']
    }
)
print(df)
   float  int   datetime string
0    1.0    1 2018-03-10    foo
1    2.0    2 2019-03-10    bar
2    3.0    3 2020-03-10   buzz

Let's check the type of one of the variables:

type(df['float'])
pandas.Series

There are a variety of ways to retrieve the elements of a data frame:

df['float']
0    1.0
1    2.0
2    3.0
Name: float, dtype: float64
df['float'][0]
np.float64(1.0)
df['datetime']
0   2018-03-10
1   2019-03-10
2   2020-03-10
Name: datetime, dtype: datetime64[us]
df['datetime'][2]
Timestamp('2020-03-10 00:00:00')
df.at[1, 'float']
np.float64(2.0)
df.loc[1]
float                       2.0
int                           2
datetime    2019-03-10 00:00:00
string                      bar
Name: 1, dtype: object
df.loc[:, 'float']
0    1.0
1    2.0
2    3.0
Name: float, dtype: float64
df.loc[1, 'float']
np.float64(2.0)

We can also set values provided that they can be castable to the type of the column:

df.at[1, 'float'] = 10
print(df)
   float  int   datetime string
0    1.0    1 2018-03-10    foo
1   10.0    2 2019-03-10    bar
2    3.0    3 2020-03-10   buzz

The variable (column) names can be retrieved via the .columns attribute:

df.columns
Index(['float', 'int', 'datetime', 'string'], dtype='str')

A few other useful methods to inspect dataframes are head() and tail() that show the first and last few rows (observations) in a dataframe:

print(df.head(n=2))
   float  int   datetime string
0    1.0    1 2018-03-10    foo
1   10.0    2 2019-03-10    bar
print(df.tail(n=2))
   float  int   datetime string
1   10.0    2 2019-03-10    bar
2    3.0    3 2020-03-10   buzz

In finance, data often comes with missing values, usually labeled as "NaN" (not a number) or "NaT" (not a time, for time values). Let's check out a basic example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"name": ['Superman', 'Batman', 'Spiderman'],
                   "toy": [np.nan, 'Batmobile', 'Spiderman toy'],
                   "born": [pd.NaT, pd.Timestamp("1956-06-26"), pd.NaT]})
print(df)
        name            toy       born
0   Superman            NaN        NaT
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT

Now, in case we simply would like to remove the rows (observations) where at least one element is NaN or NaT, we use df.dropna():

print(df.dropna())
     name        toy       born
1  Batman  Batmobile 1956-06-26

In case we would like to remove the columns (variables) where at least one element is missing, we make use of the argument axis:

print(df.dropna(axis='columns'))
        name
0   Superman
1     Batman
2  Spiderman

Many other options are available to deal with NaNs and NaTs, such as specifying which columns to look for missing values:

print(df.dropna(subset=['name', 'born']))
     name        toy       born
1  Batman  Batmobile 1956-06-26
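
Besides dropping rows or columns, missing values can also be filled; a small sketch using fillna() and ffill() (forward-filling is a common choice for price series):

df_filled = df.fillna({'toy': 'no toy'})  # fill missing values of the 'toy' column with a fixed value
df_ffilled = df.ffill()                   # forward-fill: propagate the last valid observation downwards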

Note that these operations do not happen in place, i.e., the original dataframe is kept intact. In case we would like to perform in-place modifications, we use the argument inplace=True:

print(df)
        name            toy       born
0   Superman            NaN        NaT
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT
df.dropna(inplace=True)
print(df)
     name        toy       born
1  Batman  Batmobile 1956-06-26

Data frames in Python are very powerful and versatile. They are commonly used in machine learning where each row is one observation and each column one variable (each variable can be of a different type). For financial applications, we mainly deal with multivariate time series, which can be seen as a matrix or data frame, but with some particularities: each row is an observation in a specific order (properly indexed with dates or times) and each column is of the same type (a floating-point number).
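
For instance, with prices stored in such a date-indexed data frame, returns can be computed in one line (a small sketch with made-up prices):

import numpy as np
import pandas as pd

prices = pd.DataFrame({'A': [100.0, 101.0, 99.5], 'B': [50.0, 50.5, 51.0]},
                      index=pd.date_range("2024-01-02", periods=3, freq="D"))
returns = prices.pct_change().dropna()        # linear returns: (P_t - P_{t-1}) / P_{t-1}
log_returns = np.log(prices).diff().dropna()  # log-returns: log(P_t / P_{t-1})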

Plotting

We will make full use of Matplotlib, Seaborn, and Plotly for all our plots :) See the example below for how to plot data stored in a Pandas dataframe with seaborn.

Let's examine this code snippet from matplotlib's documentation page:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook

# Load a numpy record array from yahoo csv data with fields date, open, close,
# volume, adj_close from the mpl-data/example directory. The record array
# stores the date as an np.datetime64 with a day unit ('D') in the date column.
data_file = cbook.get_sample_data('goog.npz', asfileobj=False)
price_data = np.load(data_file)['price_data']
price_data = price_data[-250:]  # get the most recent 250 trading days
type(price_data)
delta1 = np.diff(price_data['adj_close']) / price_data['adj_close'][:-1]

# Plot
volume = (15 * price_data['volume'][:-2] / price_data['volume'][0])**2  # for size
close = 0.003 * price_data['close'][:-2] / 0.003 * price_data['open'][:-2]  # for color
fig, ax = plt.subplots()
ax.scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.5)
ax.set_xlabel(r'$\Delta_i$', fontsize=15)
ax.set_ylabel(r'$\Delta_{i+1}$', fontsize=15)
ax.set_title('Volume and percent change')
ax.grid(True)
fig.tight_layout()
plt.show()

Matplotlib basically deals with numpy nd-arrays and subclasses thereof, like the np.recarray used in the previous example. Seaborn, on the other hand, can deal with Pandas DataFrames too. Let's see a basic example from seaborn's documentation page:

import seaborn as sns
sns.set_theme(style="darkgrid")
# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")
print(type(fmri))
print(fmri.head())
# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri)
<class 'pandas.DataFrame'>
  subject  timepoint event    region    signal
0     s13         18  stim  parietal -0.017552
1      s5         14  stim  parietal -0.080883
2     s12         18  stim  parietal -0.081033
3     s11         18  stim  parietal -0.046134
4     s10         18  stim  parietal -0.037970
<Axes: xlabel='timepoint', ylabel='signal'>

Key libraries for finance

We will make use of several key libraries in Python.

Library skfolio

skfolio is a Python library for portfolio optimization and risk management built on top of scikit-learn. It provides a unified framework to create, fine-tune, cross-validate, and stress-test portfolio models using the familiar scikit-learn API.

Key Features:

  • scikit-learn integration: Uses the same fit/predict API pattern
  • Comprehensive risk measures: Variance, CVaR, Maximum Drawdown, CDaR, and more
  • Advanced optimization: Mean-Variance, Maximum Sharpe, Risk Parity, Hierarchical methods
  • Cross-validation: Built-in support for portfolio model validation
  • Factor models: Support for Black-Litterman, Fama-French factors
  • Constraints: Weight, group, cardinality, tracking error, transaction costs

Example 1: Minimum Variance Portfolio

from sklearn.model_selection import train_test_split  # pip install scikit-learn
from skfolio.datasets import load_sp500_dataset       # pip install skfolio
from skfolio.optimization import MeanRisk
from skfolio.preprocessing import prices_to_returns
import plotly.io as pio
pio.renderers.default = "notebook"

# Load S&P 500 price data (built-in dataset with 20 assets)
prices = load_sp500_dataset()

# Convert prices to returns
X = prices_to_returns(prices)

# Split into training and testing sets
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

# Create minimum variance portfolio model
model = MeanRisk()  # Default is minimum variance

# Fit the model on training data
model.fit(X_train)

# Display the optimal weights
print("Optimal Weights:")
print(model.weights_)
Optimal Weights:
[2.64898712e-02 2.41659793e-07 7.55934052e-08 1.03008568e-02
 1.33346884e-01 3.09419154e-07 2.28512522e-07 1.88127445e-01
 8.78144210e-08 1.06451771e-01 3.78274254e-02 5.97493951e-07
 1.93800589e-02 1.16900117e-01 7.06296600e-07 1.64477192e-01
 1.21290539e-02 1.00386527e-02 8.77060435e-02 8.68223818e-02]
# Predict (evaluate) on test set
portfolio = model.predict(X_test)

# Display performance metrics
print(f"\nAnnualized Sharpe Ratio: {portfolio.annualized_sharpe_ratio:.4f}")
print(f"Annualized Return: {portfolio.annualized_mean:.2%}")
print(f"Annualized Volatility: {portfolio.annualized_standard_deviation:.2%}")
print(f"Maximum Drawdown: {portfolio.max_drawdown:.2%}")
Annualized Sharpe Ratio: 0.9150
Annualized Return: 13.63%
Annualized Volatility: 14.90%
Maximum Drawdown: 33.58%
# Plot cumulative returns of the portfolio over the test set
(portfolio.returns_df.cumsum() * 100).plot(title="Portfolio Cumulative Uncompounded Returns")
plt.ylabel("%")
plt.show()
# Plot wealth of the portfolio over the test set
(1.0 + portfolio.returns_df).cumprod().plot(title="Portfolio Wealth")
plt.ylabel("$")
plt.show()
# Plot cumulative returns using skfolio's built-in Plotly method
portfolio.plot_cumulative_returns()

Example 2: Maximum Sharpe Ratio Portfolio

from skfolio import RiskMeasure
from skfolio.optimization import ObjectiveFunction, MeanRisk

# Create maximum Sharpe ratio model
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE
)

# Fit and predict
model.fit(X_train)
portfolio = model.predict(X_test)

print(f"Sharpe Ratio: {portfolio.annualized_sharpe_ratio:.4f}")
print(f"Portfolio weights sum: {model.weights_.sum():.4f}")
Sharpe Ratio: 1.0401
Portfolio weights sum: 1.0000

Example 3: Minimum CVaR (Conditional Value at Risk)

# Minimize CVaR at 95% confidence level
model = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    cvar_beta=0.95  # 95% CVaR confidence level
)

model.fit(X_train)
portfolio = model.predict(X_test)

print(f"CVaR (95%): {portfolio.cvar:.2%}")
print(f"Sharpe Ratio: {portfolio.annualized_sharpe_ratio:.4f}")
CVaR (95%): 2.17%
Sharpe Ratio: 0.8703

Library yfinance

The library yfinance lets us query financial instrument data from the Yahoo! Finance platform.

Let's see how to get Apple stock price data from Yahoo! Finance:

import yfinance as yf

# Extract fundamental data
apple = yf.Ticker('AAPL')
info = apple.info
print(f"P/E Ratio: {info['trailingPE']}")
print(f"Market Cap: {info['marketCap']}")
print(f"Dividend Yield: {info['dividendYield']}")
P/E Ratio: 34.96958
Market Cap: 4055304765440
Dividend Yield: 0.38
# Download price data
apple = yf.download("AAPL", start="2017-01-01", end="2017-04-30")

Now, let's inspect the Pandas dataframe apple via methods such as head() and tail(), which show the first and last few observations of the dataframe:

apple.head()
Price           Close       High        Low       Open     Volume
Ticker           AAPL       AAPL       AAPL       AAPL       AAPL
Date
2017-01-03  26.770889  26.812377  26.450515  26.690220  115127600
2017-01-04  26.740917  26.853856  26.678687  26.701735   84472400
2017-01-05  26.876909  26.934531  26.692520  26.717874   88774400
2017-01-06  27.176535  27.234156  26.844635  26.916085  127007600
2017-01-09  27.425459  27.526873  27.183450  27.185754  134247600
apple.tail()
Price           Close       High        Low       Open     Volume
Ticker           AAPL       AAPL       AAPL       AAPL       AAPL
Date
2017-04-24  33.250469  33.322229  33.143985  33.218061   68537200
2017-04-25  33.456493  33.542142  33.303713  33.312974   75486000
2017-04-26  33.259727  33.472697  33.190285  33.442602   80164800
2017-04-27  33.285194  33.370846  33.174083  33.315289   56985200
2017-04-28  33.252781  33.403248  33.164819  33.354635   83441600

We can also plot some desired columns of a dataframe using the plot() method:

apple[['High', 'Low', 'Open', 'Close']].plot()
<Axes: xlabel='Date'>

We can also load multiple assets at once:

# Portfolio of assets
tickers = ['AAPL', 'MSFT', 'JPM', 'GS']
portfolio_data = yf.download(tickers, start='2020-01-01', end='2023-12-31', auto_adjust=False)['Adj Close']
portfolio_data.head()
Ticker           AAPL          GS         JPM        MSFT
Date
2020-01-02  72.468246  203.182617  119.036438  152.505661
2020-01-03  71.763702  200.806732  117.465561  150.606720
2020-01-06  72.335548  202.861786  117.372162  150.995987
2020-01-07  71.995354  204.197144  115.376762  149.619232
2020-01-08  73.153503  206.165451  116.276825  152.002457

Library empyrical

empyrical is an open source library developed by Quantopian Inc. It's widely used by practitioners to compute common risk and performance measures.

from empyrical import max_drawdown, roll_max_drawdown, cum_returns, omega_ratio, sharpe_ratio

# create a synthetic array of returns
returns = np.array([.01, .02, .03, -.4, -.06, -.02])
max_drawdown(returns)  # calculate the maximum drawdown
-0.4472800000000001
roll_max_drawdown(returns, window=3) # calculate the maximum drawdown in a rolling window fashion
array([ 0.     , -0.4    , -0.436  , -0.44728])
cum_returns(returns) # calculate the cumulative returns
array([ 0.01      ,  0.0302    ,  0.061106  , -0.3633364 , -0.40153622,
       -0.41350549])
sharpe_ratio(returns) # calculate the Sharpe ratio
-6.7377339531573535
# Note: Some empyrical functions require annualization factors
sharpe_ratio(returns, risk_free=0, annualization=252)  # 252 trading days
-6.7377339531573535
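
A few more measures commonly used in practice are also available (a small sketch continuing with the same returns array; see the empyrical documentation for the full list):

from empyrical import annual_volatility, sortino_ratio, calmar_ratio

annual_volatility(returns)  # annualized volatility (assumes daily returns by default)
sortino_ratio(returns)      # like the Sharpe ratio but penalizing only downside deviations
calmar_ratio(returns)       # annualized return divided by the maximum drawdown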

Library riskparityportfolio

riskparityportfolio is a library to design risk parity portfolios, an approach to investment that controls how much risk each asset contributes to the overall portfolio.

import riskparityportfolio as rpp  # pip install riskparityportfolio (dependencies: numpy, jax, quadprog, pybind, and tqdm)
import numpy as np
import matplotlib.pyplot as plt

cov_matrix = np.vstack((np.array((1.0000, 0.0015, -0.0119)),
                        np.array((0.0015, 1.0000, -0.0308)),
                        np.array((-0.0119, -0.0308, 1.0000))))
risk_budget_vector = np.array((0.1594, 0.0126, 0.8280))
w = rpp.vanilla.design(cov_matrix, risk_budget_vector)
plt.bar(["stock " + item for item in ["A", "B", "C"]], w)
plt.xlabel("")
plt.ylabel("portfolio weight")
plt.show()
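
To see what the solution achieves, we can verify that the relative risk contributions of the resulting weights match the prescribed risk budget (a small sketch using the quantities defined above):

# relative risk contributions: w_i * (Sigma @ w)_i / (w' Sigma w)
rrc = w * (cov_matrix @ w) / (w @ cov_matrix @ w)
print(rrc)  # should approximately equal risk_budget_vector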

Python Scripts and Jupyter Notebooks

Python Scripts

One simple way to use Python is by typing commands in the IPython terminal one by one. However, this quickly becomes impractical and it is necessary to write scripts. You can use your favorite text editor to create Python scripts.
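
For example, a minimal script could look as follows (hypothetical file name my_script.py; run it from a terminal with python my_script.py):

# my_script.py (hypothetical example)
import numpy as np

returns = np.array([0.01, 0.02, -0.005])
print(f"Cumulative return: {np.prod(1 + returns) - 1:.4%}")  # Cumulative return: 2.5049%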

Jupyter Notebook/Lab

Jupyter notebooks enable you to write report-like documents containing code, documentation, mathematical equations, figures, and so on. This document is an example.

To explore further

Check out Awesome Quant for a list of curated packages relevant to financial applications.