Primer on Python for Finance
Daniel P. Palomar (2025). Portfolio Optimization: Theory and Application. Cambridge University Press.
Last update: February 06, 2026
Contributors:
Introduction¶
What is Python?¶
- Python is a general-purpose, high-level, interpreted language that has become one of the de facto tools for data science, data analysis, machine learning, and finance. For instance, Python has topped the IEEE Spectrum ranking of top programming languages for many years in a row.
- Python was created by Guido van Rossum in 1991. Guido served as Python's Benevolent Dictator for Life (BDFL) until he stepped down in 2018, and he retired from Dropbox as a Principal Engineer in 2019. Python's main philosophy focuses on code readability, i.e., it provides a simple, easy-to-follow grammar, which often translates into rapid code development.
- Python is an actively developed open source language. The Python Software Foundation (PSF) holds the intellectual property rights and protects the trademarks associated with Python. Agencies, foundations, private companies, and non-profit organizations support the development of Python and many of its open source libraries. To name a few, the Moore Foundation, the NumFOCUS organization, Microsoft, and J.P. Morgan have been funding the development of Python for many years.
- Python is distributed by a variety of sources, of which we recommend the Anaconda distribution, primarily due to its straightforward installation procedure.
- Useful Python links:
- Anaconda
- Searching packages: PyPI
- Python documentation
- JupyterLab is the de facto environment for data analysis
- Learning: LearnPython, Python for Beginners, Learn Python 3, The Hitchhiker's Guide to Python, NumPy basics by Andrej Karpathy, Matplotlib basics, Seaborn basics, Pandas Crash Course by Datacamp
- Stack Overflow
- Python homepage
- Other resources: Book: Python for Finance, mastering data-driven finance
Python vs R¶
Let's not even get started :) Both are great! Learn both! You'll have to use whatever your boss/advisor/team needs. But don't forget to know a bit of C++ too :)
Installation¶
First, install the Python distribution, for example, from Anaconda (in macOS and Linux you can also use brew install python).
Then, install your favorite code editor or IDE. Some examples are:
- JupyterLab, which is the de facto IDE. It can be installed from a terminal window with pip install jupyterlab (in macOS and Linux you can also use brew install jupyterlab), and there is also a desktop version of JupyterLab.
- PyCharm
- VS Code
- Spyder
To get started coding, start your code editor or IDE. For example, with JupyterLab either click the app or from a terminal (or cmd on Windows) simply type jupyter lab (and a browser window will pop up). A Jupyter notebook is an environment where you can write code and interactively evaluate its output. This feature is very convenient for exploratory analysis.
Now you are ready to start using Python from within JupyterLab.
Libraries¶
To see the versions of Python and the installed libraries just type !pip list or !conda list on a Jupyter notebook
and press SHIFT+ENTER. Alternatively, type pip list or conda list in a terminal window and press ENTER.
To see the version of a specific library use import library_name; print(library_name.__version__).
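The version of a package can also be queried without importing it, through the standard-library module importlib.metadata. The following is a minimal sketch: the helper name installed_version is hypothetical, and the packages listed are only examples that may or may not be installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example package names (assumptions, adapt to your own environment)
for name in ["numpy", "pandas", "a-package-that-is-not-installed"]:
    print(name, "->", installed_version(name))
```

Unlike library_name.__version__, this approach does not rely on the library defining a __version__ attribute.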
As time progresses, you will have to install different libraries from PyPI or Conda with the command
pip install library_name or conda install library_name. Note that you can execute these commands
from within a Jupyter notebook by prepending them with an exclamation mark (!).
After installing a library, it needs to be imported before it
can be used with the command import library_name:
# we need to import it first and then we can use it:
import numpy # to install do: pip install numpy
x = [1, 2, 3]
y = numpy.mean(x)
y
np.float64(2.0)
It is common to use shortcuts for the names of the imported libraries:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 5, 100)
y = (x - np.pi) * (x - 1.618)
plt.plot(x, y, label="2nd-degree polynomial")
plt.legend()
plt.show()
Good style¶
There are several Python style guides:
Style Guide for Python Code: The Python Enhancement Proposal (PEP) 8 is a widely accepted document that outlines good programming practices for Python. It was created by Guido van Rossum, Barry Warsaw, and Alyssa Coghlan, and it evolves over time as new conventions are identified and old ones become obsolete. The guide emphasizes that code is read much more often than it is written, and therefore, readability and consistency are crucial. It provides guidelines on various aspects of coding in Python, including naming conventions, indentation, and use of whitespace, among others.
Google style guide: Google also has its own style guide for Python, which includes a list of dos and don'ts for Python programs. It emphasizes the importance of using descriptive names for public APIs, making modules importable, and using the right style for module, function, method docstrings, and inline comments. It also recommends using tools like pylint for finding bugs and style problems in Python source code.
The Hitchhiker's Guide to Python: It is another resource that also recommends following PEP 8. It highlights the importance of readability and provides some common Python idioms.
Remember, while these style guides provide useful guidelines, they are not absolute rules. They are intended to improve the readability and consistency of your code, but there may be instances where it makes sense to deviate from the guidelines. As PEP 8 itself says, "A Foolish Consistency is the Hobgoblin of Little Minds". When in doubt, use your best judgment and consider the readability and maintainability of your code.
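To make a few of these conventions concrete, here is a small sketch in PEP 8 style: lowercase names with underscores for functions and variables, four-space indentation, spaces around binary operators, and a short docstring. The function simple_return and the prices are made up for illustration.

```python
def simple_return(price_start, price_end):
    """Return the simple return between two prices."""
    return price_end / price_start - 1.0


# Descriptive lowercase names with underscores, spaces after commas
daily_return = simple_return(100.0, 101.0)
print(f"{daily_return:.2%}")
```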
Variables: lists, dicts, arrays, and data frames¶
In Python, we can easily assign a value to a variable or object with =
(if the variable does not exist it will be created):
x = "Hello"
x
'Hello'
We can combine several elements with lists:
y = ["Hello", "everyone"]
y
['Hello', 'everyone']
Note that elements in a list need not have the same data type (we'll see the data types in a few minutes):
y = [1, "hello", 2., "everyone"]
y
[1, 'hello', 2.0, 'everyone']
A dictionary, or simply "dict", is a data structure that allows mappings between keywords and values.
There are many ways to create a dict, the simplest one is just to use curly brackets {} as follows:
x = {'a': 1, 'b': 2, 'c': 3}
x
{'a': 1, 'b': 2, 'c': 3}
Another way is to explicitly specify the keywords and values:
x = dict(a = 1, b = 2, c = 3)
x
{'a': 1, 'b': 2, 'c': 3}
A dict can store different data types for different keywords:
x = dict(a = '1', b = 2, c = 3)
x
{'a': '1', 'b': 2, 'c': 3}
The usual way to query a value from a dict is to pass the desired keyword:
x['a']
'1'
We can also modify the contents of dicts or add new entries:
x = dict(a = 1, b = 2, c = 3)
x['a'] = 2
x
{'a': 2, 'b': 2, 'c': 3}
x['d'] = 10
x
{'a': 2, 'b': 2, 'c': 3, 'd': 10}
Sets in Python are collections of unordered, unique elements. The main purposes of sets are to verify membership, remove duplicate elements from a sequence, and compute standard math operations on sets.
We can create sets from lists as follows:
x = set([1, 2, 3, 1])
y = set([2, 4, 5, 3])
x.intersection(y)
{2, 3}
x.difference(y)
{1}
x.symmetric_difference(y)
{1, 4, 5}
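The other two uses mentioned above, removing duplicates and verifying membership, can be sketched as follows. The list of tickers is made up for illustration.

```python
trades = ["AAPL", "MSFT", "AAPL", "GS", "MSFT"]  # made-up list of traded tickers

unique_tickers = set(trades)             # duplicates are removed
print(len(trades), len(unique_tickers))  # 5 3
print("GS" in unique_tickers)            # True
print("JPM" in unique_tickers)           # False
print(sorted(unique_tickers))            # sets are unordered, so sort for display
```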
A useful command is ?variable. It gives you various information about the variable, e.g.,
type, dimensions, and contents. Note that this is a feature only in
IPython and JupyterLab.
Another useful feature in Python is slicing. It is especially handy for large arrays or lists,
for example to show the first and last n elements:
x = np.arange(1000)
x[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[-10:]
array([990, 991, 992, 993, 994, 995, 996, 997, 998, 999])
It is important to keep in mind that in Python almost everything is done through functions or methods
of all sorts such as max(), min(), arange(), linspace(), and so on.
Data types¶
Operators in Python: arithmetic operators include +, -, *, /, ** for addition, subtraction,
multiplication, division, and exponentiation. Comparison operators are <, <=, >, >=, ==, !=.
Boolean operators are and, or, not, and their bitwise versions are &, |, ~.
Python has a wide variety of data types including scalars (integers, floats, complex numbers), strings, lists, tuples, dictionaries (dicts), and sets. Libraries add nd-arrays (NumPy) and data frames (Pandas). Note that Python has no separate double type: its float is already a double-precision number.
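A quick sketch of these operators on two integers. It also uses floor division // and remainder %, two further arithmetic operators.

```python
a, b = 7, 2

print(a + b, a - b, a * b)    # 9 5 14
print(a / b)                  # 3.5 (true division always returns a float)
print(a // b, a % b)          # 3 1 (floor division and remainder)
print(a ** b)                 # 49
print(a > b, a == b, a != b)  # True False True
print(a > 5 and b > 5)        # False
print(a > 5 or b > 5)         # True
print(6 & 3, 6 | 3)           # 2 7 (bitwise: 110 & 011 = 010, 110 | 011 = 111)
```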
Scalars and strings¶
Scalars are basically floats and integers, for example:
x = 1
type(x)
int
x = 1.1
type(x)
float
Can you think about why Python gives the following answer to the expression 3.3 - (1.1 + 2.2)?
x = 3.3 - (1.1 + 2.2)
x
-4.440892098500626e-16
Try the same thing in R and MATLAB. What do you see? Is it different from what Python computed?
What is the binary expansion of 0.333333...? How do computers represent anything? :)
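As a hint, the sketch below uses only the standard library to show the value that is actually stored for 1.1, how to compare floats with a tolerance, and how exact rational arithmetic avoids the issue.

```python
import math
from decimal import Decimal
from fractions import Fraction

total = 1.1 + 2.2
print(total)                     # 3.3000000000000003
print(total == 3.3)              # False
print(Decimal(1.1))              # exact binary value stored for the literal 1.1
print(math.isclose(total, 3.3))  # True: compare floats with a tolerance

# Exact rational arithmetic has no rounding error
exact = Fraction(33, 10) - (Fraction(11, 10) + Fraction(22, 10))
print(exact)                     # 0
```

Decimal numbers such as 1.1 have no finite binary expansion, so they are rounded to the nearest representable double, and the rounding errors do not cancel out.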
Strings are sequences of characters written between quotes:
x = "Hello ELEC 3180"
x
'Hello ELEC 3180'
Modern String Formatting with F-Strings
F-strings (formatted string literals) were introduced in Python 3.6 and significantly improved in Python 3.12. They provide a concise and readable way to embed expressions inside string literals.
F-strings are prefixed with f and use curly braces {} to evaluate expressions:
# Basic f-string usage
ticker = "AAPL"
price = 150.25
print(f"Stock {ticker} is trading at ${price}")
Stock AAPL is trading at $150.25
# Expressions inside f-strings
shares = 100
print(f"Total value: ${price * shares:,.2f}")
Total value: $15,025.00
# Number formatting
value = 1234567.89
print(f"{value:,.2f}") # 1,234,567.89 (comma separator, 2 decimals)
print(f"{value:>15,.2f}") # Right-align in 15 characters
print(f"{value:<15,.2f}") # Left-align in 15 characters
1,234,567.89
   1,234,567.89
1,234,567.89
# Percentage formatting
return_rate = 0.0525
print(f"{return_rate:.2%}") # 5.25%
print(f"{return_rate:.4%}") # 5.2500%
5.25%
5.2500%
# Scientific notation
small_number = 0.000123
print(f"{small_number:.2e}") # 1.23e-04
1.23e-04
# Padding and alignment
for ticker in ["AAPL", "GOOGL", "MSFT"]:
    print(f"{ticker:>6}") # Right-align with 6 character width
  AAPL
 GOOGL
  MSFT
# Date formatting for financial reports
from datetime import datetime
trade_date = datetime(2024, 1, 15)
print(f"Trade Date: {trade_date:%Y-%m-%d}") # 2024-01-15
print(f"Report Period: {trade_date:%B %Y}") # January 2024
Trade Date: 2024-01-15
Report Period: January 2024
Lists¶
The most basic data structure in Python is a list. It is an ordered collection of objects of any type,
defined with square brackets []. For example:
x = [1, 2, 3., "hello", True]
Unlike R, MATLAB, and Julia,
Python uses 0-based indexing, which means that x[0] is the first element of the list x.
The length of a list may be obtained via the function len().
len(x)
5
To access the value in a given position of a list, use indexes:
x[1] # 2nd element of the list x
2
Python allows negative indexes, e.g., x[-1] returns the last item of a list, x[-2]
returns the second-to-last item, and so on:
x[-1]
True
x[-2]
'hello'
Additionally, we can retrieve sublists of a list by using slices, e.g., x[1:3] returns a sublist
containing the elements x[1] and x[2]; x[1:] returns a sublist containing all the elements
to the right of (and including) x[1]; x[:3] returns a sublist containing all the elements
to the left of x[3]. In general, x[a:b], for integers a,b, b > a,
returns the ordered sublist from x[a] to x[b-1].
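The slices described above can be checked directly on the list x. The last two lines add an optional third number, the step, as in x[a:b:step].

```python
x = [1, 2, 3., "hello", True]

print(x[1:3])   # [2, 3.0]
print(x[1:])    # [2, 3.0, 'hello', True]
print(x[:3])    # [1, 2, 3.0]
print(x[::2])   # [1, 3.0, True] (every second element)
print(x[::-1])  # [True, 'hello', 3.0, 2, 1] (reversed copy)
```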
List Comprehensions: In many cases we would like to build a new list from an existing one, keeping only the elements (or indexes) that possess a particular property.
In Python this can be done in a single expression called a "list comprehension". For instance, let's retrieve the sublist of elements
which are in even positions of the original list x:
y = [x[i] for i in range(len(x)) if i % 2 == 0]
y
[1, 3.0, True]
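A list comprehension can also filter on the values themselves or transform each element. Here is a small sketch with made-up daily returns:

```python
daily_returns = [0.012, -0.004, 0.007, -0.015, 0.003]  # made-up numbers

# Keep only the positive returns
gains = [r for r in daily_returns if r > 0]

# Transform every element (here: convert to percent, rounded to one decimal)
in_percent = [round(100 * r, 1) for r in daily_returns]

# Positions (indexes) of the days with a loss
loss_days = [i for i, r in enumerate(daily_returns) if r < 0]

print(gains)      # [0.012, 0.007, 0.003]
print(loss_days)  # [1, 3]
```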
We can concatenate lists by using the "+" operator:
[1, 2, 3] + [3, 2, 1]
[1, 2, 3, 3, 2, 1]
NumPy arrays¶
Python was not designed specifically for scientific computing. However, libraries such as NumPy, started by Travis Oliphant, extend the language's data structures so as to deal more easily with vectors, matrices, and the mathematical operations involved.
Note that in Python, 1d numpy arrays (or simply 1d-arrays) are not column vectors or row vectors, they do not have any orientation. If one desires a column vector, then that is actually an $n\times 1$ matrix.
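The following sketch illustrates the point by inspecting shapes: transposing a 1d-array does nothing, whereas reshaping it yields an explicit column or row.

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.shape)    # (3,)
print(v.T.shape)  # (3,) transposing a 1d-array has no effect

col = v.reshape(-1, 1)  # explicit column: a 3 x 1 matrix
row = v.reshape(1, -1)  # explicit row: a 1 x 3 matrix
print(col.shape, row.shape)  # (3, 1) (1, 3)

print((row @ col).shape)  # (1, 1) inner product, returned as a matrix
print((col @ row).shape)  # (3, 3) outer product
print(v @ v)              # 14, inner product of 1d-arrays returns a scalar
```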
It is also important to differentiate elementwise multiplication * from the inner (dot) product and matrix multiplication @ (also np.dot()):
x = np.array([1, 2])
y = np.array([10, 20])
z = x.reshape((len(x), 1))
x * y
array([10, 40])
x @ y
np.int64(50)
z @ np.transpose(z)
array([[1, 2],
[2, 4]])
z @ z.T
array([[1, 2],
[2, 4]])
Outer product between two arrays can be done via the function np.outer:
x = np.array([1, 2])
np.outer(x, x)
array([[1, 2],
[2, 4]])
The number of elements of a 1d numpy array can be retrieved via len:
y = np.array([10, 20])
len(y)
2
Be careful when using len with arrays with more than one dimension! len always returns the
"size" of the first dimension:
y = np.array([1, 2]).reshape((2, 1))
len(y)
2
len(np.transpose(y))
1
len(y.T)
1
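For arrays with more than one dimension, the attributes shape, size, and ndim are less ambiguous than len. A minimal sketch:

```python
import numpy as np

m = np.arange(6).reshape((2, 3))

print(len(m))   # 2, only the size of the first dimension
print(m.shape)  # (2, 3), size of every dimension
print(m.size)   # 6, total number of elements
print(m.ndim)   # 2, number of dimensions
```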
Matrices¶
A matrix is a two-dimensional collection of several variables of the same type.
We can easily create a matrix with np.array:
np.random.seed(42) # For reproducible results
# generate 5 x 4 numeric matrix
x = np.random.uniform(size=20).reshape((5, 4))
x
array([[0.37454012, 0.95071431, 0.73199394, 0.59865848],
[0.15601864, 0.15599452, 0.05808361, 0.86617615],
[0.60111501, 0.70807258, 0.02058449, 0.96990985],
[0.83244264, 0.21233911, 0.18182497, 0.18340451],
[0.30424224, 0.52475643, 0.43194502, 0.29122914]])
# we can get the dimensions or number of rows/columns
np.shape(x)
(5, 4)
x.shape
(5, 4)
Identify rows, columns or elements using subscripts:
x[:,3] # 4th column of matrix (returned as a 1D array)
array([0.59865848, 0.86617615, 0.96990985, 0.18340451, 0.29122914])
x[2,:] # 3rd row of matrix (returned as a 1D array)
array([0.60111501, 0.70807258, 0.02058449, 0.96990985])
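If the result should remain a two-dimensional matrix, index with a list or a slice instead of a single integer. The sketch below uses a small integer matrix (not the random one above) so that the results are easy to verify by eye.

```python
import numpy as np

x = np.arange(20).reshape((5, 4))  # 5 x 4 matrix with entries 0, 1, ..., 19

col_1d = x[:, 3]    # single integer index: 1d-array of shape (5,)
col_2d = x[:, [3]]  # list index: still a matrix, shape (5, 1)
block = x[1:3, :2]  # submatrix: rows 1-2 and columns 0-1

print(col_1d.shape, col_2d.shape)  # (5,) (5, 1)
print(block)                       # [[4 5]
                                   #  [8 9]]
print(x[x > 17])                   # [18 19], boolean mask returns a 1d-array
```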
Pandas data frames¶
Pandas is a data analysis library, started by Wes McKinney, whose main data structure is the data frame (the DataFrame class). The basic building block of a data frame is the Series class: a data frame is essentially a collection of Series objects, stacked as columns, that share the same "index". In finance, that index usually corresponds to time (seconds, minutes, hours, days, weeks, months, etc.).
A Pandas data frame is more general than a numpy nd-array in the sense that we can attach labels to the columns, and the columns can have different data types:
import pandas as pd
df = pd.DataFrame(
{
'float': [1., 2., 3.],
'int': [1, 2, 3],
'datetime': [pd.Timestamp('20180310'), pd.Timestamp('20190310'), pd.Timestamp('20200310')],
'string': ['foo', 'bar', 'buzz']
}
)
print(df)
   float  int   datetime string
0    1.0    1 2018-03-10    foo
1    2.0    2 2019-03-10    bar
2    3.0    3 2020-03-10   buzz
Let's check the type of one of the variables:
type(df['float'])
pandas.Series
There are a variety of ways to retrieve the elements of a data frame:
df['float']
0    1.0
1    2.0
2    3.0
Name: float, dtype: float64
df['float'][0]
np.float64(1.0)
df['datetime']
0   2018-03-10
1   2019-03-10
2   2020-03-10
Name: datetime, dtype: datetime64[us]
df['datetime'][2]
Timestamp('2020-03-10 00:00:00')
df.at[1, 'float']
np.float64(2.0)
df.loc[1]
float                       2.0
int                           2
datetime    2019-03-10 00:00:00
string                      bar
Name: 1, dtype: object
df.loc[:, 'float']
0    1.0
1    2.0
2    3.0
Name: float, dtype: float64
df.loc[1, 'float']
np.float64(2.0)
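Since in finance the index is usually a date, here is a small sketch of the difference between label-based access (loc) and position-based access (iloc) on a data frame with a DatetimeIndex. The tickers AAA and BBB and the prices are made up.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=4, freq="D")
prices = pd.DataFrame(
    {"AAA": [10.0, 10.5, 10.2, 10.8], "BBB": [20.0, 19.5, 19.9, 20.4]},
    index=dates,
)

first_row = prices.iloc[0]                               # by position: first row
by_label = prices.loc[pd.Timestamp("2024-01-02"), "AAA"]  # by label: date and column
window = prices.loc["2024-01-02":"2024-01-03"]           # label slices include both ends
last_bbb = prices.iloc[-1, 1]                            # last row, second column

print(by_label)      # 10.5
print(window.shape)  # (2, 2)
```

Note that, unlike positional slices, label-based slices with loc include the end point.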
We can also set values, provided that they can be cast to the type of the column:
df.at[1, 'float'] = 10
print(df)
   float  int   datetime string
0    1.0    1 2018-03-10    foo
1   10.0    2 2019-03-10    bar
2    3.0    3 2020-03-10   buzz
The variable (column) names can be retrieved via the .columns attribute:
df.columns
Index(['float', 'int', 'datetime', 'string'], dtype='str')
A few other useful methods to inspect dataframes are head() and tail(), which show the first and last
few rows (observations) of a dataframe:
print(df.head(n=2))
   float  int   datetime string
0    1.0    1 2018-03-10    foo
1   10.0    2 2019-03-10    bar
print(df.tail(n=2))
   float  int   datetime string
1   10.0    2 2019-03-10    bar
2    3.0    3 2020-03-10   buzz
In finance, data often comes with missing values, usually labeled as "NaN" (not a number) or "NaT" (not a time, for time values). Let's check out a basic example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"name": ['Superman', 'Batman', 'Spiderman'],
"toy": [np.nan, 'Batmobile', 'Spiderman toy'],
"born": [pd.NaT, pd.Timestamp("1956-06-26"), pd.NaT]})
print(df)
        name            toy       born
0   Superman            NaN        NaT
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT
Now, in case we simply would like to remove the rows (observations) where at least one element is NaN
or NaT, we use df.dropna():
print(df.dropna())
     name        toy       born
1  Batman  Batmobile 1956-06-26
In case we would like to remove the columns (variables) where at least one element is missing,
we make use of the argument axis:
print(df.dropna(axis='columns'))
        name
0   Superman
1     Batman
2  Spiderman
Many other options are available to deal with NaNs and NaTs, such as specifying which columns to look for missing values:
print(df.dropna(subset=['name', 'born']))
     name        toy       born
1  Batman  Batmobile 1956-06-26
Note that these operations do not happen in place, i.e., the original dataframe is kept intact.
In case we would like to perform in-place modifications, we use the argument inplace=True:
print(df)
        name            toy       born
0   Superman            NaN        NaT
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT
df.dropna(inplace=True)
print(df)
     name        toy       born
1  Batman  Batmobile 1956-06-26
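Dropping observations is not always desirable with price data. A common alternative is to count the missing values and then carry the last observed value forward. This is a minimal sketch with a made-up price series. Whether forward filling is appropriate depends on the application.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [100.0, np.nan, 101.5, np.nan, 102.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

n_missing = prices.isna().sum()  # number of missing observations
filled = prices.ffill()          # carry the last observed price forward

print(n_missing)        # 2
print(filled.tolist())  # [100.0, 100.0, 101.5, 101.5, 102.0]
```

As with dropna(), ffill() returns a new object and leaves the original series intact.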
Data frames in Python are very powerful and versatile. They are commonly used in machine learning where each row is one observation and each column one variable (each variable can be of a different type). For financial applications, we mainly deal with multivariate time series, which can be seen as a matrix or data frame, but with some particularities: each row is an observation but in a specific order (properly indexed with dates or times) and each column is of the same type (float).
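As an illustration of such a multivariate time series, the sketch below builds a small data frame of prices indexed by date and converts it into returns. The tickers AAA and BBB and the prices are made up.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 50.5, 51.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Simple returns: p_t / p_{t-1} - 1 (the first row has no previous price, so drop it)
returns = prices.pct_change().dropna()

# Log-returns: log(p_t) - log(p_{t-1})
log_returns = np.log(prices).diff().dropna()

print(returns)
print(log_returns)
```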
Plotting¶
We will make full use of Matplotlib, Seaborn,
and Plotly
for all our plots :) See the example below for how to plot data stored in a Pandas dataframe
with seaborn.
Let's examine this code snippet from matplotlib's documentation page:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
# Load a numpy record array from yahoo csv data with fields date, open, close,
# volume, adj_close from the mpl-data/example directory. The record array
# stores the date as an np.datetime64 with a day unit ('D') in the date column.
data_file = cbook.get_sample_data('goog.npz', asfileobj=False)
price_data = np.load(data_file)['price_data']
price_data = price_data[-250:] # get the most recent 250 trading days
type(price_data)
delta1 = np.diff(price_data['adj_close']) / price_data['adj_close'][:-1]
# Plot
volume = (15 * price_data['volume'][:-2] / price_data['volume'][0])**2 # for size
close = 0.003 * price_data['close'][:-2] / 0.003 * price_data['open'][:-2] # for color
fig, ax = plt.subplots()
ax.scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.5)
ax.set_xlabel(r'$\Delta_i$', fontsize=15)
ax.set_ylabel(r'$\Delta_{i+1}$', fontsize=15)
ax.set_title('Volume and percent change')
ax.grid(True)
fig.tight_layout()
plt.show()
Matplotlib basically deals with numpy nd-arrays and its subclasses like np.recarray used in the previous
example. Seaborn, on the other hand, can deal with Pandas DataFrames too. Let's see a basic example from
seaborn's documentation page:
import seaborn as sns
sns.set_theme(style="darkgrid")
# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")
print(type(fmri))
print(fmri.head())
# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri)
<class 'pandas.DataFrame'>
  subject  timepoint event    region    signal
0     s13         18  stim  parietal -0.017552
1      s5         14  stim  parietal -0.080883
2     s12         18  stim  parietal -0.081033
3     s11         18  stim  parietal -0.046134
4     s10         18  stim  parietal -0.037970
<Axes: xlabel='timepoint', ylabel='signal'>
Key libraries for finance¶
We will make use of several key libraries in Python.
Library skfolio¶
skfolio is a Python library for portfolio optimization and risk management built on top of scikit-learn. It provides a unified framework to create, fine-tune, cross-validate, and stress-test portfolio models using the familiar scikit-learn API.
Key Features:
- scikit-learn integration: Uses the same fit/predict API pattern
- Comprehensive risk measures: Variance, CVaR, Maximum Drawdown, CDaR, and more
- Advanced optimization: Mean-Variance, Maximum Sharpe, Risk Parity, Hierarchical methods
- Cross-validation: Built-in support for portfolio model validation
- Factor models: Support for Black-Litterman, Fama-French factors
- Constraints: Weight, group, cardinality, tracking error, transaction costs
Official Resources:
- Documentation: https://skfolio.org
- GitHub: https://github.com/skfolio/skfolio
- Examples: https://skfolio.org/auto_examples/
Example 1: Minimum Variance Portfolio¶
from sklearn.model_selection import train_test_split # pip install scikit-learn
from skfolio.datasets import load_sp500_dataset # pip install skfolio
from skfolio.optimization import MeanRisk
from skfolio.preprocessing import prices_to_returns
import plotly.io as pio
pio.renderers.default = "notebook"
# Load S&P 500 price data (built-in dataset with 20 assets)
prices = load_sp500_dataset()
# Convert prices to returns
X = prices_to_returns(prices)
# Split into training and testing sets
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
# Create minimum variance portfolio model
model = MeanRisk() # Default is minimum variance
# Fit the model on training data
model.fit(X_train)
# Display the optimal weights
print("Optimal Weights:")
print(model.weights_)
Optimal Weights:
[2.64898712e-02 2.41659793e-07 7.55934052e-08 1.03008568e-02
 1.33346884e-01 3.09419154e-07 2.28512522e-07 1.88127445e-01
 8.78144210e-08 1.06451771e-01 3.78274254e-02 5.97493951e-07
 1.93800589e-02 1.16900117e-01 7.06296600e-07 1.64477192e-01
 1.21290539e-02 1.00386527e-02 8.77060435e-02 8.68223818e-02]
# Predict (evaluate) on test set
portfolio = model.predict(X_test)
# Display performance metrics
print(f"\nAnnualized Sharpe Ratio: {portfolio.annualized_sharpe_ratio:.4f}")
print(f"Annualized Return: {portfolio.annualized_mean:.2%}")
print(f"Annualized Volatility: {portfolio.annualized_standard_deviation:.2%}")
print(f"Maximum Drawdown: {portfolio.max_drawdown:.2%}")
Annualized Sharpe Ratio: 0.9150
Annualized Return: 13.63%
Annualized Volatility: 14.90%
Maximum Drawdown: 33.58%
# Plot cumulative returns of the portfolio over the test set
(portfolio.returns_df.cumsum() * 100).plot(title="Portfolio Cumulative Uncompounded Returns")
plt.ylabel("%")
plt.show()
# Plot wealth of the portfolio over the test set
(1.0 + portfolio.returns_df).cumprod().plot(title="Portfolio Wealth")
plt.ylabel("$")
plt.show()
# Plot cumulative returns using skfolio's built-in Plotly method
portfolio.plot_cumulative_returns()
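The two hand-made plots above differ in how returns are accumulated: a cumulative sum ignores compounding, whereas a cumulative product of (1 + return) tracks the wealth of one dollar invested. A small numpy sketch with made-up returns shows the difference:

```python
import numpy as np

returns = np.array([0.10, -0.10, 0.05])  # made-up returns

uncompounded = np.cumsum(returns)   # cumulative (uncompounded) returns
wealth = np.cumprod(1.0 + returns)  # wealth of $1 invested at the start

print(uncompounded)  # approximately [0.10, 0.00, 0.05]
print(wealth)        # approximately [1.10, 0.99, 1.0395]
```

After a +10% return followed by a -10% return, the cumulative sum is back to zero while the wealth is 0.99, i.e., a 1% loss.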
Example 2: Maximum Sharpe Ratio Portfolio¶
from skfolio import RiskMeasure
from skfolio.optimization import ObjectiveFunction, MeanRisk
# Create maximum Sharpe ratio model
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
risk_measure=RiskMeasure.VARIANCE
)
# Fit and predict
model.fit(X_train)
portfolio = model.predict(X_test)
print(f"Sharpe Ratio: {portfolio.annualized_sharpe_ratio:.4f}")
print(f"Portfolio weights sum: {model.weights_.sum():.4f}")
Sharpe Ratio: 1.0401
Portfolio weights sum: 1.0000
Example 3: Minimum CVaR (Conditional Value at Risk)¶
# Minimize CVaR at 95% confidence level
model = MeanRisk(
risk_measure=RiskMeasure.CVAR,
cvar_beta=0.95 # 95% CVaR confidence level
)
model.fit(X_train)
portfolio = model.predict(X_test)
print(f"CVaR (95%): {portfolio.cvar:.2%}")
print(f"Sharpe Ratio: {portfolio.annualized_sharpe_ratio:.4f}")
CVaR (95%): 2.17%
Sharpe Ratio: 0.8703
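To build some intuition for what is being minimized, the sketch below computes a plain empirical VaR and CVaR with numpy on synthetic returns: the VaR at level beta is the beta-quantile of the losses, and the CVaR is the average of the losses at or beyond the VaR. This is only an illustrative estimator under these assumptions; the estimator used inside skfolio may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0005, scale=0.01, size=1000)  # synthetic daily returns

beta = 0.95
losses = -returns                    # losses are negative returns
var = np.quantile(losses, beta)      # Value at Risk: beta-quantile of the losses
cvar = losses[losses >= var].mean()  # CVaR: average loss in the worst (1 - beta) tail

print(f"VaR  ({beta:.0%}): {var:.2%}")
print(f"CVaR ({beta:.0%}): {cvar:.2%}")
```

By construction the CVaR is never smaller than the VaR, since it averages only the losses that exceed it.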
Library yfinance¶
The library yfinance lets us query financial instrument data from the Yahoo! Finance platform.
Let's see how to get Apple fundamental and stock price data from Yahoo! Finance:
import yfinance as yf
# Extract fundamental data
apple = yf.Ticker('AAPL')
info = apple.info
print(f"P/E Ratio: {info['trailingPE']}")
print(f"Market Cap: {info['marketCap']}")
print(f"Dividend Yield: {info['dividendYield']}")
P/E Ratio: 34.96958
Market Cap: 4055304765440
Dividend Yield: 0.38
# Download price data
apple = yf.download("AAPL", start="2017-01-01", end="2017-04-30")
Now, let's inspect the Pandas dataframe apple via the methods head() and tail(), which show
the first and last few observations of the dataframe:
apple.head()
| Price | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| Ticker | AAPL | AAPL | AAPL | AAPL | AAPL |
| Date | | | | | |
| 2017-01-03 | 26.770889 | 26.812377 | 26.450515 | 26.690220 | 115127600 |
| 2017-01-04 | 26.740917 | 26.853856 | 26.678687 | 26.701735 | 84472400 |
| 2017-01-05 | 26.876909 | 26.934531 | 26.692520 | 26.717874 | 88774400 |
| 2017-01-06 | 27.176535 | 27.234156 | 26.844635 | 26.916085 | 127007600 |
| 2017-01-09 | 27.425459 | 27.526873 | 27.183450 | 27.185754 | 134247600 |
apple.tail()
| Price | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| Ticker | AAPL | AAPL | AAPL | AAPL | AAPL |
| Date | | | | | |
| 2017-04-24 | 33.250469 | 33.322229 | 33.143985 | 33.218061 | 68537200 |
| 2017-04-25 | 33.456493 | 33.542142 | 33.303713 | 33.312974 | 75486000 |
| 2017-04-26 | 33.259727 | 33.472697 | 33.190285 | 33.442602 | 80164800 |
| 2017-04-27 | 33.285194 | 33.370846 | 33.174083 | 33.315289 | 56985200 |
| 2017-04-28 | 33.252781 | 33.403248 | 33.164819 | 33.354635 | 83441600 |
We can also plot selected columns of a dataframe using the plot() method:
apple[['High', 'Low', 'Open', 'Close']].plot()
<Axes: xlabel='Date'>
We can also load multiple assets at once:
# Portfolio of assets
tickers = ['AAPL', 'MSFT', 'JPM', 'GS']
portfolio_data = yf.download(tickers, start='2020-01-01', end='2023-12-31', auto_adjust=False)['Adj Close']
portfolio_data.head()
| Ticker | AAPL | GS | JPM | MSFT |
|---|---|---|---|---|
| Date | | | | |
| 2020-01-02 | 72.468246 | 203.182617 | 119.036438 | 152.505661 |
| 2020-01-03 | 71.763702 | 200.806732 | 117.465561 | 150.606720 |
| 2020-01-06 | 72.335548 | 202.861786 | 117.372162 | 150.995987 |
| 2020-01-07 | 71.995354 | 204.197144 | 115.376762 | 149.619232 |
| 2020-01-08 | 73.153503 | 206.165451 | 116.276825 | 152.002457 |
Library empyrical¶
empyrical is an open source library developed by Quantopian Inc. It's widely used by practitioners to compute common risk and performance measures.
from empyrical import max_drawdown, roll_max_drawdown, cum_returns, omega_ratio, sharpe_ratio
# create a synthetic array of returns
returns = np.array([.01, .02, .03, -.4, -.06, -.02])
max_drawdown(returns) # calculate the maximum drawdown
-0.4472800000000001
roll_max_drawdown(returns, window=3) # calculate the maximum drawdown in a rolling window fashion
array([ 0. , -0.4 , -0.436 , -0.44728])
cum_returns(returns) # calculate the cumulative returns
array([ 0.01 , 0.0302 , 0.061106 , -0.3633364 , -0.40153622,
-0.41350549])
sharpe_ratio(returns) # calculate the Sharpe ratio
-6.7377339531573535
# Note: Some empyrical functions require annualization factors
sharpe_ratio(returns, risk_free=0, annualization=252) # 252 trading days
-6.7377339531573535
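To see what lies behind these numbers, the maximum drawdown and the Sharpe ratio above can be reproduced with plain numpy. The sketch assumes a zero risk-free rate, 252 periods per year, and the sample standard deviation (ddof=1). Under these assumptions it matches the values printed above, but it is not meant as a description of empyrical's internals.

```python
import numpy as np

returns = np.array([.01, .02, .03, -.4, -.06, -.02])

# Maximum drawdown: largest relative drop of the wealth from its running peak
wealth = np.cumprod(1.0 + returns)
running_peak = np.maximum.accumulate(wealth)
mdd = np.min(wealth / running_peak - 1.0)

# Annualized Sharpe ratio with zero risk-free rate
sharpe = np.mean(returns) / np.std(returns, ddof=1) * np.sqrt(252)

print(mdd)     # approximately -0.44728
print(sharpe)  # approximately -6.7377
```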
Library riskparityportfolio¶
riskparityportfolio is a library to design risk parity portfolios, a different approach to investment that is primarily used to control how much risk goes into each asset.
import riskparityportfolio as rpp # pip install riskparityportfolio (dependencies: numpy, jax, quadprog, pybind, and tqdm)
import numpy as np
import matplotlib.pyplot as plt
cov_matrix = np.vstack((np.array((1.0000, 0.0015, -0.0119)),
np.array((0.0015, 1.0000, -0.0308)),
np.array((-0.0119, -0.0308, 1.0000))))
risk_budget_vector = np.array((0.1594, 0.0126, 0.8280))
w = rpp.vanilla.design(cov_matrix, risk_budget_vector)
plt.bar(["stock " + item for item in ["A", "B", "C"]], w)
plt.xlabel("")
plt.ylabel("portfolio weight")
plt.show()
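The risk budget refers to the relative risk contribution of each asset. With volatility as the risk measure, a common definition is w_i (Σw)_i / (wᵀΣw), which sums to one across assets. The numpy sketch below computes this quantity for a made-up two-asset example. The helper relative_risk_contributions is hypothetical and is not part of riskparityportfolio.

```python
import numpy as np


def relative_risk_contributions(w, cov):
    """Relative risk contribution of each asset: w_i (cov w)_i / (w' cov w)."""
    w = np.asarray(w, dtype=float)
    marginal = cov @ w
    return w * marginal / (w @ marginal)


cov = np.diag([0.04, 0.01])  # two uncorrelated assets with volatilities 20% and 10%

w_equal = np.array([0.5, 0.5])
print(relative_risk_contributions(w_equal, cov))  # approximately [0.8 0.2]

w_parity = np.array([1 / 3, 2 / 3])               # inverse-volatility weights
print(relative_risk_contributions(w_parity, cov)) # approximately [0.5 0.5]
```

With equal weights the more volatile asset carries 80% of the risk. For uncorrelated assets, weighting inversely to volatility equalizes the contributions.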
Python Scripts and Jupyter Notebooks¶
Python Scripts¶
One simple way to use Python is by typing the commands in the IPython terminal one by one. However, this quickly becomes unscalable and it is necessary to write scripts. You can use your favourite text editor to create Python scripts.
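A script is simply a text file with the extension .py. Below is a minimal sketch: the file name mean_return.py and the function names are made up. It can be run from a terminal with python mean_return.py.

```python
"""Minimal script: save as mean_return.py and run with `python mean_return.py`."""
import statistics


def mean_return(returns):
    """Return the arithmetic mean of a sequence of returns."""
    return statistics.fmean(returns)


def main():
    returns = [0.01, -0.02, 0.015]  # made-up returns
    print(f"Mean return: {mean_return(returns):.4%}")


# Run main() only when the file is executed as a script, not when it is imported
if __name__ == "__main__":
    main()
```

The final guard lets other scripts or notebooks import mean_return from this file without triggering the print statement.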
Jupyter Notebook/Lab¶
Jupyter notebooks enable you to write report-like documents containing code, documentation, mathematical equations, figures, and so on. This document is an example.
To explore further¶
Check out Awesome Quant for a list of curated packages relevant to financial applications.