John Nash, the author of many of R's optimization packages (not to be confused with the John Nash portrayed in the movie A Beautiful Mind), has an interesting and insightful blog. In particular, his entry on choosing which method to use for “optimizing” functions is recommended as a guide to navigating the overwhelming number of solvers available in R.
General solvers
General solvers can deal with arbitrary nonlinear optimization problems, at the expense of perhaps slow convergence and risk of numerical issues.
Package stats
Package stats (a base R package installed by default) offers several general-purpose optimization routines.
optimize(): For one-dimensional unconstrained function optimization over an interval (for one-dimensional root finding use uniroot()).
f <- function(x) exp(-0.5*x) * sin(10*pi*x)
f(0.5)
[1] 4.768779e-16
result <- optimize(f, interval = c(0, 1), tol = 0.0001)
result
$minimum
[1] 0.3494978
$objective
[1] -0.8395633
str(result)
List of 2
$ minimum : num 0.349
$ objective: num -0.84
# let's plot something
curve(f, 0, 1, n = 200, col = 4); grid()
points(result$minimum, result$objective, pch = 20, col = "red", cex = 1.5)
optim(): General-purpose optimization; the method argument selects among several methods:
Nelder-Mead: relatively robust method (default) that does not require derivatives (function nmkb() of package dfoptim is probably better)
CG: low-memory optimizer for unconstrained problems in high dimensions (generally not satisfactory; package Rcgmin is probably better)
BFGS: simple unconstrained quasi-Newton method (package Rvmmin is probably better)
L-BFGS-B: modest-memory optimizer for bounds-constrained problems
SANN: simulated annealing method
Brent: for one-dimensional problems (it actually calls optimize())
This example performs a least-squares fit: \[
\begin{array}{ll}
\underset{a,b}{\textsf{minimize}} & \sum_{i=1}^{m} \left(y_i - (a+bx_i)\right)^2
\end{array}
\]
# data points to be fitted
dat <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(1, 3, 5, 6, 8, 12))

# squared l2-norm error of a linear fitting y ~ par[1] + par[2]*x
f <- function(param, data) sum((param[1] + param[2]*data$x - data$y)^2)

# call solver (with initial value c(0, 1) and default method = "Nelder-Mead")
result <- optim(par = c(0, 1), f, data = dat)
str(result)
List of 5
$ par : num [1:2] -1.27 2.03
$ value : num 2.82
$ counts : Named int [1:2] 89 NA
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
# plot linear regression
plot(y ~ x, data = dat, main = "Least square regression")
abline(a = result$par[1], b = result$par[2], col = "red")
# compare against the built-in linear regression in R
lm(y ~ x, data = dat)
Call:
lm(formula = y ~ x, data = dat)
Coefficients:
(Intercept) x
-1.267 2.029
The next example illustrates the use of the gradient with the famous Rosenbrock banana function: \[
f(\mathbf{x}) = 100(x_2-x_1^2)^2 + (1-x_1)^2
\] with gradient \[
\nabla f(\mathbf{x}) = \left( \begin{array}{c}
-400x_1(x_2-x_1^2) -2(1-x_1)\\
200(x_2-x_1^2)
\end{array}\right)
\] The corresponding unconstrained minimization problem is \[
\begin{array}{ll}
\underset{\mathbf{x}\in \mathbf{R}^2}{\textsf{minimize}} & 100(x_2-x_1^2)^2 + (1-x_1)^2
\end{array}
\]
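In code, the function and its gradient follow directly from the formulas above (the chunk below is a reconstruction; the names f_banana and gr_banana are the ones used in all the calls that follow). We first run the default Nelder-Mead method from the starting point c(-1.2, 1):
# Rosenbrock banana function and its gradient (as given above)
f_banana <- function(x) 100*(x[2] - x[1]^2)^2 + (1 - x[1])^2
gr_banana <- function(x) c(-400*x[1]*(x[2] - x[1]^2) - 2*(1 - x[1]),
                           200*(x[2] - x[1]^2))

# default method = "Nelder-Mead" (derivative-free)
res <- optim(c(-1.2, 1), f_banana)
str(res)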
List of 5
$ par : num [1:2] 1 1
$ value : num 8.83e-08
$ counts : Named int [1:2] 195 NA
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
res <- optim(c(-1.2, 1), f_banana, gr_banana, method = "BFGS")
str(res)
List of 5
$ par : num [1:2] 1 1
$ value : num 9.59e-18
$ counts : Named int [1:2] 110 43
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
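The following output corresponds to a bounds-constrained run with method "L-BFGS-B" on a 25-dimensional problem; the defining chunk is not shown, but a sketch consistent with it, assuming the 25-dimensional box-constrained test function from the optim() help page, is:
# 25-dimensional box-constrained problem (test function from ?optim, assumed here)
flb <- function(x) {
  p <- length(x)
  sum(c(1, rep(4, p - 1)) * (x - c(1, x[-p])^2)^2)
}
res <- optim(rep(3, 25), flb, method = "L-BFGS-B",
             lower = rep(2, 25), upper = rep(4, 25))
str(res)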
List of 5
$ par : num [1:25] 2 2 2 2 2 2 2 2 2 2 ...
$ value : num 368
$ counts : Named int [1:2] 6 6
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : chr "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
This example uses the simulated annealing method (for global optimization):
# "wild" function, global minimum at about -15.81515
f_wild <- function(x) 10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x + 80
plot(f_wild, -50, 50, n = 1000, main = "optim() minimizing 'wild function'")
res <- optim(50, f_wild, method = "SANN",
             control = list(maxit = 20000, temp = 20, parscale = 20))
str(res)
List of 5
$ par : num -15.8
$ value : num 67.5
$ counts : Named int [1:2] 20000 NA
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
# now improve locally (typically only by a small bit):
res2 <- optim(res$par, f_wild, method = "BFGS")
str(res2)
List of 5
$ par : num -15.8
$ value : num 67.5
$ counts : Named int [1:2] 19 2
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
points(res2$par, res2$value, pch = 20, col = "red", cex = 1.5)
constrOptim(): Minimize a function subject to linear inequality constraints using an adaptive barrier algorithm (it calls optim() under the hood).
# inequality constraints (ui %*% theta >= ci): x <= 0.9, x - y >= 0.1
str(constrOptim(c(.5, 0), f_banana, gr_banana,
                ui = rbind(c(-1, 0), c(1, -1)), ci = c(-0.9, 0.1)))
List of 7
$ par : num [1:2] 0.889 0.789
$ value : num 0.0125
$ counts : Named num [1:2] 254 48
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence : int 0
$ message : NULL
$ outer.iterations: int 4
$ barrier.value : num -7.4e-05
nlm(): This function carries out a minimization of the objective function using a Newton-type algorithm.
f <- function(x) sum((x - 1:length(x))^2)
res <- nlm(f, c(10, 10))
str(res)
List of 5
$ minimum : num 4.3e-26
$ estimate : num [1:2] 1 2
$ gradient : num [1:2] 2.76e-13 -3.10e-13
$ code : int 1
$ iterations: int 2
res <- nlm(f, c(10, 10), print.level = 2)
iteration = 0
Step:
[1] 0 0
Parameter:
[1] 10 10
Function Value
[1] 145
Gradient:
[1] 18.00001 16.00001
iteration = 1
Step:
[1] -9 -8
Parameter:
[1] 1 2
Function Value
[1] 1.721748e-13
Gradient:
[1] 1.551336e-06 1.379735e-06
iteration = 2
Parameter:
[1] 1 2
Function Value
[1] 4.303458e-26
Gradient:
[1] 2.757794e-13 -3.099743e-13
Relative gradient close to zero.
Current iterate is probably solution.
str(res)
List of 5
$ minimum : num 4.3e-26
$ estimate : num [1:2] 1 2
$ gradient : num [1:2] 2.76e-13 -3.10e-13
$ code : int 1
$ iterations: int 2
nlminb(): Unconstrained and box-constrained optimization using PORT routines.
res <- nlminb(c(-1.2, 1), f_banana)
str(res)
List of 6
$ par : num [1:2] 1 1
$ objective : num 1.23e-20
$ convergence: int 0
$ iterations : int 35
$ evaluations: Named int [1:2] 45 74
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ message : chr "X-convergence (3)"
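The gradient evaluations reported in the next output indicate that the analytical gradient was supplied; the presumed call is:
res <- nlminb(c(-1.2, 1), f_banana, gradient = gr_banana)
str(res)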
List of 6
$ par : num [1:2] 1 1
$ objective : num 4.29e-22
$ convergence: int 0
$ iterations : int 35
$ evaluations: Named int [1:2] 43 36
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ message : chr "X-convergence (3)"
Package optimr
The author of the base function optim() (which he now considers obsolete) has also created the package optimr as a convenient wrapper to many other solvers, so that they can be easily used and compared:
library(optimr)  # install.packages("optimr")

# optimr() can be called like optim() but has more methods: "Nelder-Mead", "BFGS", "CG",
# "L-BFGS-B", "nlm", "nlminb", "Rcgmin", "Rvmmin", and "hjn"
res <- optimr(c(-1.2, 1), f_banana, method = "Rvmmin")
str(res)
List of 5
$ par : num [1:2] -0.855 1.141
$ value : num 20.2
$ counts : Named num [1:2] 30 2
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: num 3
$ message : chr "Rvmminu appears to have converged"
res <- optimr(c(-1.2, 1), f_banana, gr_banana, method = "Rvmmin")
str(res)
List of 5
$ par : num [1:2] 1 1
$ value : num 1.23e-32
$ counts : Named num [1:2] 59 39
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: num 2
$ message : chr "Rvmminu appears to have converged"
# opm() can use several methods at the same time
res <- opm(c(-1.2, 1), f_banana, method = c("Nelder-Mead", "BFGS"))
print(res)
The package optimrx is similar to optimr but accepts more methods. It is not available on CRAN; it is hosted on R-Forge and can be installed with: install.packages("optimrx", repos="http://R-Forge.R-project.org").
Package gloptim
Package gloptim is a wrapper to many global optimization packages (similar in spirit to what optimr is for local optimization). Note that global optimization is totally different in philosophy from local optimization (global optimization solvers are usually called stochastic solvers and attempt to escape local optima). To install, do: install.packages("gloptim", repos="http://R-Forge.R-project.org").
Solvers for specific classes of problems
If the problem to be solved falls within a particular class of problems, such as LS, LP, MILP, QP, SOCP, or SDP, then it is much better to use a solver specific to that class.
Least Squares (LS)
A linear least-squares (LS) problem minimizes the \(\ell_2\)-norm \(\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2\) possibly with bounds or linear constraints. The function qr.solve(A, b) from base R solves over- and underdetermined linear systems in the least-squares sense. The package colf performs least squares constrained optimization on a linear objective function and contains a number of algorithms to choose from.
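As a quick illustration of qr.solve() (a minimal sketch with simulated data; the variable names are arbitrary):
# overdetermined system: 100 equations, 5 unknowns
set.seed(42)
A <- matrix(rnorm(100*5), nrow = 100)
b <- A %*% c(1, -2, 3, -4, 5) + rnorm(100)
x_ls <- qr.solve(A, b)  # least-squares solution of A x ~ b
x_ls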
Linear Programming (LP)
The package linprog contains the function solveLP() to conveniently solve LPs of the form: \[
\begin{array}{ll}
\underset{\mathbf{x}\in \mathbf{R}^n}{\textsf{minimize}} & \mathbf{c}^T\mathbf{x}\\
{\textsf{subject to}} & \begin{array}[t]{l} \mathbf{A}\mathbf{x} \le \mathbf{b}\\
\mathbf{x} \ge \mathbf{0}\end{array}\\
\end{array}
\]
library(linprog) # install.packages("linprog")
Loading required package: lpSolve
# example of Steinhauser, Langbehn and Peters (1992)
cvec <- c(1800, 600, 600)  # gross margins
names(cvec) <- c("Cows", "Bulls", "Pigs")
bvec <- c(40, 90, 2500)  # endowment
names(bvec) <- c("Land", "Stable", "Labor")
Amat <- rbind(c( 0.7, 0.35,  0),
              c( 1.5, 1,     3),
              c(50,   12.5, 20))

# run solver
res <- solveLP(cvec, bvec, Amat, maximum = TRUE)
print(res)
Results of Linear Programming / Linear Optimization
Objective function (Maximum): 93600
Iterations in phase 1: 0
Iterations in phase 2: 2
Solution
opt
Cows 44
Bulls 24
Pigs 0
Basic Variables
opt
Cows 44.0
Bulls 24.0
S Land 0.8
Constraints
actual dir bvec free dual dual.reg
Land 39.2 <= 40 0.8 0.0 0.8
Stable 90.0 <= 90 0.0 240.0 15.0
Labor 2500.0 <= 2500 0.0 28.8 1375.0
All Variables (including slack variables)
opt cvec min.c max.c marg marg.reg
Cows 44.0 1800 900 2400.000 NA NA
Bulls 24.0 600 450 1200.000 NA NA
Pigs 0.0 600 -Inf 1296.000 -696.0 6.25
S Land 0.8 0 NA 731.092 0.0 NA
S Stable 0.0 0 -Inf 240.000 -240.0 15.00
S Labor 0.0 0 -Inf 28.800 -28.8 1375.00
Mixed Integer Linear Programming (MILP)
The package lpSolve (much faster than linprog because it is coded in C) solves linear mixed integer problems (i.e., LPs possibly with some integer constraints):
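The problem data f.obj, f.con, f.dir, and f.rhs used below are assumed to be those of the example in the lpSolve documentation (maximize \(x_1 + 9x_2 + x_3\) subject to \(x_1 + 2x_2 + 3x_3 \le 9\) and \(3x_1 + 2x_2 + 2x_3 \le 15\)), which is consistent with the integer solution shown further down:
library(lpSolve)  # install.packages("lpSolve")

# problem data (assumed): maximize x1 + 9 x2 + x3
f.obj <- c(1, 9, 1)
f.con <- matrix(c(1, 2, 3,
                  3, 2, 2), nrow = 2, byrow = TRUE)
f.dir <- c("<=", "<=")
f.rhs <- c(9, 15)

# solve the plain LP first
res <- lp("max", f.obj, f.con, f.dir, f.rhs)
print(res)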
# run again, this time requiring that all three variables be integer
res_int <- lp("max", f.obj, f.con, f.dir, f.rhs, int.vec = 1:3)
print(res_int)
Success: the objective function is 37
res_int$solution
[1] 1 4 0
Another good package is Rglpk, which is an R interface to the GNU Linear Programming Kit. ‘GLPK’ is open source software for solving large-scale LP, MILP, and other related problems.
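For instance, the same MILP can be passed to Rglpk via its main function Rglpk_solve_LP() (a sketch reusing the data above):
library(Rglpk)  # install.packages("Rglpk")

# types marks each variable as continuous ("C") or integer ("I")
res <- Rglpk_solve_LP(obj = f.obj, mat = f.con, dir = f.dir, rhs = f.rhs,
                      types = c("I", "I", "I"), max = TRUE)
res$optimum
res$solution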
Quadratic Programming (QP)
The package quadprog contains the function solve.QP() to conveniently solve QPs of the form: \[
\begin{array}{ll}
\underset{\mathbf{x}\in \mathbf{R}^n}{\textsf{minimize}} & -\mathbf{d}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{D}\mathbf{x}\\
{\textsf{subject to}} & \mathbf{A}^T\mathbf{x} =/\ge \mathbf{b}\\
\end{array}
\]
library(quadprog)  # install.packages("quadprog")

# Set up problem:
#   minimize    -(0 5 0) %*% x + 1/2 x^T x
#   subject to  A^T x >= b
# with b = (-8, 2, 0)^T
#         (-4  2  0)
#     A = (-3  1 -2)
#         ( 0  0  1)
Dmat <- diag(3)
dvec <- c(0, 5, 0)
Amat <- matrix(c(-4, -3, 0, 2, 1, 0, 0, -2, 1), 3, 3)
bvec <- c(-8, 2, 0)

# run solver
res <- solve.QP(Dmat, dvec, Amat, bvec)
str(res)
List of 6
$ solution : num [1:3] 0.476 1.048 2.095
$ value : num -2.38
$ unconstrained.solution: num [1:3] 0 5 0
$ iterations : int [1:2] 3 0
$ Lagrangian : num [1:3] 0 0.238 2.095
$ iact : int [1:2] 3 2
The package quadprogXT extends quadprog to solve quadratic programs with absolute value constraints and absolute values in the objective function.
Second-Order Cone Programming (SOCP)
There are several packages:
Package ECOSolveR provides an interface to the Embedded COnic Solver (ECOS), a well-known, efficient, and robust C library for convex problems. Conic and equality constraints can be specified in addition to integer and boolean variable constraints for mixed-integer problems.
The CLSOCP package provides an implementation of a one-step smoothing Newton method for the solution of SOCP problems.
Cone Programming and Semi-Definite Programming (SDP)
There are several good packages:
Package scs applies operator splitting to solve linear programs, cone programs (SOCP), and semidefinite programs; cones can be second-order, exponential, power cones, or any combination of these.
Package sdpt3r solves general semidefinite programs using an R implementation of SDPT3, a MATLAB software for semidefinite quadratic-linear programming.
Package Rcsdp is an interface to the CSDP semidefinite programming library of routines that implement a primal-dual barrier method for solving semidefinite programming problems.
Package Rmosek provides an interface to the (commercial) MOSEK optimization library for large-scale LP, QP, and MIP problems, with emphasis on (nonlinear) conic, semidefinite, and convex tasks; academic licenses are available.
Optimization infrastructure packages
We have already seen two packages that are wrappers to many other solvers, namely: optimr for local optimization and gloptim for global optimization.
It is also worth mentioning the NEOS Server for optimization, a free internet-based service that provides online access to more than 60 state-of-the-art solvers for numerical optimization problems. The R package rneos enables the user to pass optimization problems to NEOS and retrieve the results within R.
The following two packages are extremely useful and powerful: ROI and CVXR.
ROI for Convex Problems, MIP, and Nonconvex Problems
The ROI package (R Optimization Infrastructure) provides a framework for handling optimization problems in R. It uses an object-oriented approach to define and solve optimization tasks belonging to different problem classes (e.g., linear, quadratic, and nonlinear programming problems). This makes optimization transparent for the R user, as the corresponding workflow is completely abstracted from the underlying solver. The approach also allows for easy switching between solvers and thus enhances comparability. See the ROI paper for details. Let's now look at several examples.
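LP – The first example is a small LP in three variables with three linear inequality constraints and a box constraint on \(x_3\). The defining chunk is not shown, but a definition consistent with the problem summary and solution below (it follows the example used in the ROI paper) is:
library(ROI)  # install.packages("ROI") plus solver plugins, e.g., ROI.plugin.glpk

# assumed problem data: maximize 3 x1 + 7 x2 - 12 x3
prob <- OP(objective = L_objective(c(3, 7, -12)),
           constraints = L_constraint(L = rbind(c(5, 7,  2),
                                                c(3, 2, -9),
                                                c(1, 3,  1)),
                                      dir = c("<=", "<=", "<="),
                                      rhs = c(61, 35, 31)),
           bounds = V_bound(li = 3, ui = 3, lb = -10, ub = 10, nobj = 3),
           maximum = TRUE)
prob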
ROI Optimization Problem:
Maximize a linear objective function of length 3 with
- 3 continuous objective variables,
subject to
- 3 constraints of type linear.
- 1 lower and 1 upper non-standard variable bound.
# let's look at available solvers
ROI_available_solvers(prob)$Package  # solvers available for the particular problem on the internet
ROI_applicable_solvers(prob) # solvers installed and loaded in system applicable to the particular problem
[1] "ecos" "glpk" "lpsolve" "scs"
# solve it
res <- ROI_solve(prob)
res
Optimal solution found.
The objective value is: 8.670149e+01
solution(res) # also: res$solution
[1] 0.000000 9.238806 -1.835821
solution(res, "objval") # also: res$objval
[1] 86.70149
solution(res, "status")
$code
[1] 0
$msg
solver glpk
code 5
symbol GLP_OPT
message Solution is optimal.
roi_code 0
MILP – Consider the previous LP and make it into an MILP by adding the constraints \(x_2,x_3 \in \mathbb{Z}\).
# just modify the previous problem:
types(prob) <- c("C", "I", "I")
prob
ROI Optimization Problem:
Maximize a linear objective function of length 3 with
- 1 continuous objective variable,
- 2 integer objective variables,
subject to
- 3 constraints of type linear.
- 1 lower and 1 upper non-standard variable bound.
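BLP – Binary variables are declared with type "B"; the following binary LP has five binary variables and three linear constraints: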
ROI Optimization Problem:
Minimize a linear objective function of length 5 with
- 5 binary objective variables,
subject to
- 3 constraints of type linear.
- 0 lower and 0 upper non-standard variable bounds.
ROI_applicable_solvers(prob)
[1] "ecos" "glpk" "lpsolve"
res <- ROI_solve(prob)
res
Optimal solution found.
The objective value is: -1.010000e+02
SOCP – Consider the SOCP: \[
\begin{array}{ll}
\underset{\mathbf{x}}{\textsf{maximize}} & x + y + z\\
\textsf{subject to} & \begin{array}[t]{l} \sqrt{x^2 + z^2} \le \sqrt{2}\\ x,y,z \ge 0, \quad y \le 5\end{array}
\end{array}
\] and note that the SOC constraint \(\sqrt{x^2 + z^2} \le \sqrt{2}\) can be written as \(\|(x,z)\|_2 \le \sqrt{2}\) or \((\sqrt{2}, (x, z)) \in \mathcal{K}_\textsf{soc(3)}\) which is achieved in the code as: \(\textsf{rhs} - \mathbf{L}\mathbf{x} = \left[\begin{array}{c} \sqrt{2}\\ x\\ z\end{array} \right] \in \mathcal{K}_\textsf{soc(3)}\).
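A construction consistent with the problem summary below (a sketch, since the defining chunk is not shown) uses C_constraint() with the cone K_soc(3):
# maximize x + y + z  s.t.  (sqrt(2), x, z) in K_soc(3), 0 <= y <= 5 (x, z >= 0 by default)
prob <- OP(objective = L_objective(c(1, 1, 1)),
           constraints = C_constraint(L = rbind(c( 0, 0,  0),
                                                c(-1, 0,  0),
                                                c( 0, 0, -1)),
                                      cones = K_soc(3),
                                      rhs = c(sqrt(2), 0, 0)),
           bounds = V_bound(ui = 2, ub = 5, nobj = 3L),
           maximum = TRUE)
prob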
ROI Optimization Problem:
Maximize a linear objective function of length 3 with
- 3 continuous objective variables,
subject to
- 3 constraints of type conic.
|- 3 conic constraints of type 'soc'
- 0 lower and 1 upper non-standard variable bound.
ROI_applicable_solvers(prob)
[1] "ecos" "scs"
ROI_solve(prob)
Optimal solution found.
The objective value is: 7.000000e+00
SDP – Consider the SDP: \[
\begin{array}{ll}
\underset{\mathbf{x}}{\textsf{minimize}} & x_1 + x_2 - x_3\\
\textsf{subject to} &
\begin{array}[t]{l}
x_1\left(\begin{array}{cc} 10 & 3\\ 3 & 10\end{array} \right) +
x_2\left(\begin{array}{cc} 6 & -4\\ -4 & 10\end{array} \right) +
x_3\left(\begin{array}{cc} 8 & 1\\ 1 & 6\end{array} \right) \preceq
\left(\begin{array}{cc} 16 &-13\\-13 & 60\end{array} \right)\\
x_1,x_2,x_3 \ge 0\end{array}
\end{array}
\] and note that the SDP constraint \(\sum_{i=1}^m x_i\mathbf{F}_i \preceq \mathbf{F}_0\) can be written as \(\mathbf{F}_0 - \sum_{i=1}^m x_i\mathbf{F}_i \in \mathcal{K}_\textsf{sdp(3)}\) (size 3 is because in our problem the matrices are \(2\times2\) but vech() extracts 3 independent variables since the matrices are symmetric).
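A construction consistent with the summary below (again a sketch, assuming the plain half-vectorization vech() described above) uses C_constraint() with the cone K_psd(3):
# data matrices of the SDP above
F1 <- rbind(c(10,   3), c(  3, 10))
F2 <- rbind(c( 6,  -4), c( -4, 10))
F3 <- rbind(c( 8,   1), c(  1,  6))
F0 <- rbind(c(16, -13), c(-13, 60))
vech <- function(M) M[lower.tri(M, diag = TRUE)]  # half-vectorization helper

# minimize x1 + x2 - x3  s.t.  F0 - (x1*F1 + x2*F2 + x3*F3) is PSD, x >= 0 (default bounds)
prob <- OP(objective = L_objective(c(1, 1, -1)),
           constraints = C_constraint(L = cbind(vech(F1), vech(F2), vech(F3)),
                                      cones = K_psd(3),
                                      rhs = vech(F0)))
prob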
ROI Optimization Problem:
Minimize a linear objective function of length 3 with
- 3 continuous objective variables,
subject to
- 3 constraints of type conic.
|- 3 conic constraints of type 'psd'
- 0 lower and 0 upper non-standard variable bounds.
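NLP – Finally, ROI also accepts general nonlinear programs via F_objective() and F_constraint(). The following example maximizes the product \(x_1 x_2 x_3\) subject to \(x_1 + 2x_2 + 2x_3 \le 72\) and \(0 \le x_i \le 42\), solved with the alabama plugin: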
# model the whole problem as a NLP
prob <- OP(objective = F_objective(F = function(x) prod(x), n = 3,
                                   G = function(x) c(prod(x[-1]), prod(x[-2]), prod(x[-3]))),
           constraints = F_constraint(F = function(x) x[1] + 2*x[2] + 2*x[3], dir = "<=", rhs = 72,
                                      J = function(x) c(1, 2, 2)),
           bounds = V_bound(ud = 42, nobj = 3L),
           maximum = TRUE)
prob
#> ROI Optimization Problem:
#>
#> Maximize a nonlinear objective function of length 3 with
#> - 3 continuous objective variables,
#>
#> subject to
#> - 1 constraint of type nonlinear.
#> - 0 lower and 3 upper non-standard variable bounds.

ROI_solve(prob, solver = "alabama", start = c(10, 10, 10))
#> Optimal solution found.
#> The objective value is: 3.456000e+03

# alternatively, the constraint can be specified as linear
constraints(prob) <- L_constraint(L = c(1, 2, 3), "<=", 72)
prob
#> ROI Optimization Problem:
#>
#> Maximize a nonlinear objective function of length 3 with
#> - 3 continuous objective variables,
#>
#> subject to
#> - 1 constraint of type linear.
#> - 0 lower and 3 upper non-standard variable bounds.
CVXR for Convex Problems
CVXR is an R package that provides an object-oriented modeling language for convex optimization, similar to CVX, CVXPY, YALMIP, and Convex.jl. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the restrictive standard form required by most solvers. The user specifies an objective and a set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem’s convexity. Once verified, the problem is converted into standard conic form using graph implementations and passed to a cone solver such as ECOS or SCS. See the CVX manual for details. Let's now look at several examples.
Least squares – Let’s start with a simple LS example: \[
\begin{array}{ll}
\underset{\boldsymbol{\beta}}{\textsf{minimize}} & \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2\\
\end{array}
\] We can of course use the R base linear-model fitting function lm():
# generate data
m <- 100
n <- 10
beta_true <- c(-4:5)
X <- matrix(rnorm(m*n), nrow = m)
y <- X %*% beta_true + rnorm(m)

# estimate x
res <- lm(y ~ 0 + X)  # 0 means there is no intercept in our model
res
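For comparison, the same LS problem can be written in CVXR almost verbatim from the math (a minimal sketch; the variable names here are mine):
library(CVXR)  # install.packages("CVXR")

beta <- Variable(n)                                # optimization variable
objective <- Minimize(sum_squares(y - X %*% beta))
prob <- Problem(objective)
result <- solve(prob)
result$getValue(beta)                              # estimated coefficients (compare with lm() above)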
The number of available solvers in R is overwhelming and can be difficult to navigate. The following steps are suggested:
If the problem is convex, then start with CVXR for initial tests.
If the speed is not fast enough, then move on to the next level and use ROI, at the expense of some extra modeling effort.
If faster speed is still required and the problem belongs to one of the well-defined classes, such as LP, MILP, QP, SOCP, or SDP, then use a solver specific to that class (for example, for LPs I recommend lpSolve and for QPs quadprog).
However, if the problem does not fit any of these classes, then one has to use a general solver for nonlinear optimization. In that case, if a local solution is enough, a good starting point is either ROI or the package optimr, which is a wrapper for many local solvers. If a global solution is required, then the package gloptim, a wrapper for many global solvers, is a good choice.