The automatic differentiation packages in Julia are quite sophisticated. They implement most of the standard chain rules in these documents by transforming the syntax of your Julia functions directly (source-to-source differentiation).
This means you should keep your Julia function syntax as clean and simple as possible.
The package you want is Zygote, and the functions are gradient, jacobian, and hessian.
using Pkg; Pkg.add("Zygote")
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Installed ZygoteRules ─── v0.2.3
Installed LLVMExtra_jll ─ v0.0.18+0
Installed IRTools ─────── v0.4.9
Installed GPUArrays ───── v8.6.2
Installed Zygote ──────── v0.6.55
Installed ChainRules ──── v1.48.0
Installed LLVM ────────── v4.17.1
Updating `~/.julia/environments/v1.8/Project.toml`
⌃ [e88e6eb3] + Zygote v0.6.55
Updating `~/.julia/environments/v1.8/Manifest.toml`
[082447d4] + ChainRules v1.48.0
⌃ [0c68f7d7] + GPUArrays v8.6.2
[7869d1d1] + IRTools v0.4.9
⌅ [929cbde3] + LLVM v4.17.1
⌃ [e88e6eb3] + Zygote v0.6.55
[700de1a5] + ZygoteRules v0.2.3
⌅ [dad2f222] + LLVMExtra_jll v0.0.18+0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
Precompiling project...
✓ LLVMExtra_jll
✓ ZygoteRules
✓ IRTools
✓ LLVM
✓ ChainRules
✓ GPUArrays
✓ Zygote
7 dependencies successfully precompiled in 8 seconds. 411 already precompiled. 5 skipped during auto due to previous errors.
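Before the longer examples below, here is a minimal sketch of all three calls; the function sq and the point x0 are made up purely for illustration.

using Zygote

sq(x) = sum(x.^2)          # simple scalar-valued test function
x0 = [1.0, 2.0, 3.0]

gradient(sq, x0)           # returns a tuple, one entry per argument: ([2.0, 4.0, 6.0],)
jacobian(x -> x.^2, x0)    # Jacobian of a vector-valued map: (3×3 diagonal matrix,)
hessian(sq, x0)            # Hessian of a scalar map of one vector: a 3×3 matrix equal to 2I

One practical consequence of the source-transformation approach is that Zygote cannot differentiate through code that mutates arrays, so prefer the clean, non-mutating style used in the examples below.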
ChatGPT wrote the following code for me in response to a prompt about using Julia to compute a gradient automatically.
using Zygote
function f(alpha, sigma)
    sum(exp.(-(alpha.^2) ./ sigma))
end
# Compute the gradient of f with respect to alpha and sigma
gradient_f(alpha, sigma) = gradient(f, alpha, sigma)
# Test the gradient_f function
alpha_test = 1.0
sigma_test = 2.0
grad_alpha, grad_sigma = gradient_f(alpha_test, sigma_test)
println("Gradient with respect to alpha: ", grad_alpha)
println("Gradient with respect to sigma: ", grad_sigma)
Gradient with respect to alpha: -0.6065306597126334
Gradient with respect to sigma: 0.15163266492815836
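For this scalar case the derivatives are also available in closed form, so we can sanity-check Zygote's numbers by hand (a quick sketch; the formulas come from the chain rule applied to exp(-alpha^2/sigma)):

# d/dalpha exp(-alpha^2/sigma) = -2alpha/sigma * exp(-alpha^2/sigma)
# d/dsigma exp(-alpha^2/sigma) =  alpha^2/sigma^2 * exp(-alpha^2/sigma)
df_dalpha(alpha, sigma) = -2*alpha/sigma * exp(-alpha^2/sigma)
df_dsigma(alpha, sigma) = alpha^2/sigma^2 * exp(-alpha^2/sigma)
df_dalpha(1.0, 2.0), df_dsigma(1.0, 2.0)   # ≈ (-0.6065, 0.1516), matching the output above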
#alpha_test = [1.0,2.0]
#sigma_test = 2.0
using LinearAlgebra
"""
gradient_check(f, x, g)
Check that g seems to express the gradient of f by comparing
values computed with finite differences. This is based on the
gradientcheck function in the Poblano Matlab toolbox.
"""
function gradient_check(f, x, g)
    fx = f(x) # computed function
    gx = g(x) # putative gradient
    h = sqrt(eps(fx))
    xi = copy(x)
    gxd = copy(gx)
    for i = 1:length(x)
        xi[i] += h
        gxd[i] = (f(xi) - fx)/h
        xi[i] = x[i] # reset
    end
    absdiff = abs.(gxd .- gx)
    return (g=gx, gfd=gxd, maxdiff=maximum(absdiff), normdiff=norm(gxd - gx))
end
gradient_check(x -> x'*x, randn(10), x -> 2*x).maxdiff
4.834157171784703e-8
sigmas = rand(10)
gradient_check(x -> f(x, sigmas), randn(10),
x -> gradient_f(x, sigmas)[1]).maxdiff
3.457646324811492e-8
using LinearAlgebra
function logdet(X)
    log(det(X))
end
gradlogdet(X) = gradient(logdet, X)
gradlogdet(triu(rand(10,10)))
([2.1057491259147616 0.0 … 0.0 0.0; -2.5898781979204553 3.2755421256828017 … 0.0 0.0; … ; -5.759816203889525 -6.585190075047711 … 1.1309285937416926 0.0; -75.85726304504391 -22.085631996846725 … -0.8185157612955057 5.941944495909822],)
gradient_check(x->logdet(x), triu(rand(10,10)),
x->gradlogdet(x)[1]).maxdiff
0.023732792065629837
gradlogdet(triu(rand(10,10)))
([34.20790713299038 0.0 … 0.0 0.0; -40.20845320360208 1.5148165279378571 … 0.0 0.0; … ; -1228.1127147326963 18.926424259244474 … 10.389273104484472 0.0; 1053.9350539173474 -15.118246149257264 … -10.187877221169702 1.2286150883519777],)
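The finite-difference check above reports a noticeably larger maxdiff than before; that is likely forward-difference truncation error amplified by the conditioning of a random triangular matrix rather than a wrong gradient. We can also check against the classical matrix-calculus identity that the gradient of log(det(X)) is inv(X)' (the transposed inverse). A quick sketch; the shifted matrix below is made up just to keep things well conditioned:

using LinearAlgebra
X = triu(rand(10,10)) + 5I           # well-conditioned upper-triangular test matrix
norm(gradlogdet(X)[1] - inv(X)')     # should be near machine precision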
using LinearAlgebra, SparseArrays
function pagerank(a,P,v)
    (I-a*P)\((1-a)*v)
end
n = 5
P = Matrix(sparse(1:n, mod.(1:n, n) .+ 1, 1.0, n, n)) # make a cycle matrix
P[1,2] = 0.5
P[1,3] = 0.5
P = P'
prgrad(a) = gradient(pagerank, a, P, ones(n)/5)
prgrad (generic function with 1 method)
gradient_check(x->pagerank(x,P,ones(n)/5), 0.85,
x->prgrad(x)[1]).maxdiff
Output is an array, so the gradient is not defined. Perhaps you wanted jacobian.

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] sensitivity(y::Vector{Float64})
   @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:67
 [3] gradient(::Function, ::Float64, ::Vararg{Any})
   @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:97
 [4] prgrad(a::Float64)
   @ Main ./In[14]:10
 [5] (::var"#28#30")(x::Float64)
   @ Main ./In[15]:2
 [6] gradient_check(f::var"#27#29", x::Float64, g::var"#28#30")
   @ Main ./In[3]:13
 [7] top-level scope
   @ In[15]:1
using LinearAlgebra, SparseArrays
function pagerank(a,P,v)
    (I-a*P)\((1-a)*v)
end
n = 5
P = Matrix(sparse(1:n, mod.(1:n, n) .+ 1, 1.0, n, n)) # make a cycle matrix
P[1,2] = 0.5
P[1,3] = 0.5
P = P'
prgrad(a) = jacobian(pagerank, a, P, ones(n)/5)
prgrad (generic function with 1 method)
[(pagerank(0.75+sqrt(eps(1.0)),P,ones(n)/5) -
pagerank(0.75,P,ones(n)/5))/sqrt(eps(1.0)) prgrad(0.75)[1]]
5×2 Matrix{Float64}:
  0.0393128   0.0393128
 -0.0797884  -0.0797884
 -0.0105278  -0.0105278
  0.0180326   0.0180326
  0.0329708   0.0329708
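As another check, this derivative is available in closed form: differentiating x(a) = (1-a)*inv(I - a*P)*v by hand gives dx/da = inv(I - a*P)*(P*x(a) - v), so we can compare the Zygote jacobian against that expression. A sketch reusing the same P, v, and prgrad defined above:

v = ones(n)/5
x = pagerank(0.75, P, v)
dxda = (I - 0.75*P) \ (P*x - v)   # hand-derived derivative of the PageRank vector
[dxda prgrad(0.75)[1]]            # the two columns should agree up to roundoff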
function f2(x)
    n = length(x) ÷ 2
    alpha = x[1:n]
    sigma = x[n+1:end]
    f(alpha, sigma)
end
hessian_f(alpha,sigma) = hessian(f, [alpha, sigma]) # note: this passes the two-argument f, not the vector wrapper f2
hessian_f(alpha_test, sigma_test)
MethodError: no method matching f(::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
Closest candidates are:
f(::Any, ::Any) at In[5]:3
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:0 [inlined]
[2] _pullback(ctx::Zygote.Context{false}, f::typeof(f), args::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
@ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:9
[3] pullback(f::Function, cx::Zygote.Context{false}, args::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
@ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:44
[4] pullback
@ ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:42 [inlined]
[5] gradient(f::Function, args::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
@ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:96
[6] (::Zygote.var"#114#115"{typeof(f)})(x::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
@ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/grad.jl:64
[7] forward_jacobian(f::Zygote.var"#114#115"{typeof(f)}, x::Vector{Float64}, #unused#::Val{2})
@ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/forward.jl:29
[8] forward_jacobian(f::Function, x::Vector{Float64}; chunk_threshold::Int64)
@ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/forward.jl:44
[9] forward_jacobian
@ ~/.julia/packages/Zygote/g2w9o/src/lib/forward.jl:42 [inlined]
[10] hessian_dual
@ ~/.julia/packages/Zygote/g2w9o/src/lib/grad.jl:64 [inlined]
[11] hessian
@ ~/.julia/packages/Zygote/g2w9o/src/lib/grad.jl:62 [inlined]
[12] hessian_f(alpha::Float64, sigma::Float64)
@ Main ./In[30]:7
[13] top-level scope
@ In[30]:8
Zygote's hessian works on a function of a single vector argument: internally it substitutes a vector of ForwardDiff.Dual numbers, and the two-argument f has no method for that (note that hessian_f above passed f rather than the vector wrapper f2). Rather than repair that wrapper, it is simpler to write a function that takes a single vector of alphas directly, treating the sigmas as fixed data.
function simplef(alphas)
    sum(exp.(-(alphas.^2) ./ sigmas))
end
sigmas = rand(10)
hessian_f(alphas) = hessian(simplef, alphas)
alphas = randn(10)
hessian_f(alphas)
10×10 Matrix{Float64}:
  1.7684  -0.0      0.0      -0.0      …  -0.0      -0.0       0.0
 -0.0     -6.4505   0.0      -0.0         -0.0      -0.0       0.0
 -0.0     -0.0      2.18349  -0.0         -0.0      -0.0       0.0
 -0.0     -0.0      0.0       0.62219     -0.0      -0.0       0.0
 -0.0     -0.0      0.0      -0.0         -0.0      -0.0       0.0
 -0.0     -0.0      0.0      -0.0      …  -0.0      -0.0       0.0
 -0.0     -0.0      0.0      -0.0         -0.0      -0.0       0.0
 -0.0     -0.0      0.0      -0.0         -2.40502  -0.0       0.0
 -0.0     -0.0      0.0      -0.0         -0.0       0.102418  0.0
 -0.0     -0.0      0.0      -0.0         -0.0      -0.0       1.94047
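Since simplef decouples across coordinates, the Hessian should be diagonal, with the i-th diagonal entry obtained by differentiating exp(-a^2/s) twice: (4a^2/s^2 - 2/s)*exp(-a^2/s). A quick sketch of that check, reusing the alphas and sigmas from above:

using LinearAlgebra
hdiag = (4 .* alphas.^2 ./ sigmas.^2 .- 2 ./ sigmas) .* exp.(-(alphas.^2) ./ sigmas)
maximum(abs.(hessian_f(alphas) .- Diagonal(hdiag)))   # should be near machine precision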