For Developers
Developers may fork the GTPSA.jl repo and then dev
their forked repo in the REPL. For example, if my Github username is githubuser
, then after forking GTPSA.jl I would run in the REPL:
import Pkg
Pkg.develop(url="https://github.com/githubuser/GTPSA.jl")
The package consists of two layers: a low-level layer written in Julia that is 1-to-1 with the GTPSA C code, and a high-level, user-friendly layer that cleans up the notation for manipulating TPSs, manages temporaries generated during evaluation, and properly manages the memory in C when variables go out of scope in Julia. The low-level functions are listed at the bottom of this page.
When it comes to managing memory in C via Julia, there are certain intricacies that have to be considered. First, let's consider the Descriptor
, which is the simplest:
Descriptor
The Descriptor
stores all information about the GTPSA, including the indexing table for indexing specific monomials. This is a static object, only created once for a GTPSA, which all TPS
s refer to. In C, these structs must exist for the entirety of the program execution. In Julia, because they must not be destroyed when out-of-scope, we wrap these objects in immutable structs. In a single program, up to 250 Descriptor
s can exist simultanouesly, and they can be manually optionally destroyed using GTPSA.mad_desc_del!
. At program termination all Descriptor
s are destroyed. Given the significantly large number allowed, as well as the danger of still-existing TPS
and Descriptor
structs after destruction, no front-facing interface to the user is given for destroying existing Descriptor
s.
The Descriptor
struct simply wraps a C-pointer to a low-level struct called Desc
: this struct is 1-to-1 equivalent to the C struct desc
in GTPSA. See the documentation for GTPSA.Desc
below. By having this struct in Julia, we can unsafe_load
the struct and get values in the desc
. For example, to access the first TPS{Float64}
in the buffer of temporaries in the Descriptor
, we can do
julia> using GTPSA
julia> import GTPSA: Desc
julia> d = Descriptor(5,8)
GTPSA Descriptor ----------------------- # Variables: 5 Maximum order: 8
julia> desc = unsafe_load(d.desc) # To access the low-level C struct
GTPSA.Desc(0, 5, 5, 0, 0x08, 0x00, 0xff, Ptr{UInt8} @0x0000000002403558, 0, 4, 0x00000507, 0x00000000, 0x00000000, Ptr{Int32} @0x0000000002699a68, Ptr{UInt8} @0x00000000028f2498, Ptr{UInt8} @0x0000000001a614e8, Ptr{UInt8} @0x0000000002445218, Ptr{Ptr{UInt8}} @0x0000000002c46928, Ptr{Ptr{UInt8}} @0x0000000002bb9f38, Ptr{Ptr{UInt8}} @0x0000000001d35ac8, Ptr{Int32} @0x0000000002289df8, Ptr{Int32} @0x000000000273ee18, Ptr{Int32} @0x0000000001cdcbc8, Ptr{Int32} @0x000000000281b798, Ptr{Ptr{Int32}} @0x000000000238b248, Ptr{Ptr{Ptr{Int32}}} @0x0000000001cec068, 0x0000000000022195, 0.0, 0.0, 0.0, Ptr{Ptr{Nothing}} @0x00000000022c4ac8, Ptr{Ptr{Nothing}} @0x0000000002bbd4e8, Ptr{Int32} @0x00000000028a1668, Ptr{Int32} @0x00000000021e1c08)
julia> t_jl = unsafe_load(Base.unsafe_convert(Ptr{Ptr{TPS{Float64}}}, desc.t), 1) # 1 in Julia = 0 in C
Ptr{TPS64} @0x00000000021f2028
In Julia, if we were to then unsafe_load(t_jl)
, there would in fact be allocations, as its internally creating a copy of the TPS{Float64}
in Julia. This is not well documented in the documentation of unsafe_load
(see the discussion here).
TPS
The TPS{Float64}
struct in Julia corresponds exactly to the C struct tpsa
and TPS{ComplexF64}
struct in Julia corresponds exactly to ctpsa
in C. To understand fully the TPS
struct, some history is needed:
In early development versions of GTPSA.jl
, the TPS
struct was very similar to the Descriptor
struct: it used to just wrap a C Ptr
to a low-level struct, and instead be mutable
so out-of-scope TPS
s and temporaries are cleaned up. This at the time seemed like the simplest solution, since in Julia there is no way to tag C pointers for Julia garbage collection. This created some problems though: firstly, there is one indirection because Julia would tag that particular mutable wrapper struct for GC, but then the member Ptr
of that struct pointed to a different place in memory that had to be cleaned up. Secondly, when calling GTPSA C functions that want a Vector
of TPS
(e.g. Ptr
to TPS
), you would need to do a map(t->t.tpsa, x)
where x
is a Vector{TPS64}
and tpsa
is the field member of the wrapper TPS struct. Thirdly, and perhaps most significantly, Julia is not aware of how much memory the C is using. Therefore, it will not call the garbage collector very often, even if actually >100GB of memory is being used. Sometimes the OS will just kill the Julia process.
In order to get around these problems, we must allocate the entire mutable TPS
struct in Julia instead of in the C code (e.g. using GTPSA.mad_tpsa_newd
). We then can use pointer_from_objref
in Julia to get a pointer to that mutable, Julia-owned struct to pass to the C functions.
Sounds simple enough, right? If only! In the GTPSA C code, the coef
member array is something called a flexible array member. This is great in C, because instead of the struct storing a pointer to an array (which would cause an indirection every time the coef
array is accessed in a TPS
), it actually stores the array right there in the struct, with variable size. This gives some performance gains. In Julia, there is no such analog. For those who know about StaticArrays.jl
, you might think an SVector
could work, but surprise, it doesn't because you cannot mutate the fields of an SVector
and neither can the C code.
So the only solution it seems is to change the actual C struct in mad_tpsa_impl.h
and mad_ctpsa_impl.h
to use regular arrays for coef
instead of flexible array members, and indeed this is what is done for GTPSA.jl
. There is a tiny speed reduction due to the indirection of accessing coef
, however the benefits of Julia's garbage collector knowing how much memory it's using, and keeping memory usage sane, is worth the very tiny cost.
On the Julia side, it turns out you cannot just use a regular Vector
for the coef
array in the TPS
struct, because Julia's Vector
structs are quite complex and play a lot of tricks. You might actually be able to use an MVector
from StaticArrays
, however using this struct might make GTPSA.jl
significantly more complex because the size of the array has to be known at compile-time or else you suffer the drastic performance reductions caused by type-instabilities. The complexity of using this could be checked at some point in the future.
The decided solution was to, in the Julia, @ccall jl_malloc
for the coef
array, and in the finalizer for the mutable TPS
struct call @ccall jl_free
for the coef
array. This gives us C-style arrays that Julia's garbage collector knows about, and so will make sure to keep the memory usage sane.
When @ccall
is used, the arguments are Base.unsafe_convert
ed to the corresponding specified argument types. Therefore, for TPS
all we had to do then was define
Base.unsafe_convert(::Type{Ptr{TPS{T}}}, t::TPS{T}) where {T} = Base.unsafe_convert(Ptr{TPS{T}},pointer_from_objref(t))
and now we can pass our TPS
structs to C using @ccall
.
TempTPS
Because unsafe_load
of a Ptr{<:TPS}
creates a copy and allocates, we cannot treat the constant buffer of pre-allocated temporaries in the Descriptor
as bona-fide TPS
s. Note that the memory addresses of the temporaries in the buffer are constant and do not need to be cleaned up; they are immutable!. The temporaries, which we give the type GTPSA.TempTPS
, do not have any of the problems of just wrapping a pointer as do the TPS
s, and so that's what they are. Also in Julia, in order to access the fields of a TempTPS
(e.g. mo
) via unsafe_load
without allocating, we need an immutable struct having the same structure as TPS
. This is the purpose of GTPSA.LowTempTPS
. We need to use the mo
field for @FastGTPSA
to finally allocate the result TPS
with the mo
of the result TempTPS
.
As with TPS
, we also had to define Base.unsafe_convert
for TempTPS
so we can @ccall
. In this case, the unsafe_convert
returns the member Ptr
of the TempTPS
.
GTPSA.TempTPS
— Typestruct TempTPS{T<:Union{Float64,ComplexF64}}
This is for internal use only. TempTPS
is a temporary TPS
, which has been pre-allocated in a buffer for each thread in the Descriptor
C struct. When using the @FastGTPSA
macro, all temporaries generated will be used from this buffer. "Constructors" of this type simply take a temporary from that particular thread's buffer in a stack-like manner and "Destructors" (which must be manually called because this is immutable) release it from the stack.
Fields
t::Ptr{TPS{T}}
– Pointer to theTPS
in the buffer in theDescriptor
Library Structure
All operators have an in-place, mutating version specified with a bang (!). These are the lowest-level pure Julia code, following the convention that the first argument is the one to contain the result. In the GTPSA C library, the last argument contains the result, so this is accounted for in the file inplace_operators.jl
. All in-place functions can receive either a regular TPS
, which the user will be using, as well as a GTPSA.TempTPS
, which the user should not concern themselves with. The constants RealTPS
and ComplexTPS
are defined respectively in low_level/rtpsa.jl
and low_level/ctpsa.jl
to simplify the notation. These are just:
# Internal constants to aid multiple dispatch including temporaries
const RealTPS = Union{TempTPS{Float64}, TPS{Float64}}
const ComplexTPS = Union{TempTPS{ComplexF64}, TPS{ComplexF64}}
All this does is enforce correct types for the in-place functions, while keeping the notation/code simple.
The in-place, mutating functions, defined in inplace_operators.jl
must all use the RealTPS
and ComplexTPS
"types". Then, the higher level out-of-place functions for both TPS
and TempTPS
, which do different things with the result, will use these in-place functions.
The out-of-place functions for TPS
are defined in operators.jl
, and the out-of-place functions for TempTPS
are defined in fastgtpsa/operators.jl
.
Fast GTPSA Macros
The @FastGTPSA
/@FastGTPSA!
macros work by changes all arithmetic operators in different Julia arithmetic operators with the same operator precedence and unary operator capabilities. These special operators then dispatch on functions that use the temporaries when a TPS
or TempTPS
is passed, else default to their original operators, thereby making it completely transparent to non-TPS types. Both +
and -
must work as unary operators, and there is a very short list of allowed ones shown here. The other arithmetic operators were chosen somewhat randomly from the top of the same file, next to prec-plus
, prec-times
, and prec-power
which defines the operator precedences. By taking this approach, we relieve ourselves of having to rewrite PEMDAS and instead let the Julia do it for us.
All arithmetic operators are changed to GTPSA.:<special symbols>
, e.g. +
→ GTPSA.:±
. All non-arithmetic operators that are supported by GTPSA are then changed to GTPSA.__t_<operator>
, e.g. sin
→ GTPSA.__t_sin
, where the prefix __t_
is also chosen somewhat arbitrarily. These operators are all defined in fastgtpsa/operators.jl
, and when they encounter a TPS type, they use the temporaries, and when other number types are detected, they fallback to the regular, non-__t_
operator. This approach works extraordinarily well, and introduces no problems externally because none of these functions/symbols are exported.
Calling the C-library with pointers and pointer-to-pointers
All of the GTPSA map functions require a vector of TPS
as input, in C **tpsa
. In Julia, this works automatically for AbstractArray{<:Union{TPS64,ComplexTPS64}}
by specifying the C argument type in the C call as ::Ptr{TPS64}
or ::Ptr{ComplexTPS64}
. However, this can be misleading to those C inputs which require *tpsa
, e.g. also ::Ptr{TPS64}
. For consistency in the low level library interface, all pointers are specified as ::Ref{TPS64}
and ::Ref{ComplexTPS64}
, and pointers-to-pointers are specified as Ptr
s. For single TPS
s, the cconvert
to Ref
in the ccall
is handled automatically.
However, in some cases, one might have only a single TPS64
and would like the call the corresponding map functions without having to allocate an array. After some experimenting, I've found the following overrides to gives zero allocations:
Base.unsafe_convert(::Type{Ptr{TPS{T}}}, r::Base.RefValue{Ptr{Nothing}}) where {T} = Base.unsafe_convert(Ptr{TPS{T}}, Base.unsafe_convert(Ptr{Cvoid}, r))
Base.cconvert(::Type{Ptr{TPS{T}}}, t::TPS{T}) where {T} = Ref(pointer_from_objref(t))
NOTE: We need to have a GC.@preserve
before the cconvert to keep t
valid! As such, you will see in all map functions a call to GC.@preserve
.
This override is only necessary for the mutable TPS
. The other array inputs of isbits
types are ok and basically have the above defined for them. See this discussion for more details.
Low-Level
Below is documentation for every single 1-to-1 C function in the GTPSA library. If there is any function missing, please submit an issue to GTPSA.jl
.
Monomial
GTPSA.mad_mono_add!
— Functionmad_mono_add!(n::Cint, a, b, r)
Sets monomial r = a + b
.
Input
n
– Length of monomialsa
– Source monomiala
b
– Source monomialb
Output
r
– Destination monomial,r = a + b
GTPSA.mad_mono_cat!
— Functionmad_mono_cat!(n::Cint, a, m::Cint, b, r)
Sets monomial r
equal to the concatenation of the monomials a
and b
Input
n
– Length of monomoniala
a
– Source monomiala
m
– Length of monomialb
b
– Source monomialb
Output
r
– Destination monomial of concatenation ofa
andb
(lengthn+m
)
GTPSA.mad_mono_cmp
— Functionmad_mono_cmp(n::Cint, a, b)::Cint
Compares monomial a
to monomial b
, and returns the first difference in the lowest order variables.
Input
n
– Length of monomialsa
– Monomiala
b
– Monomialb
Output
ret
– Firsta[i]-b[i] != 0
GTPSA.mad_mono_copy!
— Functionmad_mono_copy!(n::Cint, a, r)
Copies monomial a
to monomial r
.
Input
n
– Length of monomialsa
– Source monomialr
– Destination monomial
GTPSA.mad_mono_eq
— Functionmad_mono_eq(n::Cint, a, b)::Bool
Checks if the monomial a
is equal to the monomial b
.
Input
n
– Length of monomialsa
– Monomiala
b
– Monomialb
Output
ret
– True if the monomials are equal, false if otherwise
GTPSA.mad_mono_eqn
— Functionmad_mono_eqn(n::Cint, a, b::Cuchar)::Bool
???
GTPSA.mad_mono_fill!
— Functionmad_mono_fill!(n::Cint, a, v::Cuchar)
Fills the monomial a
with the value v
.
Input
n
– Monomial lengtha
– Monomialv
– Value
GTPSA.mad_mono_le
— Functionmad_mono_le(n::Cint, a, b)::Bool
Checks if monomial a
is less than or equal to monomial b
.
Input
n
– Length of monomialsa
– Monomiala
b
– Monomialb
Output
ret
– True ifa <= mono_b
, false otherwise
GTPSA.mad_mono_lt
— Functionmad_mono_lt(n::Cint, a, b)::Bool
Checks if monomial a
is less than monomial b
.
Input
n
– Length of monomialsa
– Monomiala
b
– Monomialb
Output
ret
– True ifa < b
, false otherwise
GTPSA.mad_mono_max
— Functionmad_mono_max(n::Cint, a)::Cuchar
Returns the maximum order of the monomial.
Input
n
– Length of monomiala
– Monomial
Output
mo
– Maximum order of monomiala
GTPSA.mad_mono_min
— Functionmad_mono_min(n::Cint, a)::Cuchar
Returns the minimum order of the monomial.
Input
n
– Length of monomiala
– Monomial
Output
mo
– Mininum order of monomiala
GTPSA.mad_mono_ord
— Functionmad_mono_ord(n::Cint, a)::Cint
Returns the sum of the orders of the monomial a
.
Input
n
– Monomial lengtha
– Monomial
Output
s
– Sum of orders of monomial
GTPSA.mad_mono_ordp
— Functionmad_mono_ordp(n::Cint, a, stp::Cint)::Cdouble
Returns the product of each stp
-th order in monomial a
. For example, stp = 2
collects every order in the monomial with a step of 2 between each. As a
is a pointer, the product can be started at any element in the monomial.
Input
n
– Monomial lengtha
– Monomial as byte arraystp
– Step over which orders to include in the product
Output
p
– Product of orders of monomial separated bystp
.
GTPSA.mad_mono_ordpf
— Functionmad_mono_ordpf(n::Cint, a, stp::Cint)::Cdouble
Returns the product of factorials each stp
-th order in monomial a. For example, stp = 2
collects every order in the monomial with a step of 2 between each. As a
is a pointer, the product can be started at any element in the monomial.
Input
n
– Monomial lengtha
– Monomial as byte arraystp
– Step over which orders to include in the product of factorials
Output
p
– Product of factorials of orders of monomial separated bystp
GTPSA.mad_mono_print
— Functionmad_mono_print(n::Cint, a, sep_::Cstring, fp_::Ptr{Cvoid})
Prints the monomial to stdout
.
Input
n
– Length of monomiala
– Source monomial to print tostdout
sep_
– Separator stringfp_
– CFILE
pointer, if null will print tostdout
GTPSA.mad_mono_prt!
— Functionmad_mono_prt(n::Cint, a, s::Ptr{Cuchar})::Cstring
Writes the monomial defined by the byte array a
(with orders stored as hexadecimal) into a null terminated string s
.
Input
n
– Monomial and string lengtha
– Monomial as byte array
Output
ret
– Monomial as string
GTPSA.mad_mono_rcmp
— Functionmad_mono_rcmp(n::Cint, a, b)::Cint
Compares monomial a
to monomial b
starting from the right (when the monomials are ordered by variable, which is almost never the case) and returns the first difference in the lowest order variables.
Input
n
– Length of monomialsa
– Monomiala
b
– Monomialb
Output
ret
– Firsta[i]-b[i] != 0
wherei
starts from the end.
GTPSA.mad_mono_rev!
— Functionmad_mono_rev!(n::Cint, a, r)
Sets destination monomial r
equal to the reverse of source monomial a
.
Input
n
– Lengths of monomialsa
– Source monomiala
Output
r
– Destination monomial of reverse monomiala
GTPSA.mad_mono_str!
— Functionmad_mono_str!(n::Cint, a, s::Cstring)::Cint
Writes the monomial defined in the string s
, which stores the orders in a human-readable format (e.g. 10 is 10, not 0xa), into the byte array a
with the orders specified in hexadecimal.
Input
n
– Monomial and string lengths
– Monomial as string "[0-9]*"
Output
a
– Monomial as a byte array converted from the input stringi
– Adjusted sizen
of byte array if '