For Developers
Developers may fork the GTPSA.jl repo and then dev their forked repo in the REPL. For example, if my Github username is githubuser, then after forking GTPSA.jl I would run in the REPL:
import Pkg
Pkg.develop(url="https://github.com/githubuser/GTPSA.jl")The package consists of two layers: a low-level layer written in Julia that is 1-to-1 with the GTPSA C code, and a high-level, user-friendly layer that cleans up the notation for manipulating TPSs, manages temporaries generated during evaluation, and properly manages the memory in C when variables go out of scope in Julia. The low-level functions are listed at the bottom of this page.
When it comes to managing memory in C via Julia, there are certain intricacies that have to be considered. First, let's consider the Descriptor, which is the simplest: 
Descriptor
The Descriptor stores all information about the GTPSA, including the indexing table for indexing specific monomials. This is a static object, only created once for a GTPSA, which all TPSs refer to. In C, these structs must exist for the entirety of the program execution. In Julia, because they must not be destroyed when out-of-scope, we wrap these objects in immutable structs. In a single program, up to 250 Descriptors can exist simultanouesly, and they can be manually optionally destroyed using GTPSA.mad_desc_del!. At program termination all Descriptors are destroyed. Given the significantly large number allowed, as well as the danger of still-existing TPS and Descriptor structs after destruction, no front-facing interface to the user is given for destroying existing Descriptors.
The Descriptor struct simply wraps a C-pointer to a low-level struct called Desc: this struct is 1-to-1 equivalent to the C struct desc in GTPSA. See the documentation for GTPSA.Desc below. By having this struct in Julia, we can unsafe_load the struct and get values in the desc. For example, to access the first TPS{Float64} in the buffer of temporaries in the Descriptor, we can do
julia> using GTPSAjulia> import GTPSA: Descjulia> d = Descriptor(5,8)Descriptor(NV=5, MO=8)julia> desc = unsafe_load(d.desc) # To access the low-level C structGTPSA.Desc(2, 5, 5, 0, 0x08, 0x00, 0xff, Ptr{UInt8} @0x0000000003a82e28, 0, 4, 0x00000507, 0x00000000, 0x00000000, Ptr{Int32} @0x0000000002f9c9b8, Ptr{UInt8} @0x0000000002e0aa48, Ptr{UInt8} @0x000000000310f188, Ptr{UInt8} @0x0000000002f717c8, Ptr{Ptr{UInt8}} @0x0000000003701a88, Ptr{Ptr{UInt8}} @0x0000000003ac8e88, Ptr{Ptr{UInt8}} @0x0000000002dda0c8, Ptr{Int32} @0x0000000002876de8, Ptr{Int32} @0x0000000003a7f408, Ptr{Int32} @0x0000000003a8a128, Ptr{Int32} @0x00000000032ffc48, Ptr{Ptr{Int32}} @0x0000000003a5c058, Ptr{Ptr{Ptr{Int32}}} @0x000000000287f318, 0x0000000000022195, 0.0, 0.0, 0.0, Ptr{Ptr{Nothing}} @0x0000000002eaa738, Ptr{Ptr{Nothing}} @0x0000000002fcbcd8, Ptr{Int32} @0x00000000028d2348, Ptr{Int32} @0x0000000002f85aa8)julia> t_jl = unsafe_load(Base.unsafe_convert(Ptr{Ptr{TPS{Float64}}}, desc.t), 1) # 1 in Julia = 0 in CPtr{TPS64} @0x00000000034f1dd8
In Julia, if we were to then unsafe_load(t_jl), there would in fact be allocations, as its internally creating a copy of the TPS{Float64} in Julia. This is not well documented in the documentation of unsafe_load (see the discussion here).
TPS
The TPS{Float64} struct in Julia corresponds exactly to the C struct tpsa and TPS{ComplexF64} struct in Julia corresponds exactly to ctpsa in C. To understand fully the TPS struct, some history is needed:
In early development versions of GTPSA.jl, the TPS struct was very similar to the Descriptor struct: it used to just wrap a C Ptr to a low-level struct, and instead be mutable so out-of-scope TPSs and temporaries are cleaned up. This at the time seemed like the simplest solution, since in Julia there is no way to tag C pointers for Julia garbage collection. This created some problems though: firstly, there is one indirection because Julia would tag that particular mutable wrapper struct for GC, but then the member Ptr of that struct pointed to a different place in memory that had to be cleaned up. Secondly, when calling GTPSA C functions that want a Vector of TPS (e.g. Ptr to TPS), you would need to do a map(t->t.tpsa, x) where x is a Vector{TPS64} and tpsa is the field member of the wrapper TPS struct. Thirdly, and perhaps most significantly, Julia is not aware of how much memory the C is using. Therefore, it will not call the garbage collector very often, even if actually >100GB of memory is being used. Sometimes the OS will just kill the Julia process.
In order to get around these problems, we must allocate the entire mutable TPS struct in Julia instead of in the C code (e.g. using GTPSA.mad_tpsa_newd). We then can use pointer_from_objref in Julia to get a pointer to that mutable, Julia-owned struct to pass to the C functions. 
Sounds simple enough, right? If only! In the GTPSA C code, the coef member array is something called a flexible array member. This is great in C, because instead of the struct storing a pointer to an array (which would cause an indirection every time the coef array is accessed in a TPS), it actually stores the array right there in the struct, with variable size. This gives some performance gains. In Julia, there is no such analog. For those who know about StaticArrays.jl, you might think an SVector could work, but surprise, it doesn't because you cannot mutate the fields of an SVector and neither can the C code.
So the only solution it seems is to change the actual C struct in mad_tpsa_impl.h and mad_ctpsa_impl.h to use regular arrays for coef instead of flexible array members, and indeed this is what is done for GTPSA.jl. There is a tiny speed reduction due to the indirection of accessing coef, however the benefits of Julia's garbage collector knowing how much memory it's using, and keeping memory usage sane, is worth the very tiny cost.
On the Julia side, it turns out you cannot just use a regular Vector for the coef array in the TPS struct, because Julia's Vector structs are quite complex and play a lot of tricks. You might actually be able to use an MVector from StaticArrays, however using this struct might make GTPSA.jl significantly more complex because the size of the array has to be known at compile-time or else you suffer the drastic performance reductions caused by type-instabilities. The complexity of using this could be checked at some point in the future.
The decided solution was to, in the Julia, @ccall jl_malloc for the coef array, and in the finalizer for the mutable TPS struct call @ccall jl_free for the coef array. This gives us C-style arrays that Julia's garbage collector knows about, and so will make sure to keep the memory usage sane.
When @ccall is used, the arguments are Base.unsafe_converted to the corresponding specified argument types. Therefore, for TPS all we had to do then was define
Base.unsafe_convert(::Type{Ptr{TPS{T}}}, t::TPS{T}) where {T} = Base.unsafe_convert(Ptr{TPS{T}},pointer_from_objref(t))and now we can pass our TPS structs to C using @ccall.
TempTPS
Because unsafe_load of a Ptr{<:TPS} creates a copy and allocates, we cannot treat the constant buffer of pre-allocated temporaries in the Descriptor as bona-fide TPSs. Note that the memory addresses of the temporaries in the buffer are constant and do not need to be cleaned up; they are immutable!. The temporaries, which we give the type GTPSA.TempTPS, do not have any of the problems of just wrapping a pointer as do the TPSs, and so that's what they are. Also in Julia, in order to access the fields of a TempTPS (e.g. mo) via unsafe_load without allocating, we need an immutable struct having the same structure as TPS. This is the purpose of GTPSA.LowTempTPS. We need to use the mo field for @FastGTPSA to finally allocate the result TPS with the mo of the result TempTPS.
As with TPS, we also had to define Base.unsafe_convert for TempTPS so we can @ccall. In this case, the unsafe_convert returns the member Ptr of the TempTPS.
GTPSA.TempTPS — Typestruct TempTPS{T<:Union{Float64,ComplexF64},D}This is for internal use only. TempTPS is a temporary TPS, which has  been pre-allocated in a buffer for each thread in the Descriptor C struct.  When using the @FastGTPSA macro, all temporaries generated will be used  from this buffer. "Constructors" of this type simply take a temporary from  that particular thread's buffer in a stack-like manner and "Destructors"  (which must be manually called because this is immutable) release it from  the stack.
Fields
- t::Ptr{TPS{T,D}}– Pointer to the- TPSin the buffer in the- Descriptor
Library Structure
All operators have an in-place, mutating version specified with a bang (!). These are the lowest-level pure Julia code, following the convention that the first argument is the one to contain the result. In the GTPSA C library, the last argument contains the result, so this is accounted for in the file inplace_operators.jl. All in-place functions can receive either a regular TPS , which the user will be using, as well as a GTPSA.TempTPS, which the user should not concern themselves with. The constants RealTPS and ComplexTPS are defined respectively in low_level/rtpsa.jl and low_level/ctpsa.jl to simplify the notation. These are just:
# Internal constants to aid multiple dispatch including temporaries 
const RealTPS = Union{TempTPS{Float64}, TPS{Float64}}
const ComplexTPS = Union{TempTPS{ComplexF64}, TPS{ComplexF64}}All this does is enforce correct types for the in-place functions, while keeping the notation/code simple.
The in-place, mutating functions, defined in inplace_operators.jl must all use the RealTPS and ComplexTPS "types". Then, the higher level out-of-place functions for both TPS and TempTPS, which do different things with the result, will use these in-place functions.
The out-of-place functions for TPS are defined in operators.jl, and the out-of-place functions for TempTPS are defined in fastgtpsa/operators.jl.
Fast GTPSA Macros
The @FastGTPSA/@FastGTPSA! macros work by changes all arithmetic operators in different Julia arithmetic operators with the same operator precedence and unary operator capabilities. These special operators then dispatch on functions that use the temporaries when a TPS or TempTPS is passed, else default to their original operators, thereby making it completely transparent to non-TPS types. Both + and - must work as unary operators, and there is a very short list of allowed ones shown here. The other arithmetic operators were chosen somewhat randomly from the top of the same file, next to prec-plus, prec-times, and prec-power which defines the operator precedences. By taking this approach, we relieve ourselves of having to rewrite PEMDAS and instead let the Julia do it for us.
All arithmetic operators are changed to GTPSA.:<special symbols>, e.g. + → GTPSA.:±. All non-arithmetic operators that are supported by GTPSA are then changed to GTPSA.__t_<operator>, e.g. sin → GTPSA.__t_sin, where the prefix __t_ is also chosen somewhat arbitrarily. These operators are all defined in fastgtpsa/operators.jl, and when they encounter a TPS type, they use the temporaries, and when other number types are detected, they fallback to the regular, non-__t_ operator. This approach works extraordinarily well, and introduces no problems externally because none of these functions/symbols are exported.
Calling the C-library with pointers and pointer-to-pointers
All of the GTPSA map functions require a vector of TPS as input, in C **tpsa. In Julia, this works automatically for AbstractArray{<:Union{TPS64,ComplexTPS64}} by specifying the C argument type in the C call as ::Ptr{TPS64} or ::Ptr{ComplexTPS64}.  However, this can be misleading to those C inputs which require *tpsa, e.g. also ::Ptr{TPS64}. For consistency in the low level library interface, all pointers are specified as ::Ref{TPS64} and ::Ref{ComplexTPS64}, and pointers-to-pointers are specified as Ptrs. For single TPSs, the cconvert to Ref in the ccall is handled automatically.
However, in some cases, one might have only a single TPS64 and would like the call the corresponding map functions without having to allocate an array. After some experimenting, I've found the following overrides to gives zero allocations:
Base.unsafe_convert(::Type{Ptr{TPS{T}}}, r::Base.RefValue{Ptr{Nothing}}) where {T} = Base.unsafe_convert(Ptr{TPS{T}}, Base.unsafe_convert(Ptr{Cvoid}, r))
Base.cconvert(::Type{Ptr{TPS{T}}}, t::TPS{T}) where {T} = Ref(pointer_from_objref(t))NOTE: We need to have a GC.@preserve before the cconvert to keep t valid! As such, you will see in all map functions a call to GC.@preserve. 
This override is only necessary for the mutable TPS. The other array inputs of isbits types are ok and basically have the above defined for them. See this discussion for more details.
Low-Level
Below is documentation for every single 1-to-1 C function in the GTPSA library. If there is any function missing, please submit an issue to GTPSA.jl.
Monomial
GTPSA.mad_mono_add! — Functionmad_mono_add!(n::Cint, a, b, r)Sets monomial r = a + b.
Input
- n– Length of monomials
- a– Source monomial- a
- b– Source monomial- b
Output
- r– Destination monomial,- r = a + b
GTPSA.mad_mono_cat! — Functionmad_mono_cat!(n::Cint, a, m::Cint, b, r)Sets monomial r equal to the concatenation of the monomials a and b
Input
- n– Length of monomonial- a
- a– Source monomial- a
- m– Length of monomial- b
- b– Source monomial- b
Output
- r– Destination monomial of concatenation of- aand- b(length- n+m)
GTPSA.mad_mono_cmp — Functionmad_mono_cmp(n::Cint, a, b)::CintCompares monomial a to monomial b, and returns the first difference in the lowest order variables.
Input
- n– Length of monomials
- a– Monomial- a
- b– Monomial- b
Output
- ret– First- a[i]-b[i] != 0
GTPSA.mad_mono_copy! — Functionmad_mono_copy!(n::Cint, a, r)Copies monomial a to monomial r.  
Input
- n– Length of monomials
- a– Source monomial
- r– Destination monomial
GTPSA.mad_mono_eq — Functionmad_mono_eq(n::Cint, a, b)::BoolChecks if the monomial a is equal to the monomial b.
Input
- n– Length of monomials
- a– Monomial- a
- b– Monomial- b
Output
- ret– True if the monomials are equal, false if otherwise
GTPSA.mad_mono_eqn — Functionmad_mono_eqn(n::Cint, a, b::Cuchar)::Bool???
GTPSA.mad_mono_fill! — Functionmad_mono_fill!(n::Cint, a, v::Cuchar)Fills the monomial a with the value v.
Input
- n– Monomial length
- a– Monomial
- v– Value
GTPSA.mad_mono_le — Functionmad_mono_le(n::Cint, a, b)::BoolChecks if monomial a is less than or equal to monomial b.
Input
- n– Length of monomials
- a– Monomial- a
- b– Monomial- b
Output
- ret– True if- a <= mono_b, false otherwise
GTPSA.mad_mono_lt — Functionmad_mono_lt(n::Cint, a, b)::BoolChecks if monomial a is less than monomial b.
Input
- n– Length of monomials
- a– Monomial- a
- b– Monomial- b
Output
- ret– True if- a < b, false otherwise
GTPSA.mad_mono_max — Functionmad_mono_max(n::Cint, a)::CucharReturns the maximum order of the monomial.
Input
- n– Length of monomial
- a– Monomial
Output
- mo– Maximum order of monomial- a
GTPSA.mad_mono_min — Functionmad_mono_min(n::Cint, a)::CucharReturns the minimum order of the monomial.
Input
- n– Length of monomial
- a– Monomial
Output
- mo– Mininum order of monomial- a
GTPSA.mad_mono_ord — Functionmad_mono_ord(n::Cint, a)::CintReturns the sum of the orders of the monomial a.
Input
- n– Monomial length
- a– Monomial
Output
- s– Sum of orders of monomial
GTPSA.mad_mono_ordp — Functionmad_mono_ordp(n::Cint, a, stp::Cint)::CdoubleReturns the product of each stp-th order in monomial a. For example, stp = 2 collects every order in the monomial with a step of 2 between each. As a is a pointer, the product can be started at any  element in the monomial.
Input
- n– Monomial length
- a– Monomial as byte array
- stp– Step over which orders to include in the product
Output
- p– Product of orders of monomial separated by- stp.
GTPSA.mad_mono_ordpf — Functionmad_mono_ordpf(n::Cint, a, stp::Cint)::CdoubleReturns the product of factorials each stp-th order in monomial a. For example, stp = 2 collects  every order in the monomial with a step of 2 between each. As a is a pointer, the product can be started  at any element in the monomial.
Input
- n– Monomial length
- a– Monomial as byte array
- stp– Step over which orders to include in the product of factorials
Output
- p– Product of factorials of orders of monomial separated by- stp
GTPSA.mad_mono_print — Functionmad_mono_print(n::Cint, a, sep_::Cstring, fp_::Ptr{Cvoid})Prints the monomial to stdout.
Input
- n– Length of monomial
- a– Source monomial to print to- stdout
- sep_– Separator string
- fp_– C- FILEpointer, if null will print to- stdout
GTPSA.mad_mono_prt! — Functionmad_mono_prt(n::Cint, a, s::Ptr{Cuchar})::CstringWrites the monomial defined by the byte array a (with orders stored as hexadecimal) into  a null terminated string s.
Input
- n– Monomial and string length
- a– Monomial as byte array
Output
- ret– Monomial as string
GTPSA.mad_mono_rcmp — Functionmad_mono_rcmp(n::Cint, a, b)::CintCompares monomial a to monomial b starting from the right (when the monomials are ordered by variable,  which is almost never the case) and returns the first difference in the lowest order variables. 
Input
- n– Length of monomials
- a– Monomial- a
- b– Monomial- b
Output
- ret– First- a[i]-b[i] != 0where- istarts from the end.
GTPSA.mad_mono_rev! — Functionmad_mono_rev!(n::Cint, a, r)Sets destination monomial r equal to the reverse of source monomial a.
Input
- n– Lengths of monomials
- a– Source monomial- a
Output
- r– Destination monomial of reverse monomial- a
GTPSA.mad_mono_str! — Functionmad_mono_str!(n::Cint, a, s::Cstring)::CintWrites the monomial defined in the string s, which stores the orders in a human-readable format  (e.g. 10 is 10, not 0xa), into the byte array a with the orders specified in hexadecimal.
Input
- n– Monomial and string length
- s– Monomial as string "[0-9]*"
Output
- a– Monomial as a byte array converted from the input string
- i– Adjusted size- nof byte array if '