My N-body code calls Outer[Subtract, vect1, vect2] a lot: tens of thousands to a hundred thousand times! This operation is at the heart of the code, and I need to speed it up. Each vector contains, for example, N = 10000 values, so the cost scales like N^2.
By compiling this instruction, I get a speed-up of about a factor of 4 (with 4 CPUs!). Not bad, but I don't want to buy 128 CPUs!
I don't understand, for example, why Outer[Times, vect1, vect2] is about 100 times faster than Outer[Subtract, vect1, vect2]. It would be great to get that speed-up!
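For reference, the comparison can be reproduced with a minimal sketch like the one below. The size n and the random test vectors are assumptions chosen for illustration, not the values from the actual N-body code, and the measured ratio will depend on the machine and the Mathematica version.

```mathematica
(* Hypothetical test data: n and the random vectors are illustrative only *)
n = 2000;
vect1 = RandomReal[1, n];
vect2 = RandomReal[1, n];

(* n x n matrix of pairwise differences vect1[[i]] - vect2[[j]] *)
tSubtract = First@AbsoluteTiming[diff = Outer[Subtract, vect1, vect2];];

(* n x n matrix of pairwise products, for comparison *)
tTimes = First@AbsoluteTiming[prod = Outer[Times, vect1, vect2];];

(* Timings in seconds for the two calls *)
{tSubtract, tTimes}
```

Both results are n x n matrices, so memory use as well as run time grows like n^2.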
Is there a way to speed up Outer[Subtract, vect1, vect2]? Using C++, a GPU, or something else?
Thank you for any advice.