Computational Complexity¶

This page analyzes the computational complexity of the Barnes-Hut algorithm and discusses the accuracy-performance tradeoffs.

Time Complexity¶

Tree Construction: O(N log N)¶

Building the spatial tree requires:

Sorting particles: O(N log N) using Morton ordering
Tree assembly: O(N) to build the tree structure
Computing node properties: O(N) to calculate centers of mass

The dominating factor is the sorting step, giving overall O(N log N) complexity.

Force Evaluation: O(M log N)¶

For M target particles querying forces from N source particles:

Tree traversal: Each particle visits O(log N) nodes on average
Force calculation: O(1) per node interaction
Total: O(M log N)

When M = N (self-force calculation), this becomes O(N log N).

Comparison with Direct Methods¶

Method	Time Complexity	Accuracy
Direct (brute force)	O(N²)	Exact
Barnes-Hut	O(N log N)	Approximate
Fast Multipole Method	O(N)	Approximate

Space Complexity¶

Memory Usage: O(N)¶

Particle storage: 3N floats for positions (2N in 2D)
Tree nodes: ~4N/3 nodes in worst case (complete tree)
Node properties: Center of mass, total mass per node
Total: O(N) memory usage

Cache Efficiency¶

The tree structure is optimized for cache performance:

Breadth-first node ordering improves locality
Contiguous arrays reduce memory fragmentation
Vectorized operations maximize throughput

Accuracy vs Performance¶

The θ Parameter¶

The accuracy-speed tradeoff is controlled by the θ (theta) parameter:

\[\text{Error} \propto \theta^2\]

\[\text{Speed} \propto \frac{1}{\log(1/\theta)}\]

θ Value	Relative Error	Relative Speed	Use Case
0.0	0% (exact)	1× (slowest)	Validation
0.1	~1%	2-3×	High accuracy
0.5	~25%	5-10×	Balanced (recommended)
1.0	~100%	10-20×	Fast approximation
2.0	~400%	20-50×	Very rough estimate

Error Analysis¶

The error in Barnes-Hut comes from two sources:

Approximation error: Treating distant groups as point masses
Truncation error: Using first-order multipole expansion

The total relative error is approximately:

\[\epsilon_{rel} \approx \theta^2 \left(1 + \frac{s}{2d}\right)\]

Where s/d is the size-to-distance ratio of the farthest node used in approximation.

Adaptive Accuracy¶

For applications requiring variable accuracy:

# High accuracy for nearby interactions
forces_precise = bhut.force(positions, masses, theta=0.1)

# Lower accuracy for distant background
forces_approx = bhut.force(positions, masses, theta=1.0)

Scaling Analysis¶

Strong Scaling (Fixed Problem Size)¶

Performance with increasing number of processors:

Ideal speedup: Limited by O(log N) tree traversal
Communication overhead: Increases with processor count
Memory bandwidth: Can become bottleneck

Weak Scaling (Fixed Work per Processor)¶

Performance with proportionally increasing problem size:

Tree depth: Grows as log N, maintaining efficiency
Load balancing: Space-filling curves help distribute work
Network communication: Scales well for distributed systems

Optimization Strategies¶

Algorithmic Optimizations¶

Morton ordering: Improves cache locality and tree balance
Vectorization: Process multiple particles simultaneously
Tree reuse: Rebuild only when necessary for dynamic systems

Implementation Optimizations¶

Memory pooling: Reduce allocation overhead
SIMD instructions: Vectorize force calculations
Prefetching: Hide memory latency

Parallel Optimizations¶

Shared-memory: OpenMP for multi-core parallelization
Distributed-memory: MPI for cluster computing
GPU acceleration: CUDA/OpenCL for massively parallel systems

Performance Benchmarks¶

Typical performance on modern hardware:

N Particles	Direct O(N²)	Barnes-Hut O(N log N)	Speedup
1,000	1ms	0.1ms	10×
10,000	100ms	2ms	50×
100,000	10s	30ms	333×
1,000,000	16min	400ms	2,400×

Benchmarks assume θ = 0.5 on a modern CPU