Tuesday, 13 May 2014

Is ergodicity a reasonable hypothesis? Understanding Boltzmann's ergodic hypothesis

Ergodic vs. non-ergodic
trajectories (Wikipedia)
Many undergraduate physics students barely study the ergodic hypothesis in detail. It is usually presented as ensemble averages being equal to time averages. While the concept of a statistical ensemble may be accessible to students, when it comes to ergodic theory and its theorems, where higher-level mathematical jargon kicks in, it may be confusing for the novice reader, and even for practicing physicists and educators, what ergodicity really means. For example, a recent preprint titled "Is ergodicity a reasonable hypothesis?" defines ergodicity as follows:
...In the physics literature "ergodicity" is taken to mean that a system, including a macroscopic one, visits all microscopic states in a relatively short time...[link]
Visiting all microscopic states is not a precondition for ergodicity from a statistical-physics standpoint. This form of the theory is a manifestation of the strong ergodic hypothesis, via the Birkhoff theorem, and may not reflect the physical meaning of ergodicity. However, the originator of the ergodic hypothesis, Boltzmann, had something different in mind in explaining how a system approaches thermodynamic equilibrium. One of the best explanations is given in J. R. Dorfman's book An Introduction to Chaos and Nonequilibrium Statistical Mechanics [link]; in Section 1.3, Dorfman explains what Boltzmann had in mind:
...Boltzmann then made the hypothesis that a mechanical system's trajectory in phase-space will spend equal times in regions of equal phase-space measure. If this is true, then any dynamical system will spend most of its time in phase-space region where the values of the interesting macroscopic properties are extremely close to the equilibrium values...[link]
In saying this, Boltzmann did not suggest that a system should visit ALL microscopic states. His argument suggests only that states close to equilibrium are the most likely to be visited.
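Boltzmann's equal-time-in-equal-measure picture can be illustrated numerically. Below is a minimal Python sketch using the logistic map at $r=4$, a textbook ergodic system (my own choice of example, not from the preprint or Dorfman's book): the time average of $x_n$ along a single trajectory matches the ensemble average over the invariant density $1/(\pi\sqrt{x(1-x)})$, whose mean is $1/2$, even though the trajectory never visits every point of the interval.

```python
# Time average along one trajectory of the logistic map x -> 4x(1-x).
# For this map the ensemble (phase-space) average of x over the
# invariant density 1/(pi*sqrt(x(1-x))) equals 1/2; ergodicity says
# the time average should agree. Function name and parameters are
# illustrative choices.
def logistic_time_average(x0=0.3, n_steps=1_000_000, burn_in=1_000):
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = 4.0 * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        x = 4.0 * x * (1.0 - x)
        total += x
    return total / n_steps

print(logistic_time_average())  # close to 0.5, the ensemble average
```

The point of the sketch is only that agreement of the two averages is a statement about measure, not about exhaustively visiting states.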

Friday, 17 January 2014

Particle approximation to probability density functions: Dirac delta function representation

In the previous post, I briefly showed the idea of using the Dirac delta function for discrete data representation. In the second example there, the histogram locations for a given set of points were presented as spike trains, whereas the heights were given in a second sum. This is hard to follow and visualise if you are not comfortable reading formulas with multiple indices. For pedagogical reasons, an easier representation of an arbitrary probability density function (PDF), $p(x)$, simply couples each discrete point with a corresponding weight.

Hence, a set $\{x_{i}, \omega^{i}\}_{i=1}^{N}$ forms an estimate of the PDF, $\hat{p}(x)$. At this point we can invoke the Dirac delta function,

$ \hat{p}(x) = \sum_{i=1}^{N} \omega^{i} \delta(x-x_{i})$
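The practical content of this representation is that, by the sifting property of the Dirac delta, integrals against $\hat{p}(x)$ reduce to weighted sums over the particles:

$ \int f(x) \, \hat{p}(x) \, dx = \sum_{i=1}^{N} \omega^{i} \int f(x) \, \delta(x-x_{i}) \, dx = \sum_{i=1}^{N} \omega^{i} f(x_{i})$

So an expectation under the approximated PDF is just a weighted average of $f$ at the spike locations.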

Let's revisit the R code given there, this time drawing uniform numbers in $[-2, 2]$ to get 100 $x_{i}$ values. These numbers simply indicate the locations on the x-axis, a spike train. For simplicity, let's use the Gaussian distribution $\mathcal{N}(0, 1)$ as the target PDF. Then, for the weights, we evaluate this density at the spike locations. This approach is easier to understand compared with my previous double-index notation.

R Example code

The procedure explained above is trivial to implement in R.

# Generate 100 x locations, sampled from a regular
# grid of 2001 points in [-2.0, 2.0]
# (the domain where the Dirac comb operates)
Xj = seq(-2, 2, 0.002)
Xi = sample(Xj, 100)
# Now generate weights from N(0,1) at those given locations
Wi = dnorm(Xi)
# Now visualise
plot(Xi, Wi, type="h",xlim=c(-2.0,2.0),ylim=c(0,0.6),lwd=2,col="blue",ylab="p")


The notation above hides a second abuse of notation: in practice there must be a secondary regular grid from which the $x_{i}$ values are picked, because the argument of $\hat{p}(x)$ lives in a discrete domain. A slightly better notation, reflecting the above code, would be

$ \hat{p}(x_j) = \sum_{i=1}^{N} \omega^{i} \delta(x_{j}-x_{i})$

The set $\{x_{j}\}$ is simply defined on a chosen domain, for example a regular grid. Hence, I recommend not introducing the Dirac delta when explaining a particle approximation to PDFs to novice students in class. It will only confuse them even more.
Figure: Spike trains with weights $\hat{p}(x) = \sum_{i=1}^{N} \omega^{i} \delta(x-x_{i})$
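To see what the weighted spike train actually buys us, here is a small Python sketch mirroring the R example above (the function name, sample size, and seed are my own illustrative choices): the weighted particles are used to estimate an expectation, $E[f(x)] \approx \sum_i \omega^{i} f(x_{i}) / \sum_i \omega^{i}$. With $f(x) = x$ and the standard normal restricted to $[-2, 2]$, the true value is $0$ by symmetry.

```python
import math
import random

# Weighted-particle estimate of E[x] for N(0,1) restricted to [-2, 2].
# Locations x_i are uniform spikes; weights w_i are the unnormalised
# Gaussian density at those locations, as in the R code above.
def particle_mean(n=10_000, seed=42):
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]   # spike locations
    ws = [math.exp(-x * x / 2.0) for x in xs]         # unnormalised N(0,1) weights
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

print(particle_mean())  # close to 0.0, the symmetric true mean
```

Note that the weights only need to be known up to a constant, since the normalisation cancels in the ratio.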

(c) Copyright 2008-2015 Mehmet Suzen (suzen at acm dot org)

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License.