## From SVG to probability distributions [with R package]

Hey,

To illustrate complex probability density functions on continuous spaces, researchers keep using the same examples, for instance mixtures of Gaussian distributions or a banana-shaped distribution defined on \(\mathbb{R}^2\) with density function:

If we draw a sample from this distribution using MCMC, we obtain a scatterplot like this one:
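Plots like this are easy to reproduce. Below is a minimal random-walk Metropolis sketch in R. It assumes the commonly used "twisted Gaussian" form of the banana, \(\log \pi(x) = -x_1^2/200 - \tfrac{1}{2}(x_2 + b x_1^2 - 100 b)^2 + \text{const}\) with \(b = 0.1\), which may not be exactly the density behind the figure; the step size and chain length are arbitrary, untuned choices.

```r
# Log density (up to an additive constant) of a common banana-shaped target on R^2:
# a Gaussian with variances (100, 1), "twisted" by x2 -> x2 + b * x1^2 - 100 * b.
# Assumption: b = 0.1; the parameterization used for the figure may differ.
banana_logd <- function(x, b = 0.1) {
  -x[1]^2 / 200 - 0.5 * (x[2] + b * x[1]^2 - 100 * b)^2
}

# Random-walk Metropolis with isotropic Gaussian proposals of standard deviation `step`
rwm <- function(logd, x0, n_iter = 20000, step = 1) {
  chain <- matrix(NA_real_, nrow = n_iter, ncol = length(x0))
  x <- x0
  lx <- logd(x)
  for (i in seq_len(n_iter)) {
    y <- x + rnorm(length(x), sd = step)
    ly <- logd(y)
    # Accept with probability min(1, pi(y) / pi(x)), computed on the log scale
    if (log(runif(1)) < ly - lx) {
      x <- y
      lx <- ly
    }
    chain[i, ] <- x
  }
  chain
}

set.seed(1)
chain <- rwm(banana_logd, x0 = c(0, 0))
plot(chain, pch = 20, col = "gold", xlab = "x1", ylab = "x2")
```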

Clearly it doesn’t really look like a banana, even if you colour the dots yellow like here. If anything, it looks more like a boomerang. I was worried about this for a while, until I came up with a more realistic banana-shaped distribution:

See how well defined the shape is compared to the first figure? There’s even the little tail, which proves so convenient when we want to peel the fruit. More generally, we might want to create target density functions based on arbitrary shapes. For this you can now try RShapeTarget, which you can install directly from R using devtools:

```r
library(devtools)
install_github("pierrejacob/RShapeTarget")  # older devtools: install_github(repo = "RShapeTarget", username = "pierrejacob")
```

The package parses SVG files representing shapes and creates target densities from them. More precisely, an SVG file contains “paths”, which are sequences of points (for instance, the banana above is a single closed path). The associated log density at any point \(x \in \mathbb{R}^2\) is defined, up to an additive constant, by \(\log \pi(x) = -\lambda\, d(x, p_x)\), where \(p_x\) is the path of the shape closest to \(x\) and \(d(x, p_x)\) is the distance between the point and that path. The parameter \(\lambda > 0\) specifies the rate at which the density decays as the point moves away from the shape. With this you can define the maple leaf distribution, as a tribute to JSM 2013:
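To make the idea concrete, here is a from-scratch R sketch of a log density equal to \(-\lambda\) times the distance from the point to the closest path. It is not RShapeTarget's internal code: paths are assumed to be polylines, represented by hypothetical k x 2 vertex matrices (real SVG paths may also contain curves, which would need to be discretized first), and a shape is a list of such paths.

```r
# Hypothetical representation: each path is a k x 2 matrix of vertices joined by
# straight segments (a closed path repeats its first vertex at the end), and a
# shape is a list of paths. This is an illustration, not the package's code.

# Euclidean distance from point p to the segment [a, b]
dist_to_segment <- function(p, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  # Position of the orthogonal projection along the segment, clamped to [0, 1]
  s <- if (len2 > 0) max(0, min(1, sum((p - a) * ab) / len2)) else 0
  sqrt(sum((p - (a + s * ab))^2))
}

# Distance from p to a path: the minimum over its consecutive segments
dist_to_path <- function(p, path) {
  if (nrow(path) == 1) return(sqrt(sum((p - path[1, ])^2)))
  min(vapply(seq_len(nrow(path) - 1),
             function(i) dist_to_segment(p, path[i, ], path[i + 1, ]),
             numeric(1)))
}

# Log density at a single point p: -lambda times the distance to the closest path
shape_logd <- function(p, shape, lambda = 1) {
  -lambda * min(vapply(shape, function(path) dist_to_path(p, path), numeric(1)))
}

# Usage: a unit square as a single closed path, evaluated row by row
square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
pts <- rbind(c(0.5, 0), c(0.5, 0.5), c(3, 0))
logd <- apply(pts, 1, shape_logd, shape = list(square), lambda = 2)
logd  # 0 on the path, -1 at distance 0.5, -4 at distance 2
```

Taking the minimum over paths means only the closest path matters, so adding far-away paths to a shape does not change the density near an existing one.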

In the package you can get a distribution from an SVG file using the following code:

```r
library(RShapeTarget)
# path to your SVG file (placeholder, replace with your own)
my_svg_file_name <- "my_shape.svg"
# create target from file
my_shape_target <- create_target_from_shape(my_svg_file_name, lambda = 1)
# test the log density function on 25 randomly generated points
my_shape_target$logd(matrix(rnorm(50), ncol = 2), my_shape_target$algo_parameters)
```

Since characters are just a bunch of paths, you can also define distributions based on words, for instance:

which is done as follows (warning: for now only the letters a-z and A-Z are allowed; no numbers, no spaces, no punctuation):

```r
library(RShapeTarget)
word_target <- create_target_from_word("Hodor")
```
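Since only a-z and A-Z are supported for now, a quick check before calling create_target_from_word() can save some head-scratching. The helper below is hypothetical (it is not part of RShapeTarget) and simply encodes the rule stated above; `perl = TRUE` makes the character ranges plain ASCII ranges regardless of locale.

```r
# Hypothetical helper: TRUE if `word` contains only ASCII letters a-z / A-Z
is_valid_target_word <- function(word) {
  grepl("^[A-Za-z]+$", word, perl = TRUE)
}

is_valid_target_word(c("Hodor", "Hodor!", "JSM2013", "two words"))
# TRUE FALSE FALSE FALSE
```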

For the words, I defined the target density function as before, except that it is constant inside the letters: if a point is outside a letter, its density is computed from the distance to the nearest path; if it is inside a letter, the density is constant, so that the letters are “filled” with some constant density. I thought it’d look better.
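A "filled" letter needs a test for whether a point lies inside the outline. The R sketch below uses the standard even-odd ray-casting test on a single closed outline (hypothetical k x 2 vertex matrix, first vertex not repeated): the log density is 0 inside and \(-\lambda\) times the distance to the outline outside. This is an assumption about how such a density could be built, not the package's code; letters with holes, like "o", would need the even-odd rule applied across several paths.

```r
# Ray casting: count crossings of a horizontal ray from p to the right with the
# polygon edges; an odd count means p is inside.
point_in_polygon <- function(p, poly) {
  k <- nrow(poly)
  inside <- FALSE
  j <- k  # edge (j, i) starts with the closing edge from the last to the first vertex
  for (i in seq_len(k)) {
    xi <- poly[i, 1]; yi <- poly[i, 2]
    xj <- poly[j, 1]; yj <- poly[j, 2]
    if (((yi > p[2]) != (yj > p[2])) &&
        (p[1] < (xj - xi) * (p[2] - yi) / (yj - yi) + xi)) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Distance from p to the closed outline (the last vertex connects back to the first)
dist_to_boundary <- function(p, poly) {
  k <- nrow(poly)
  min(vapply(seq_len(k), function(i) {
    a <- poly[i, ]
    ab <- poly[i %% k + 1, ] - a
    len2 <- sum(ab^2)
    s <- if (len2 > 0) max(0, min(1, sum((p - a) * ab) / len2)) else 0
    sqrt(sum((p - a - s * ab)^2))
  }, numeric(1)))
}

# Constant (0 on the log scale) inside, decaying with distance outside
filled_logd <- function(p, poly, lambda = 1) {
  if (point_in_polygon(p, poly)) 0 else -lambda * dist_to_boundary(p, poly)
}

# Usage with a unit square standing in for a letter outline
square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1))
filled_logd(c(0.5, 0.5), square)            # inside: 0
filled_logd(c(2, 0.5), square, lambda = 3)  # outside, distance 1: -3
```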

Now I’m no longer worried about the banana-shaped distribution, but rather about the fact that the only word I could think of was “Hodor” (with whom you can chat over there).

RINTU said, on 26 August 2013 at 06:20:

nice!

xi'an said, on 27 September 2013 at 22:37:

Hodor! Hodor…

Moustache target distribution and Wes Anderson | Statisfaction said, on 31 March 2014 at 16:51:

[…] defines a target distribution represented by a SVG file using RShapeTarget. The target probability density function is defined on and is proportional to on the segments […]


Moha said, on 28 May 2015 at 09:02:

Hi,

When I use this command: “word_target <- create_target_from_word("Hodor")”

I get this error: "Error: XML content does not seem to be XML: 'C:\Users\…\Documents\R\win-library\3.2\RShapeTarget\extdataH_uppercase_path.svg'"

Would you please help me figure it out?

Thank you

Pierre Jacob said, on 28 May 2015 at 09:46:

Hello,

I suppose it does not work because you’re using Windows, and I haven’t tried the package on it. I must have used non-Windows file path notation somewhere: for example, there should be a backslash between extdata and H_uppercase_path.svg. I suspect the problem lies in “extract_paths_from_letter”.

Pierre
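The kind of bug suspected here, a directory and a file name glued together without a separator, is easy to reproduce and to avoid in R. The sketch below is a hypothetical reconstruction: the directory is made up, and the actual code in extract_paths_from_letter may differ. `paste0()` reproduces the `extdataH_uppercase_path.svg` pattern from the error message, while `file.path()` inserts a separator; R accepts "/" as a path separator on Windows as well.

```r
# Hypothetical directory, for illustration only
extdata_dir <- "C:/path/to/RShapeTarget/extdata"
letter_file <- paste0("H", "_uppercase_path.svg")

# Buggy: plain concatenation drops the separator
bad_path <- paste0(extdata_dir, letter_file)

# Portable: file.path() joins components with "/"
good_path <- file.path(extdata_dir, letter_file)

# Inside an installed package, system.file() is the usual way to locate such files:
# system.file("extdata", letter_file, package = "RShapeTarget")
```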

Kamiltonian Monte Carlo | herr strathmann said, on 21 July 2015 at 13:31:

[…] RWM to HMC. Using the well known and (ab)used Banana density as a target, we feed a non-adaptive version of KMC and friends with an increasing number of […]