Today I am going to introduce the moustache target distribution (moustarget distribution for brevity). Load some packages first.
library(wesanderson) # on CRAN
library(RShapeTarget) # available on https://github.com/pierrejacob/RShapeTarget/
library(PAWL) # on CRAN
Let’s invoke the moustarget distribution.
shape <- create_target_from_shape(
  file_name = system.file(package = "RShapeTarget", "extdata/moustache.svg"),
  lambda = 5)
rinit <- function(size) matrix(rnorm(2 * size), ncol = 2)
moustarget <- target(name = "moustache", dimension = 2,
                     rinit = rinit, logdensity = shape$logd,
                     parameters = shape$algo_parameters)
This defines a target distribution represented by an SVG file using RShapeTarget. The target probability density function is defined on $\mathbb{R}^2$ and is proportional to $1$ on the segments described in the SVG file, and decreases exponentially fast to $0$ away from the segments. The density function of the moustarget is plotted below, a picture being worth a thousand words.
Since Pierre and I explored some statistics of an abstract painting (we even have an article in the last issue of Variances!), I have become more sensitive to art linked to randomness. Here are some pointers to related websites I have dug up.
Creativity is the ability to introduce order into the randomness of nature.
There you will find pages contributed by users of the service on various art forms, such as pages that generate Samuel Beckett-like prose, or Jazz Scales. In the visual arts, you can find for example the Bryce girl 1, a fractal landscape of Bryce Canyon by Fuller Thompson (with an extra sexy girl by the way); and nice pastel Richter-like pictures by Dave Nelson (to be compared with an excerpt of Richter’s 1024 colors):
Day to day data gathers artists who collect, list, database and absurdly analyse the data of everyday life. There you can find links to artists like Abigail Reynolds and her Mount Fear of crimes in London, and many others.
R users produced great outputs too. Interestingly, the two following graphs feel like 3D, although only made up of lines and curves. Paul Butler’s visualization of Facebook connections (with a bit of post processing):
At first sight, one could think this picture is a scale model of some narrow mountains, like Bryce Canyon… Actually it represents crimes in East London: it is a cardboard artwork by the London artist Abigail Reynolds, called Mount Fear. Here is what can be read on the artist’s webpage:
The terrain of Mount Fear is generated by data sets relating to the frequency and position of urban crimes. Precise statistics are provided by the police. Each individual incident adds to the height of the model, forming a mountainous terrain.
All Mount Fear models are built on the same principles. The imaginative fantasy space seemingly proposed by the sculpture is subverted by the hard facts and logic of the criteria that shape it. The object does not describe an ideal other-worldly space separated from lived reality, but conversely describes in relentless detail the actuality of life on the city streets.
No mention is made of the statistical method used (kernel or Dirichlet process density estimation?). Some crime data can be found on UNdata for example, or here for an interactive map. It reminds me of a great piece of work by David Kahle about crime in Houston, combining ggplot2 and Google Maps. He won a ggplot2 case study competition for this. His code is available here. I particularly like the contour plot, with cool rainbow colors, where both the crime level and the map background are clearly visible.
For a quick recap, Pierre and I supervised a team project at Ensae last year, on a statistical critique of the abstract painting 1024 Colours by painter Gerhard Richter. The four students, Clémence Bonniot, Anne Degrave, Guillaume Roussellet and Astrid Tricaud, did an outstanding job. Here is a selection of graphs and results they produced.
1. As a preliminary descriptive study, the following scatter plots complement the triangle plot. The R function scatterplot3d, from the package of the same name, displays the pixels with their coordinates in the RGB cube. It shows that, as a joint law, the triplets are somewhat concentrated along the black-white diagonal of the cube.
The same occurs when the points are projected on the sides of the cube. Here is a comparison with uniform simulations.
2. It is interesting to see what happens in other color representations. HSL and HSV are two cylindrical models, succinctly described by this Wikimedia picture:
The points parameterized in these models were projected on the sides as well; here, the sides of the cylinder are to be seen as the circular top (or bottom), the lateral side, and the section of the cylinder by a half-plane from its axis. It shows that some colors in the hue coordinate (rainbow-like colors) are missing, for instance green or purple.
For the HSL model,
and the HSV model.
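For readers who want to try the HSV projection themselves, here is a minimal sketch using `rgb2hsv()` from base R (package grDevices). The colours are simulated uniformly as a stand-in, since the painting's data is not reproduced here; the object names are mine, not the students'.

```r
# Illustrative only: uniformly simulated colours stand in for the painting's data.
set.seed(1)
n <- 1024
rgb_values <- matrix(sample(0:255, 3 * n, replace = TRUE), nrow = 3,
                     dimnames = list(c("r", "g", "b"), NULL))

# rgb2hsv() accepts a 3-row RGB matrix and returns a 3 x n matrix
# with rows "h", "s", "v", each scaled to [0, 1].
hsv_values <- rgb2hsv(rgb_values, maxColorValue = 255)

# Count the hues in 12 equal bins: empty or sparse bins would point to
# missing colours along the rainbow.
hue_breaks <- seq(0, 1, length.out = 13)
hue_counts <- hist(hsv_values["h", ], breaks = hue_breaks, plot = FALSE)$counts
print(hue_counts)
```

With the real R, G and B tables in place of the simulated matrix, sparse bins in `hue_counts` would correspond to the missing hues mentioned above.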
The histograms complete this first analysis. For HSL,
3. The students built a few ad-hoc tests for uniformity, either following our perspective or on their own. They used a Kolmogorov-Smirnov test, a $\chi^2$ test, and some entropy-based tests.
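To give an idea of what such a test looks like, here is a small sketch of a Kolmogorov-Smirnov test of uniformity on a single colour channel, using `ks.test()` from base R. This is not the students' code: both channels below are simulated, and the "dark" one is a hypothetical channel built to be biased towards low values.

```r
set.seed(42)
n <- 1024

# Stand-in for a channel drawn uniformly from the 256 levels.
uniform_levels <- sample(0:255, n, replace = TRUE)
# Hypothetical channel biased towards dark (low) values.
dark_levels <- floor(256 * rbeta(n, 1, 3))

# ks.test() assumes a continuous distribution. Adding uniform noise on [0, 1)
# spreads each integer level over its unit interval and removes ties.
ks_uniform <- ks.test(uniform_levels + runif(n), "punif", min = 0, max = 256)
ks_dark <- ks.test(dark_levels + runif(n), "punif", min = 0, max = 256)

print(ks_uniform$p.value)  # no reason to expect a small value here
print(ks_dark$p.value)     # expected to be tiny: uniformity is rejected
```

The test is applied channel by channel, so it says nothing about the joint law of the (r,g,b) triplets.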
4. We eventually turned to testing for spatial autocorrelation. In other words, is the color of one cell related to the color of its neighbors (in which case you can predict the neighbors’ colors given one cell), or is it “non informative”? A graphical way to check this is to plot the average level of a color coordinate among the neighbors of a pixel against the pixel’s own coordinate, and then to fit a simple regression on this cloud: if the slope of the regression line is non-zero, then there is some correlation, with the sign of the slope. We tried various combinations of coordinates, and different radii for the definition of the neighborhood, and found no significant correlation. A (so-called Moran) test quantified this assessment. Here is for example the plot of the average level of red of the eight closest neighbors of each pixel against its own level of red.
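The graphical check described above can be sketched as follows. This is my own minimal version, not the students' code: the red channel is simulated independently at random (so no correlation is expected), border cells average over the neighbors they actually have, and the regression is a plain `lm()` fit rather than a proper Moran test.

```r
set.seed(123)
n <- 32
# Simulated stand-in for the 32 x 32 table of red levels.
red <- matrix(sample(0:255, n * n, replace = TRUE), n, n)

# Pad with NA so that border cells only average over existing neighbors.
padded <- matrix(NA_real_, n + 2, n + 2)
padded[2:(n + 1), 2:(n + 1)] <- red

neighbour_sum <- matrix(0, n, n)
neighbour_count <- matrix(0, n, n)
for (di in -1:1) {
  for (dj in -1:1) {
    if (di == 0 && dj == 0) next  # skip the cell itself
    shifted <- padded[(2:(n + 1)) + di, (2:(n + 1)) + dj]
    present <- !is.na(shifted)
    neighbour_sum <- neighbour_sum + ifelse(present, shifted, 0)
    neighbour_count <- neighbour_count + present
  }
}
neighbour_mean <- neighbour_sum / neighbour_count

# Regress the neighbors' average on the cell's own level.
fit <- lm(as.vector(neighbour_mean) ~ as.vector(red))
slope <- unname(coef(fit)[2])
print(slope)  # close to zero for independent colours
# plot(as.vector(red), as.vector(neighbour_mean)); abline(fit)
```

Widening the loops to `-2:2` would give the 24 neighbors of a larger radius, which is one way to vary the definition of the neighborhood.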
Second visit to a museum since the beginning of the blog, second post on art for me. Random and bilingual neon light (Néon bilingue et aléatoire, 1971) by the French artist François Morellet is exhibited at the Centre Pompidou in Paris. Random it says?…
Basically it is a kind of alarm clock, with digital figures. But not really the kind of alarm clock you would like at home, because its digits are random. The 21 tubes of neon light switch on and off randomly (according to the caption: with equiprobability). This makes a bit more than two million different combinations. Out of them, the caption in the museum says that 32 are words (English or French), like SEE, or APE, etc. The odds against drawing a word are thus more than 65,000 to one. When one of these words is picked out, it keeps flashing for five seconds, to make sure you won’t miss it.
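The arithmetic is quick to check in R. The number of draws seen during a visit is not given by the caption, so the ten draws used at the end are purely an assumption for illustration.

```r
n_tubes <- 21   # each tube is on or off with equal probability
n_words <- 32   # number of combinations that form a word, per the caption

n_combinations <- 2^n_tubes                 # a bit more than two million
odds_against <- n_combinations / n_words    # combinations per word
p_word <- n_words / n_combinations          # chance that one draw is a word

# Probability of seeing at least one word in k independent draws.
p_at_least_one <- function(k) 1 - (1 - p_word)^k

print(n_combinations)
print(odds_against)
print(p_at_least_one(10))  # assumed: ten draws while standing in front of the piece
```

Under independent, equiprobable draws, seeing a word during a short visit would be a rare event unless the draws are much faster than they look.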
I looked at the piece for only a few seconds. And, of course, I happened to see such a word, the word BUS. It makes me think the draws might be biased, for example to make the piece more attractive. Or the draws are very quick, but it does not seem so. Do any of you know those neon lights?
Well I go back to the Sounds of Silence.
Thanks to Pierre, we now have a new playground for spatial stats, see this post. Before that, let’s see if we can see basic stuff without spatial information.
The data consist of three 32*32 tables, R, G and B, of numbers between 0 and 255. Certainly, the tables should be considered together as a 32*32 table of (r,g,b) vectors. Still, the first basic thing to do is to plot three separate histograms for R, G and B:
compared to uniformly simulated data
We see that the painter has a bias for darker colors, and rather misses light green and light blue ones.
Then, what can we do for representing (r,g,b) vectors? I guess that a good visualization is the color triangle
A few words to explain where it comes from. Say the (r,g,b) data is normalized to the unit cube. Then the corners of the color triangle correspond to (plain) red, green and blue, from bottom left, to bottom right, to top. It is a section of the cube, with black (0,0,0) and white (1,1,1) on opposite sides, each equidistant from the three corners. This triangle is said to be a simplex: the coordinates of any of its points sum to 1. Now the data in the triangle is obtained as the (r,g,b) points, divided by (r+g+b). It took me a while to compute the coordinates (x,y) of those points in a basis of the triangle (I did that stuff more easily back in high school!). It should give something like this:
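For the record, here is one way to write the projection just described. It is a sketch under my own choice of basis, not necessarily the one used for the plot: red sits at (0,0), green at (1,0) and blue at (1/2, √3/2), and the function name `to_triangle` is made up for the occasion.

```r
# Map (r, g, b) vectors to 2D coordinates in the colour triangle.
to_triangle <- function(r, g, b) {
  total <- r + g + b
  total[total == 0] <- NA      # black (0,0,0) has no projection on the simplex
  g_share <- g / total         # barycentric weight of the green corner (1, 0)
  b_share <- b / total         # barycentric weight of the blue corner (1/2, sqrt(3)/2)
  # The red corner is the origin, so its weight does not appear explicitly.
  cbind(x = g_share + b_share / 2, y = b_share * sqrt(3) / 2)
}

# The three pure colours land on the corners, white on the centre.
corners <- to_triangle(r = c(255, 0, 0, 255), g = c(0, 255, 0, 255), b = c(0, 0, 255, 255))
print(corners)
# plot(to_triangle(as.vector(R), as.vector(G), as.vector(B)), asp = 1)
```

The commented-out last line assumes the three 32*32 tables R, G and B are loaded in the workspace.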
What do we see? The colors do not look uniformly distributed, because 1) points are much more concentrated in the center, and 2) the painter favoured red colors in comparison with green ones (very few in the bottom right corner) and blue ones (to a lesser extent). An argument against point 1) could be that projecting (r,g,b) points on the simplex naturally implies a higher density in the center. That is right, but it would not be that dense, as we see with uniformly simulated data:
So colors are not uniform in the RGB model. There should be a cognitive interpretation out there, no? It is not obvious that human eyes comprehend colors on the same scale as the RGB model does. If not, there is no reason for human sight to comprehend uniformity in the same way as a computer. As Pierre pointed out, what we find in the RGB model might be different in the HSV model. We’ll see this model later.
Next step, spatial autocorrelation tests?
In this previous post, Julyan presented the paintings of Gerhard Richter, and asked whether the colours were really “randomly chosen”, as claimed by the painter. To answer the question from a statistical point of view (i.e. whether the colours are uniformly distributed in the (r,g,b) space or in the (x, y, r, g, b) space for instance, where x, y is the position and r, g, b the 3 colour components), we need to extract the data. Let’s take for example the following 1024 colours painting.
The data corresponding to this painting would be a 32*32 table, and in each cell of the table there would be a colour, represented for instance by 3 numbers, like in the RGB colour model. Tonight I’ve made a Python script that extracts this data, with Julien’s help. I took the marginal mean colour along both axes and converted it into grey scale. This gives two lines with white segments and grey segments. From that it is easy to find the middle of the grey segments, which gives the squares’ centres. Once the square centres were found, I simply took the mean colour of a smaller square around each centre.
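To illustrate the idea (this is not the Python script itself), here is a small R sketch on a synthetic grey-scale image: 4*4 grey squares separated by white gutters. The sizes, the whiteness threshold and the helper names are all assumptions chosen for the example.

```r
set.seed(7)
n_squares <- 4; side <- 10; gutter <- 3
size <- n_squares * side + (n_squares + 1) * gutter
img <- matrix(1, size, size)  # 1 = white background
starts <- gutter + (0:(n_squares - 1)) * (side + gutter) + 1
for (i in starts) {
  for (j in starts) {
    # Each square gets one random grey level.
    img[i:(i + side - 1), j:(j + side - 1)] <- runif(1, 0.2, 0.8)
  }
}

# Midpoints of the non-white runs in a marginal mean profile.
segment_centres <- function(profile, white_threshold = 0.95) {
  runs <- rle(profile < white_threshold)
  ends <- cumsum(runs$lengths)
  begins <- ends - runs$lengths + 1
  ((begins + ends) / 2)[runs$values]
}
row_centres <- segment_centres(rowMeans(img))
col_centres <- segment_centres(colMeans(img))

# Mean colour of a smaller square around each centre.
half_width <- 2
mean_colour <- function(ci, cj) {
  rows <- (ceiling(ci) - half_width):(floor(ci) + half_width)
  cols <- (ceiling(cj) - half_width):(floor(cj) + half_width)
  mean(img[rows, cols])
}
extracted <- outer(row_centres, col_centres, Vectorize(mean_colour))
print(dim(extracted))
```

On a real photograph the gutters are not perfectly white and the squares not perfectly aligned, so the threshold would need tuning; averaging over a smaller square is what keeps the edges out of the extracted colour.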
As an output the script creates a BMP file with one pixel per colour (so it’s a tiny image, obviously 32*32 pixels), and an R file with 3 matrices called “R”, “G” and “B”, available here. This format is usually convenient since it’s plain text, but if you want another one just ask me. If we zoom in on the output BMP file we get:
The script is available here if you want to try it or modify it. I fear that there might be a slight mistake in the script because the colours don’t seem to be exactly the same in the output as in the input, but hopefully it’s close enough. The script needs an image to work on, for instance you can try on the pictures from this gallery. I tested it on two other pictures:
So now we have the data for three pictures (10, 192 and 1024 colours), and we can start to do some real stats. Are we going to find the same results in the RGB model as in the HSV model for instance? If not, which colour model should we use?
To be continued!
This painting by Gerhard Richter is called 256 colors. The painter is fully committed to this kind of work, as you can see here. When visiting the San Francisco Museum of Modern Art (SFMOMA) (I’m getting literate…), the guide asked the following question:
Do you think the colors are positioned randomly or not?
Not a trivial question, is it? And you, would you say it is random? This work dates back to 1974, when computer screens mainly displayed green letters on a black background. So it seems the artist did not benefit from computer assistance.
There are many ways to translate this plain English statement into statistical terms. For example, are the colors, with no ordering, uniformly distributed? (OK, this doesn’t mean (true) randomness at all, but this is a question…) It would be nice to have the 256 colors in RGB. In this color model, (0,0,0) is black, and (255,255,255) is white. I think that there are rather more dark colors than light ones, i.e. more data points near the (0,0,0) vertex than near the opposite one, in the RGB cube. So a test would probably reject uniformity.
A more subtle way to interpret uniformity in the painting would be to take into account the position of the colors… Any idea how to check that? I have no clue.
Here is a larger one, 1024 colors…