Abstracts of papers

To download a pdf or supplementary file, right-click the link and select "Save Target As ..." or "Save Link As ..."

May, K.A. & Zhaoping, L. (2016). Efficient coding theory predicts a tilt aftereffect from viewing untilted patterns. Current Biology, 26, 1571–1576.

The brain is bombarded with a continuous stream of sensory information, but biological limitations on the data-transmission rate require this information to be encoded very efficiently [1]. Li and Atick [2] proposed that the two eyes’ signals are coded efficiently in the brain using mutually decorrelated binocular summation and differencing channels; when a channel is strongly stimulated by the visual input, such that sensory noise is negligible, the channel should undergo temporary desensitization (known as adaptation). To date, the evidence for this theory has been limited [3, 4], and the binocular differencing channel is missing from many models of binocular integration [5–10]. Li and Atick’s theory makes the remarkable prediction that perceived direction of tilt (clockwise or counterclockwise) of a test pattern can be controlled by pre-exposing observers to visual adaptation patterns that are untilted or even have no orientation signal. Here, we confirm this prediction. Each test pattern consisted of different images presented to the two eyes such that the binocular summation and difference signals were tilted in opposite directions, to give ambiguous information about tilt; by selectively desensitizing one or other of the binocular channels using untilted or non-oriented binocular adaptation patterns, we controlled the perceived tilt of the test pattern. Our results provide compelling evidence that the brain contains binocular summation and differencing channels that adapt to the prevailing binocular statistics.

Solomon, J.A., May, K.A. & Tyler, C.W. (2016). Inefficiency of orientation averaging: Evidence for hybrid serial/parallel temporal integration. Journal of Vision, 16(1):13, 1–7.

Intuition suggests that increased viewing time should allow for the accumulation of more visual information, but scant support for this idea has been found in studies of voluntary averaging, where observers are asked to make decisions based on perceived average size. In this paper we examine the dynamics of information accrual in an orientation-averaging task. With orientation (unlike intensive dimensions such as size), it is relatively safe to use an item's physical value as an approximation for its average perceived value. We displayed arrays containing eight iso-eccentric Gabor patterns, and asked six trained psychophysical observers to compare their average orientation with that of probe stimuli that were visible before, during, or only after the presentation of the Gabor array. From the relationship between orientation variance and human performance, we obtained estimates of effective set size, i.e., the number of items that an ideal observer would need to assess in order to estimate average orientation as well as our human observers did. We found that display duration had only a modest influence on effective set size. It rose from an average of ~2 for 0.1-s displays to an average of ~3 for 3.3-s displays. These results suggest that the visual computation is neither purely serial nor purely parallel. Computations of this nature can be made with a hybrid process that takes a series of subsamples of a few elements at a time.

May, K.A. & Solomon, J.A. (2015). Connecting psychophysical performance to neuronal response properties II: Contrast decoding and detection. Journal of Vision, 15(6):9, 1–21.

The purpose of this article is to provide mathematical insights into the results of some Monte Carlo simulations published by Tolhurst and colleagues (Clatworthy, Chirimuuta, Lauritzen, & Tolhurst, 2003; Chirimuuta & Tolhurst, 2005a). In these simulations, the contrast of a visual stimulus was encoded by a model spiking neuron or a set of such neurons. The mean spike count of each neuron was given by a sigmoidal function of contrast, the Naka-Rushton function. The actual number of spikes generated on each trial was determined by a doubly stochastic Poisson process. The spike counts were decoded using a Bayesian decoder to give an estimate of the stimulus contrast. Tolhurst and colleagues used the estimated contrast values to assess the model's performance in a number of ways, and they uncovered several relationships between properties of the neurons and characteristics of performance. Although this work made a substantial contribution to our understanding of the links between physiology and perceptual performance, the Monte Carlo simulations provided little insight into why the obtained patterns of results arose or how general they are. We overcame these problems by deriving equations that predict the model's performance. We derived an approximation of the model's decoding precision using Fisher information. We also analyzed the model's contrast detection performance and discovered a previously unknown theoretical connection between the Naka-Rushton contrast-response function and the Weibull psychometric function. Our equations give many insights into the theoretical relationships between physiology and perceptual performance reported by Tolhurst and colleagues, explaining how they arise and how they generalize across the neuronal parameter space.

May, K.A. & Solomon, J.A. (2015). Connecting psychophysical performance to neuronal response properties I: Discrimination of suprathreshold stimuli. Journal of Vision, 15(6):8, 1–26.

One of the major goals of sensory neuroscience is to understand how an organism's perceptual abilities relate to the underlying physiology. To this end, we derived equations to estimate the best possible psychophysical discrimination performance, given the properties of the neurons carrying the sensory code. We set up a generic sensory coding model with neurons characterized by their tuning function to the stimulus and the random process that generates spikes. The tuning function was a Gaussian function or a sigmoid (Naka-Rushton) function. Spikes were generated using Poisson spiking processes whose rates were modulated by a multiplicative, gamma-distributed gain signal that was shared between neurons. This doubly stochastic process generates realistic levels of neuronal variability and a realistic correlation structure within the population. Using Fisher information as a close approximation of the model's decoding precision, we derived equations to predict the model's discrimination performance from the neuronal parameters. We then verified the accuracy of our equations using Monte Carlo simulations. Our work has two major benefits. Firstly, we can quickly calculate the performance of physiologically plausible population-coding models by evaluating simple equations, which makes it easy to fit the model to psychophysical data. Secondly, the equations revealed some remarkably straightforward relationships between psychophysical discrimination performance and the parameters of the neuronal population, giving deep insights into the relationships between an organism's perceptual abilities and the properties of the neurons on which those abilities depend.
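
The spiking model described in this abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' code: the parameter names (`r_max`, `c50`, `q`, `gain_shape`) and all numerical values are hypothetical, and the Poisson sampler is a textbook method chosen to keep the sketch dependency-free. Each neuron's mean spike count follows a Naka-Rushton function of contrast, and a single gamma-distributed gain with mean 1 is shared by all neurons on a trial, so spike counts are more variable than a plain Poisson process would give.

```python
import math
import random


def naka_rushton(contrast, r_max, c50, q):
    """Mean spike count as a sigmoidal (Naka-Rushton) function of contrast."""
    if contrast <= 0:
        return 0.0
    return r_max * contrast**q / (contrast**q + c50**q)


def poisson_sample(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_trial(contrast, neurons, gain_shape, rng):
    """Spike counts for one trial; one gamma gain (mean 1) is shared by all neurons."""
    gain = rng.gammavariate(gain_shape, 1.0 / gain_shape)
    return [poisson_sample(gain * naka_rushton(contrast, *params), rng)
            for params in neurons]


# Hypothetical single neuron: (r_max, c50, q). All values are illustrative only.
rng = random.Random(0)
neurons = [(30.0, 0.2, 2.0)]
counts = [simulate_trial(0.2, neurons, 5.0, rng)[0] for _ in range(20000)]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print("mean count:", mean, "Fano factor:", variance / mean)
```

With a pure Poisson process the Fano factor (variance divided by mean) would be close to 1; the shared gain pushes it well above 1, which is the extra variability that the doubly stochastic process is meant to capture.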

Hansen, B.C., May, K.A. & Hess, R.F. (2014). One "shape" fits all: The orientation bandwidth of contour integration. Journal of Vision, 14(13):17, 1–21.

The ability of human participants to integrate fragmented stimulus elements into perceived coherent contours (amidst a field of distracter elements) has been intensively studied across a large number of contour element parameters, ranging from luminance contrast and chromaticity to motion and stereo. The evidence suggests that contour integration performance depends on the low-level Fourier properties of the stimuli. Thus, to understand contour integration, it would be advantageous to understand the properties of the low-level filters that the visual system uses to process contour stimuli. We addressed this issue by examining the role of stimulus element orientation bandwidth in contour integration, a previously unexplored area. We carried out three psychophysical experiments, and then simulated all of the experiments using a recently developed two-stage filter-overlap model whereby the contour grouping occurs by virtue of the overlap between the filter responses to different elements. The first stage of the model responds to the elements, while the second stage integrates the responses along the contour. We found that the first stage had to be fairly broadly tuned for orientation to account for our results. The model showed a very good fit to a large data set with relatively few free parameters, suggesting that this class of model may have an important role to play in helping us to better understand the mechanisms of contour integration.

Dumoulin, S.O., Hess, R.F., May, K.A., Harvey, B.M., Rokers, B. & Barendregt, M. (2014). Contour extracting networks in early extrastriate cortex. Journal of Vision, 14(5):18, 1–14.

Neurons in the visual cortex process a local region of visual space, but in order to adequately analyze natural images, neurons need to interact. The notion of an “association field” proposes that neurons interact to extract extended contours. Here, we identify the site and properties of contour integration mechanisms. We used functional magnetic resonance imaging (fMRI) and population receptive field (pRF) analyses. We devised pRF mapping stimuli consisting of contours. We isolated the contribution of contour integration mechanisms to the pRF by manipulating the contour content. This stimulus manipulation led to systematic changes in pRF size. Whereas a bank of Gabor filters quantitatively explains pRF size changes in V1, only V2/V3 pRF sizes match the predictions of the association field. pRF size changes in later visual field maps, hV4, LO-1, and LO-2 do not follow either prediction and are probably driven by distinct classical receptive field properties or other extraclassical integration mechanisms. These pRF changes do not follow conventional fMRI signal strength measures. Therefore, analyses of pRF changes provide a novel computational neuroimaging approach to investigating neural interactions. We interpreted these results as evidence for neural interactions along co-oriented, cocircular receptive fields in the early extrastriate visual cortex (V2/V3), consistent with the notion of a contour association field.

May, K.A. & Solomon, J.A. (2013). Four theorems on the psychometric function. PLoS ONE, 8(10):e74815.

In a 2-alternative forced-choice (2AFC) discrimination task, observers choose which of two stimuli has the higher value. The psychometric function for this task gives the probability of a correct response for a given stimulus difference, Δx. This paper proves four theorems about the psychometric function. Assuming the observer applies a transducer and adds noise, Theorem 1 derives a convenient general expression for the psychometric function. Discrimination data are often fitted with a Weibull function. Theorem 2 proves that the Weibull “slope” parameter, β, can be approximated by βNoise × βTransducer, where βNoise is the β of the Weibull function that fits best to the cumulative noise distribution, and βTransducer depends on the transducer. We derive general expressions for βNoise and βTransducer, from which we derive expressions for specific cases. One case that follows naturally from our general analysis is Pelli’s finding that, when d′ ∝ (Δx)^b, β ≈ βNoise × b. We also consider two limiting cases. Theorem 3 proves that, as sensitivity improves, 2AFC performance will usually approach that for a linear transducer, whatever the actual transducer; we show that this does not apply at signal levels where the transducer gradient is zero, which explains why it does not apply to contrast detection. Theorem 4 proves that, when the exponent of a power-function transducer approaches zero, 2AFC performance approaches that of a logarithmic transducer. We show that the power-function exponents of 0.4–0.5 fitted to suprathreshold contrast discrimination data are close enough to zero for the fitted psychometric function to be practically indistinguishable from that of a log transducer. Finally, Weibull β reflects the shape of the noise distribution, and we used our results to assess the recent claim that internal noise has higher kurtosis than a Gaussian.
Our analysis of β for contrast discrimination suggests that, if internal noise is stimulus-independent, it has lower kurtosis than a Gaussian.
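
For readers unfamiliar with the notation, the relationships stated in this abstract can be written out as follows. The first expression is the form in which the Weibull function is commonly written for 2AFC tasks; the threshold parameter α is not named in the abstract and is introduced here as an assumption. The second and third expressions restate Theorem 2 and the special case attributed to Pelli.

```latex
\[
\Psi(\Delta x) = 1 - \tfrac{1}{2}\exp\!\left[-\left(\frac{\Delta x}{\alpha}\right)^{\beta}\right],
\qquad
\beta \approx \beta_{\mathrm{Noise}} \times \beta_{\mathrm{Transducer}},
\qquad
d' \propto (\Delta x)^{b} \;\Rightarrow\; \beta \approx \beta_{\mathrm{Noise}} \times b .
\]
```

In this form, performance is at chance (0.5) when Δx = 0 and rises towards 1 as Δx grows, with β controlling how steeply it rises.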

McIlhagga, W.H. & May, K.A. (2012). Optimal edge filters explain human blur detection. Journal of Vision, 12(10):9, 1–13.

Edges are important visual features, providing many cues to the three-dimensional structure of the world. One of these cues is edge blur. Sharp edges tend to be caused by object boundaries, while blurred edges indicate shadows, surface curvature, or defocus due to relative depth. Edge blur also drives accommodation and may be implicated in the correct development of the eye's optical power. Here we use classification image techniques to reveal the mechanisms underlying blur detection in human vision. Observers were shown a sharp and a blurred edge in white noise and had to identify the blurred edge. The resultant smoothed classification image derived from these experiments was similar to a derivative of a Gaussian filter. We also fitted a number of edge detection models (MIRAGE, N1, and N3+) and the ideal observer to observer responses, but none performed as well as the classification image. However, observer responses were well fitted by a recently developed optimal edge detector model, coupled with a Bayesian prior on the expected blurs in the stimulus. This model outperformed the classification image when performance was measured by the Akaike Information Criterion. This result strongly suggests that humans use optimal edge detection filters to detect edges and encode their blur.

Huang, P.-C., Maehara, G., May, K.A. & Hess, R.F. (2012). Pattern masking: The importance of remote spatial frequencies and their phase alignment. Journal of Vision, 12(2):14, 1–13.

To assess the effects of spatial frequency and phase alignment of mask components in pattern masking, target threshold vs. mask contrast (TvC) functions for a sine-wave grating (S) target were measured for five types of mask: a sine-wave grating (S), a square-wave grating (Q), a missing fundamental square-wave grating (M), harmonic complexes consisting of phase-scrambled harmonics of a square wave (Qp), and harmonic complexes consisting of phase-scrambled harmonics of a missing fundamental square wave (Mp). Target and masks had the same fundamental frequency (0.46 cpd) and the target was added in phase with the fundamental frequency component of the mask. Under monocular viewing conditions, the strength of masking depends on phase relationships among mask spatial frequencies far removed from that of the target, at least 3 times the target frequency, only when there are common target and mask spatial frequencies. Under dichoptic viewing conditions, S and Q masks produced similar masking to each other and the phase-scrambled masks (Qp and Mp) produced less masking. The results suggest that pattern masking is spatial frequency broadband in nature and sensitive to the phase alignments of spatial components.

May, K.A., Zhaoping, L. & Hibbard, P.B. (2012). Perceived direction of motion determined by adaptation to static binocular images. Current Biology, 22, 28–32.  Download PDF (588 KB)  Supplemental Experimental Procedures (56 KB)

In Li and Atick's [1, 2] theory of efficient stereo coding, the two eyes' signals are transformed into uncorrelated binocular summation and difference signals, and gain control is applied to the summation and differencing channels to optimize their sensitivities. In natural vision, the optimal channel sensitivities vary from moment to moment, depending on the strengths of the summation and difference signals; these channels should therefore be separately adaptable, whereby a channel's sensitivity is reduced following overexposure to adaptation stimuli that selectively stimulate that channel. This predicts a remarkable effect of binocular adaptation on perceived direction of a dichoptic motion stimulus [3]. For this stimulus, the summation and difference signals move in opposite directions, so perceived motion direction (upward or downward) should depend on which of the two binocular channels is most strongly adapted, even if the adaptation stimuli are completely static. We confirmed this prediction: a single static dichoptic adaptation stimulus presented for less than 1 s can control perceived direction of a subsequently presented dichoptic motion stimulus. This is not predicted by any current model of motion perception and suggests that the visual cortex quickly adapts to the prevailing binocular image statistics to maximize information-coding efficiency.
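
The decorrelating transform mentioned at the start of this abstract can be illustrated with a short simulation. This is a sketch under assumptions that are not in the abstract: the two eyes' signals are modelled as a shared Gaussian component plus independent noise of equal variance, the 1/√2 scaling is one conventional normalization, and the function and variable names are hypothetical. The gain control stage is not modelled.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def to_binocular_channels(left, right):
    """Transform left/right eye signals into summation and difference signals."""
    root2 = math.sqrt(2.0)
    summation = [(l + r) / root2 for l, r in zip(left, right)]
    difference = [(l - r) / root2 for l, r in zip(left, right)]
    return summation, difference


# Assumed input statistics: a component common to both eyes plus independent noise.
rng = random.Random(1)
n = 20000
common = [rng.gauss(0.0, 1.0) for _ in range(n)]
left = [c + rng.gauss(0.0, 0.5) for c in common]
right = [c + rng.gauss(0.0, 0.5) for c in common]

summation, difference = to_binocular_channels(left, right)
r_eyes = pearson(left, right)                 # strongly correlated inputs
r_channels = pearson(summation, difference)   # close to zero after the transform
print("eyes:", r_eyes, "channels:", r_channels)
```

When the two eyes' signals have equal variance, the covariance of their sum and difference is zero in expectation, which is why the two channels come out uncorrelated even though the inputs are highly correlated.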

May, K.A. & Zhaoping, L. (2011). Exploring the roles of saturating and supersaturating contrast-response functions in conjunction detection and contrast coding. Journal of Vision, 11(9):11, 1–15.

J. W. Peirce (2007, p. 1) has proposed that saturating contrast-response functions in V1 and V2 may form “a critical part of the selective detection of compound stimuli over their components” and that supersaturating (non-monotonic) functions allow even greater conjunction selectivity. Here, we argue that saturating and supersaturating contrast-response functions cannot be exploited by conjunction detectors in the way that Peirce proposes. First, the advantage of these functions only applies to conjunctions with components of lower contrast than the equivalent non-conjunction stimulus, e.g., plaids (conjunctions) vs. gratings (non-conjunctions); most types of conjunction do not have this property. Second, in many experiments, conjunction and non-conjunction components have identical contrast, sampling the contrast-response function at a single point, so the function's shape is irrelevant. Third, Peirce considered only maximum-contrast stimuli, whereas contrasts in natural scenes are low, corresponding to a contrast-response function's expansive region; we show that, for naturally occurring contrasts, Peirce's plaid detector would generally respond more weakly to plaids than to gratings. We also reassess Peirce's claim that supersaturating contrast-response functions are suboptimal for contrast coding; we argue that supersaturation improves contrast coding, and that the multiplicity of supersaturation levels reflects varying trade-offs between contrast coding and coding of other features.

Zhaoping, L., Geisler, W.S. & May, K.A. (2011). Human wavelength discrimination of monochromatic light explained by optimal wavelength decoding of light of unknown intensity. PLoS ONE, 6(5): e19248.

We show that human ability to discriminate the wavelength of monochromatic light can be understood as maximum likelihood decoding of the cone absorptions, with a signal processing efficiency that is independent of the wavelength. This work is built on the framework of ideal observer analysis of visual discrimination used in many previous works. A distinctive aspect of our work is that we highlight a perceptual confound that observers should confuse a change in input light wavelength with a change in input intensity. Hence a simple ideal observer model which assumes that an observer has a full knowledge of input intensity should over-estimate human ability in discriminating wavelengths of two inputs of unequal intensity. This confound also makes it difficult to consistently measure human ability in wavelength discrimination by asking observers to distinguish two input colors while matching their brightness. We argue that the best experimental method for reliable measurement of discrimination thresholds is the one of Pokorny and Smith, in which observers only need to distinguish two inputs, regardless of whether they differ in hue or brightness. We mathematically formulate wavelength discrimination under this wavelength-intensity confound and show a good agreement between our theoretical prediction and the behavioral data. Our analysis explains why the discrimination threshold varies with the input wavelength, and shows how sensitively the threshold depends on the relative densities of the three types of cones in the retina (and in particular predict discriminations in dichromats). Our mathematical formulation and solution can be applied to general problems of sensory discrimination when there is a perceptual confound from other sensory feature dimensions.

May, K.A. & Zhaoping, L. (2009). Effects of surrounding frame on visual search for vertical or tilted bars. Journal of Vision, 9(13):20, 1–19.

It is easier to find a tilted bar amongst vertical bars than vice-versa, but this asymmetry can be abolished or reversed by surrounding the bars with a tilted frame. The frame effect is important because it challenges bottom-up models of saliency. We conducted two experiments to investigate the causes of this effect. In Experiment 1, we removed different components of a square frame, and concluded that the frame effect was caused by a combination of (1) high-level configural cues that provided a frame of reference, and (2) bottom-up iso-orientation competition from the sides of the frame parallel to the bars. The iso-orientation competition could have arisen from (1) diversion of attention to the parts of the frame parallel to the target, or (2) iso-orientation suppression between nearby units selective for the same orientation. Experiment 2 investigated the nature of the iso-orientation competition process. In this experiment, we used a single line (the "axis") embedded in a circular field of bar elements, rather than a square frame surrounding them. The effect of the axis declined rapidly to zero with increasing target-axis distance, suggesting that the iso-orientation competition was caused entirely by iso-orientation suppression between nearby units tuned to the same orientation.

May, K.A. & Hess, R.F. (2008). Effects of element separation and carrier wavelength on detection of snakes and ladders: Implications for models of contour integration. Journal of Vision, 8(13):4, 1–23.  Download Quicktime Movie (4.5 MB)

In this paper, we examine the mechanisms underlying the perceptual integration of two types of contour: snakes (composed of Gabor elements parallel to the path of the contour) and ladders (with elements perpendicular to the path). We varied the element separation and carrier wavelength. Increasing the element separation impaired detection of snakes but did not affect ladders; at high separations, snakes and ladders were closely matched in difficulty. One subject showed no effect of carrier wavelength, and the other showed a decline in performance as the wavelength increased. We discuss how these results might be accommodated by association field models. We also present a new model in which the linkage results from overlap in the filter responses to adjacent elements. We show that, if 1st-order filters are used, the model's performance on widely spaced snake contours deteriorates greatly as the carrier wavelength of the elements decreases, in contrast to our psychophysical results. To integrate widely spaced contours with short carrier wavelengths, the model requires a 2nd-order process, in which a nonlinearity intervenes between small-scale 1st-stage filters and large-scale 2nd-stage filters. This model detects snakes when the 1st and 2nd stage filters have the same orientation, and detects ladders when they are orthogonal.

Hess, R.F., Baker, D.H., May, K.A. & Wang, J. (2008). On the decline of 1st and 2nd order sensitivity with eccentricity. Journal of Vision, 8(1):19, 1–12.

We studied the relationship between the decline in sensitivity that occurs with eccentricity for stimuli of different spatial scale defined by either luminance (LM) or contrast (CM) modulation. We show that the detectability of CM stimuli declines with eccentricity in a spatial frequency-dependent manner, and that the rate of sensitivity decline for CM stimuli is roughly that expected from their 1st order carriers, except, possibly, at finer scales. Using an equivalent noise paradigm, we investigated the possible reasons for why the foveal sensitivity for detecting LM and CM stimuli differs as well as the reason why the detectability of 1st order stimuli declines with eccentricity. We show the former can be modeled by an increase in internal noise whereas the latter involves both an increase in internal noise and a loss of efficiency. To encompass both the threshold and suprathreshold transfer properties of peripheral vision, we propose a model in terms of the contrast gain of the underlying mechanisms.

May, K.A. & Hess, R.F. (2007). Ladder contours are undetectable in the periphery: A crowding effect? Journal of Vision, 7(13):9, 1–15.

We studied the perceptual integration of contours consisting of Gabor elements positioned along a smooth path, embedded among distractor elements. Contour elements either formed tangents to the path (“snakes”) or were perpendicular to it (“ladders”). Perfectly straight snakes and ladders were easily detected in the fovea but, at an eccentricity of 6°, only the snakes were detectable. The disproportionate impairment of peripheral ladder detection remained when we brought foveal performance away from ceiling by jittering the orientations of the elements. We propose that the failure to detect peripheral ladders is a form of crowding, the phenomenon observed when identification of peripherally located letters is disrupted by flanking letters. D. G. Pelli, M. Palomares, and N. J. Majaj (2004) outlined a model in which simple feature detectors are followed by integration fields, which are involved in tasks, such as letter identification, that require the outputs of several detectors. They proposed that crowding occurs because small integration fields are absent from the periphery, leading to inappropriate feature integration by large peripheral integration fields. We argue that the “association field,” which has been proposed to mediate contour integration (D. J. Field, A. Hayes, & R. F. Hess, 1993), is a type of integration field. Our data are explained by an elaboration of Pelli et al.'s model, in which weak ladder integration competes with strong snake integration. In the fovea, the association fields were small, and the model integrated snakes and ladders with little interference. In the periphery, the association fields were large, and integration of ladders was severely disrupted by interference from spurious snake contours. In contrast, the model easily detected snake contours in the periphery. 
In a further demonstration of the possible link between contour integration and crowding, we ran our contour integration model on groups of three-letter stimuli made from short line segments. Our model showed several key properties of crowding: The critical spacing for crowding to occur was independent of the size of the target letter, scaled with eccentricity, and was greater on the peripheral side of the target.

May, K.A. & Hess, R.F. (2007). Dynamics of snakes and ladders. Journal of Vision, 7(12):13, 1–9.

D. J. Field, A. Hayes, and R. F. Hess (1993) introduced two types of stimulus to study the perceptual integration of contours. Both types of stimulus consist of a smooth path of spatially separate elements, embedded in a field of randomly oriented elements. In one type of stimulus (“snakes”), the elements form tangents to the path of the contour; in the other type (“ladders”), the elements are orthogonal to the path. Little is currently known about the relative integration speeds of these two types of contour. We investigated this issue by temporally modulating the orientations of the contour elements. Our results suggest that snakes and ladders are integrated at similar speeds.

Georgeson, M.A., May, K.A., Freeman, T.C.A. & Hesse, G.S. (2007). From filters to features: Scale–space analysis of edge and blur coding in human vision. Journal of Vision, 7(13):7, 1–21.

To make vision possible, the visual nervous system must represent the most informative features in the light pattern captured by the eye. Here we use Gaussian scale–space theory to derive a multiscale model for edge analysis and we test it in perceptual experiments. At all scales there are two stages of spatial filtering. An odd-symmetric, Gaussian first derivative filter provides the input to a Gaussian second derivative filter. Crucially, the output at each stage is half-wave rectified before feeding forward to the next. This creates nonlinear channels selectively responsive to one edge polarity while suppressing spurious or “phantom” edges. The two stages have properties analogous to simple and complex cells in the visual cortex. Edges are found as peaks in a scale–space response map that is the output of the second stage. The position and scale of the peak response identify the location and blur of the edge. The model predicts remarkably accurately our results on human perception of edge location and blur for a wide range of luminance profiles, including the surprising finding that blurred edges look sharper when their length is made shorter. The model enhances our understanding of early vision by integrating computational, physiological, and psychophysical approaches.

May, K.A. & Georgeson, M.A. (2007). Added luminance ramp alters perceived edge blur and contrast: A critical test for derivative-based models of edge coding. Vision Research, 47(13), 1721–1731.  Download PDF (305 KB)  Supplementary files (4 KB)

In many models of edge analysis in biological vision, the initial stage is a linear 2nd derivative operation. Such models predict that adding a linear luminance ramp to an edge will have no effect on the edge’s appearance, since the ramp has no effect on the 2nd derivative. Our experiments did not support this prediction: adding a negative-going ramp to a positive-going edge (or vice-versa) greatly reduced the perceived blur and contrast of the edge. The effects on a fairly sharp edge were accurately predicted by a nonlinear multi-scale model of edge processing [Georgeson, M. A., May, K. A., Freeman, T. C. A., & Hesse, G. S. (in press). From filters to features: Scale-space analysis of edge and blur coding in human vision. Journal of Vision], in which a half-wave rectifier comes after the 1st derivative filter. But we also found that the ramp affected perceived blur more profoundly when the edge blur was large, and this greater effect was not predicted by the existing model. The model’s fit to these data was much improved when the simple half-wave rectifier was replaced by a threshold-like transducer [May, K. A. & Georgeson, M. A. (2007). Blurred edges look faint, and faint edges look sharp: The effect of a gradient threshold in a multi-scale edge coding model. Vision Research, 47, 1705–1720.]. This modified model correctly predicted that the interaction between ramp gradient and edge scale would be much larger for blur perception than for contrast perception. In our model, the ramp narrows an internal representation of the gradient profile, leading to a reduction in perceived blur. This in turn reduces perceived contrast because estimated blur plays a role in the model’s estimation of contrast. Interestingly, the model predicts that analogous effects should occur when the width of the window containing the edge is made narrower. This has already been confirmed for blur perception; here, we further support the model by showing a similar effect for contrast perception.
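
The argument about the ramp can be checked numerically. The sketch below is only an illustration, not the authors' code: the edge blur, the ramp gradient, and the use of simple finite differences in place of derivative filters are all assumptions made for the example. It shows that a linear ramp leaves the 2nd derivative unchanged, but narrows the gradient profile once a half-wave rectifier follows the 1st derivative.

```python
import numpy as np
from math import erf

def gaussian_edge(x, contrast=1.0, blur=4.0):
    """Positive-going edge with a Gaussian-integral profile (illustrative values)."""
    return contrast * 0.5 * (1.0 + np.array([erf(v / (blur * np.sqrt(2))) for v in x]))

def second_difference(profile):
    """Finite-difference stand-in for a linear 2nd-derivative operator."""
    return profile[2:] - 2.0 * profile[1:-1] + profile[:-2]

x = np.arange(-50, 51, dtype=float)
edge = gaussian_edge(x)
ramp = -0.01 * x                      # negative-going ramp; gradient is hypothetical
edge_plus_ramp = edge + ramp

# A purely linear 2nd-derivative stage cannot see the ramp.
max_diff = float(np.max(np.abs(second_difference(edge) - second_difference(edge_plus_ramp))))

# Half-wave rectifying the 1st derivative makes the ramp matter:
# the ramp pushes the shallow flanks of the gradient profile below zero.
rectified_edge = np.maximum(np.diff(edge), 0.0)
rectified_both = np.maximum(np.diff(edge_plus_ramp), 0.0)
width_without_ramp = int(np.count_nonzero(rectified_edge > 0))
width_with_ramp = int(np.count_nonzero(rectified_both > 0))

print(max_diff, width_without_ramp, width_with_ramp)
```

The count of positive samples is a crude width measure chosen for brevity; the model described in the abstract estimates blur from filter responses across scales rather than from such a count.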

View equipment specs

May, K.A. & Georgeson, M.A. (2007). Blurred edges look faint, and faint edges look sharp: The effect of a gradient threshold in a multi-scale edge coding model. Vision Research, 47(13), 1705–1720.  Download PDF (352 KB)  Supplementary files (3 KB)

A multi-scale model of edge coding based on normalized Gaussian derivative filters successfully predicts perceived scale (blur) for a wide variety of edge profiles [Georgeson, M. A., May, K. A., Freeman, T. C. A., & Hesse, G. S. (in press). From filters to features: Scale-space analysis of edge and blur coding in human vision. Journal of Vision]. Our model spatially differentiates the luminance profile, half-wave rectifies the 1st derivative, and then differentiates twice more, to give the 3rd derivative of all regions with a positive gradient. This process is implemented by a set of Gaussian derivative filters with a range of scales. Peaks in the inverted normalized 3rd derivative across space and scale indicate the positions and scales of the edges. The edge contrast can be estimated from the height of the peak. The model provides a veridical estimate of the scale and contrast of edges that have a Gaussian integral profile. Therefore, since scale and contrast are independent stimulus parameters, the model predicts that the perceived value of either of these parameters should be unaffected by changes in the other. This prediction was found to be incorrect: reducing the contrast of an edge made it look sharper, and increasing its scale led to a decrease in the perceived contrast. Our model can account for these effects when the simple half-wave rectifier after the 1st derivative is replaced by a smoothed threshold function described by two parameters. For each subject, one pair of parameters provided a satisfactory fit to the data from all the experiments presented here and in the accompanying paper [May, K. A. & Georgeson, M. A. (2007). Added luminance ramp alters perceived edge blur and contrast: A critical test for derivative-based models of edge coding. Vision Research, 47, 1721–1731]. Thus, when we allow for the visual system’s insensitivity to very shallow luminance gradients, our multi-scale model can be extended to edge coding over a wide range of contrasts and blurs.
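
The processing chain described above (differentiate, half-wave rectify, differentiate twice more, normalize, then find the peak across space and scale) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the published implementation: the abstract does not give the filter parameters, so the sketch assumes that both filter stages share one scale sigma and that responses are normalized by sigma^1.5. For a Gaussian-integral edge, those two assumptions put the peak at sigma = (edge scale)/sqrt(2), which is what the read-out lines rely on. All function names are hypothetical.

```python
import numpy as np
from math import erf

def gaussian_derivative(sigma, order):
    """Sampled 1st or 2nd derivative of a unit-area Gaussian (truncated at 5 sigma)."""
    half = int(np.ceil(5 * sigma))
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 / sigma**4 - 1 / sigma**2) * g
    raise ValueError("order must be 1 or 2")

def filter_same(signal, kernel):
    """Convolve, replicating the end values so the borders create no false edges."""
    half = len(kernel) // 2
    return np.convolve(np.pad(signal, half, mode="edge"), kernel, mode="valid")

def scale_space_response(luminance, sigmas):
    """Inverted, scale-normalized 3rd derivative of regions with positive gradient."""
    rows = []
    for sigma in sigmas:
        gradient = filter_same(luminance, gaussian_derivative(sigma, 1))
        rectified = np.maximum(gradient, 0.0)          # half-wave rectifier
        third = filter_same(rectified, gaussian_derivative(sigma, 2))
        rows.append(-(sigma ** 1.5) * third)           # assumed normalization
    return np.array(rows)

# Test edge with a Gaussian-integral profile (values chosen for illustration).
x = np.arange(-200, 201, dtype=float)
true_scale, true_contrast = 6.0, 0.4
edge = true_contrast * 0.5 * (1 + np.array([erf(v / (true_scale * np.sqrt(2))) for v in x]))

sigmas = np.arange(1.0, 12.01, 0.25)
response = scale_space_response(edge, sigmas)
i_sigma, i_x = np.unravel_index(np.argmax(response), response.shape)

est_position = x[i_x]
est_scale = np.sqrt(2) * sigmas[i_sigma]
# Constant below follows from the assumed filters for a Gaussian-integral edge.
est_contrast = response[i_sigma, i_x] * np.sqrt(2 * np.pi) * 2**2.25 * est_scale**1.5

print(est_position, est_scale, est_contrast)
```

Replacing np.maximum(gradient, 0.0) with a smoothed threshold function is where the modification described in the abstract would enter; its two parameters are not given here, so that step is left out of the sketch.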

View equipment specs

Zhaoping, L. & May, K.A. (2007). Psychophysical tests of the hypothesis of a bottom-up saliency map in primary visual cortex. PLoS Computational Biology, 3(4), e62.  Download PDF (2.3 MB) Note: This PDF has higher-quality images than the one now available on the PLoS Computational Biology website. The journal used to provide both high-quality and low-quality versions, but now provides only the low-quality version.

A unique vertical bar among horizontal bars is salient and pops out perceptually. Physiological data have suggested that mechanisms in the primary visual cortex (V1) contribute to the high saliency of such a unique basic feature, but indicated little regarding whether V1 plays an essential or peripheral role in input-driven or bottom-up saliency. Meanwhile, a biologically based V1 model has suggested that V1 mechanisms can also explain bottom-up saliencies beyond the pop-out of basic features, such as the low saliency of a unique conjunction feature such as a red vertical bar among red horizontal and green vertical bars, under the hypothesis that the bottom-up saliency at any location is signaled by the activity of the most active cell responding to it regardless of the cell's preferred features such as color and orientation. The model can account for phenomena such as the difficulties in conjunction feature search, asymmetries in visual search, and how background irregularities affect ease of search. In this paper, we report nontrivial predictions from the V1 saliency hypothesis, and their psychophysical tests and confirmations. The prediction that most clearly distinguishes the V1 saliency hypothesis from other models is that task-irrelevant features could interfere in visual search or segmentation tasks which rely significantly on bottom-up saliency. For instance, irrelevant colors can interfere in an orientation-based task, and the presence of horizontal and vertical bars can impair performance in a task based on oblique bars. Furthermore, properties of the intracortical interactions and neural selectivities in V1 predict specific emergent phenomena associated with visual grouping. Our findings support the idea that a bottom-up saliency map can be at a lower visual area than traditionally expected, with implications for top-down selection mechanisms.
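
The read-out rule stated in the hypothesis (saliency at a location is the activity of the most active cell responding to it, whatever that cell's preferred feature) amounts to a maximum over cells at each location. The toy sketch below illustrates only that rule; the response values are made up for the example and do not come from the V1 model or from any data.

```python
import numpy as np

# Rows: hypothetical cell classes; columns: four display locations.
# Location 2 holds the unique vertical bar. All firing rates are invented.
responses = np.array([
    [0.3, 0.3, 0.0, 0.3],   # horizontal-tuned cells
    [0.0, 0.0, 1.0, 0.0],   # vertical-tuned cells
    [0.2, 0.2, 0.2, 0.2],   # colour-tuned cells
])

# Saliency is the maximum response at each location, regardless of feature tuning.
saliency = responses.max(axis=0)
most_salient_location = int(np.argmax(saliency))

print(saliency, most_salient_location)
```

Because the maximum ignores which feature drove the winning cell, a strongly responding cell tuned to a task-irrelevant feature can set the saliency at a location, which is the kind of interference the abstract describes.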

View equipment specs

Perrett, D.I., May, K.A. & Yoshikawa, S. (1994). Facial shape and judgements of female attractiveness. Nature, 368, 239–242.  Download PDF (501 KB)

The finding that photographic [1–4] and digital [5] composites (blends) of faces are considered to be attractive has led to the claim that attractiveness is averageness [5]. This would encourage stabilizing selection, favouring phenotypes with an average facial structure [5]. The 'averageness hypothesis' would account for the low distinctiveness of attractive faces [6] but is difficult to reconcile with the finding that some facial measurements correlate with attractiveness [7,8]. An average face shape is attractive but may not be optimally attractive [9]. Human preferences may exert directional selection pressures, as with the phenomena of optimal outbreeding and sexual selection for extreme characteristics [10–14]. Using composite faces, we show here that, contrary to the averageness hypothesis, the mean shape of a set of attractive faces is preferred to the mean shape of the sample from which the faces were selected. In addition, attractive composites can be made more attractive by exaggerating the shape differences from the sample mean. Japanese and Caucasian observers showed the same direction of preferences for the same facial composites, suggesting that aesthetic judgements of face shape are similar across different cultural backgrounds. Our finding that highly attractive facial configurations are not average shows that preferences could exert a directional selection pressure on the evolution of human face shape.

View press coverage

Thompson, P., May, K. & Stone, R. (1993). Chromostereopsis: A multicomponent depth effect? Displays, 14, 227–234.  Download PDF (1.4 MB)
This paper is also available in CBZ format, in both high quality (10.8 MB) and medium quality (3.6 MB). Note: CBZ files are actually zip files containing the scanned images. You can change the file extension to '.zip' and unzip them as normal to get the image files. But if you keep them as CBZ files, you can read them with the free CDisplay comic book reader, which is a lot less sluggish than Acrobat Reader when viewing scanned documents.

Colours on a flat two-dimensional surface can appear to lie in different depth planes. This phenomenon, readily seen on a computer monitor, is called chromostereopsis. Typically, red objects appear closer to the observer than blue objects. Although research on chromostereopsis has a history of over one hundred years, there are still aspects of it that are not fully explained. The simplest (and earliest) explanation proposes that a combination of chromatic aberration and the displacement of the fovea from the eye's optical axis is responsible for the illusion. Recent research supports the notion that other factors need to be taken into account, for example the eccentric location of the pupils and the Stiles-Crawford effect. We describe some of our own research that suggests that in many displays at least part of any perceived depth is due to luminance differences, bright objects appearing closer than dim ones.

Copyright notice

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.