Hybrid image of three contents

A hybrid image allows multiple interpretations of a single image, modulated by the viewing distance. Originally, it was constructed by combining the low spatial frequencies of one image with the high spatial frequencies of another. The original hybrid image synthesis was limited to source images of similar shape with aligned edges, e.g., faces with different expressions, to produce an effective double-image interpretation. In our previous work, we proposed a noise-inserted method for synthesizing a hybrid image from images of dissimilar shape or unaligned images. In this work, we propose a novel method for adding a third image to be seen from a middle viewing distance. The middle-frequency (MF) image is extracted by a special bandpass filter, which generates ringing while extracting only the specified frequency band. With this method, the MF image should be perceived as a meaningless pattern when viewed from far away or up close. A parameter tuning experiment was performed to determine suitable cutoff frequencies for designing the filter for the MF image. We found that ringing of a suitable size could be used to make the MF image less noticeable when seen from far away.


Introduction
The hybrid image was introduced by Oliva et al. [1] in 2006. It offers a new paradigm in which a single image can be interpreted as two different kinds of information, depending on the viewing distance. It can be considered a type of ambiguous image. An ambiguous image, or double image, is a kind of optical illusion that is created not only for art but also as an experimental stimulus in the field of psychology [2][3][4] and for studies on scene perception in the human brain [5,6]. The famous duck/rabbit ambiguous image [7,8] was initially used by psychologists to point out that visual perception relates to mental activity [9]. Another well-known example of an ambiguous image is the painting of the Holy Roman Emperor Rudolph II as Vertumnus by Giuseppe Arcimboldo. Arcimboldo was known for creating imaginative portrait heads from agricultural products such as fruits and vegetables. For the portrait of Rudolph II, he arranged images of seasonal vegetables as local information in such a way that the whole collection formed a face and body resembling the Roman god of plant life, i.e., the global information of the image. In these kinds of figures, we interpret the global content by integrating local information based on perceptual grouping.
In contrast, a hybrid image requires a change in the visual angle to reveal the other information hidden in the image [10]. The visual angle depends on the actual viewing distance. Originally, the hybrid image was developed as an experimental stimulus to study the human visual system in terms of spatial frequency [11]. A well-known hybrid image combines portraits of Einstein and Monroe. The same image is seen as Einstein when viewed from a close distance and as Monroe when viewed from a few meters away. The visual angle can also be changed from large to small by changing the image's size instead of the viewing distance. In addition, it may be possible to see the hidden image by viewing from far away, by squinting both eyes [12], or by looking through a mobile phone's camera.
According to the multiscale perceptual mechanisms of the human visual system, it is possible to present particular spatial frequency information in an image at a certain viewing distance. Based on this idea [1], a hybrid image, $I_{HB}$, can be synthesized from two input images, $I_1$ and $I_2$, as $I_{HB} = L_p(I_1) + H_p(I_2)$, where $L_p$ is a lowpass filter and $H_p$ is a highpass filter. According to the contrast sensitivity function [13], a human observer can discriminate a sine-wave grating at the lowest contrast when its frequency $g$ is 4 to 6 cycles per degree (CPD) of visual angle. The cutoff frequency $C$, in cycles per image, used to design the highpass and lowpass filters is determined from the viewing angle $\theta$ such that $C = \theta g$. Here, $\theta$ is the visual angle, in degrees, subtended by the image; it is calculated from $h$, the image height, and $d$, the distance from the viewer to the image.
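The relation between viewing distance and cutoff frequency can be sketched as follows. The form $\theta = \arctan(h/d)$ and the image height of 15 cm are assumptions, not taken from the text above; they were chosen because they reproduce the visual angles reported in the Alpha compositing section (about 1.71°, 4.29°, and 26.6° at 500 cm, 200 cm, and 30 cm). The function names are hypothetical.

```python
import math


def visual_angle_deg(image_height_cm, distance_cm):
    """Visual angle subtended by the image, in degrees.

    Assumed form: theta = arctan(h / d).
    """
    return math.degrees(math.atan(image_height_cm / distance_cm))


def cutoff_cycles_per_image(image_height_cm, distance_cm, g_cpd):
    """Cutoff frequency C = theta * g, in cycles per image."""
    return visual_angle_deg(image_height_cm, distance_cm) * g_cpd


if __name__ == "__main__":
    h = 15.0  # assumed image height in cm
    for d in (30.0, 200.0, 500.0):
        theta = visual_angle_deg(h, d)
        # g = 4 and g = 6 CPD bracket the peak of the contrast sensitivity function
        c_lo = cutoff_cycles_per_image(h, d, 4.0)
        c_hi = cutoff_cycles_per_image(h, d, 6.0)
        print(f"d = {d:5.0f} cm  theta = {theta:6.2f} deg  C = {c_lo:6.1f} to {c_hi:6.1f}")
```

Because $C$ grows with $\theta$, the same image needs a higher cutoff (in cycles per image) when it is viewed from closer up.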
The roles of different spatial frequency bands were examined by Oliva et al. using hybrid visual stimuli, i.e., hybrid images [14,15]. In both works, they found that most participants, when presented with the visual stimuli for a short time, were unaware that the image they viewed had two interpretations. In addition, the participants attended to different spatial frequencies depending on the experimental task [15]. These studies suggest that when viewing a hybrid image, the visual system is often unaware of the other information hidden in an unattended frequency band. Hence, a conventional method for the composition of a hybrid image was introduced. However, the source images were well-aligned in those works, for example, faces with different expressions.
To create a compelling hybrid image, we need to calculate the cutoff frequencies for both spatial frequency images from the CPD at which sensitivity peaks in the contrast sensitivity function. When two images of different shapes are hybridized with the original method, the unaligned parts cause an ambiguous perception of one image at a distance. Consequently, both images are often perceived at the same time, especially when viewed closely. The effect of this problem can be seen in refs. [11,16], where the experimenters used hybrid stimuli composed of unaligned images, i.e., different visual scenes. Brady and Oliva [16] found that the low-frequency (LF) information could be seen from almost all viewing distances when a hybrid of different visual scenes (for example, bedroom, forest, and living room) was used as the stimulus, which was not the case when using properly aligned images such as faces with different emotional expressions.
To create a hybrid image that does not rely on the overlap of the source image's global spatial scale, we need to maintain the separation of the perception of two spatial frequency images with regard to the viewing distance. The main underlying theory is contrast sensitivity.
Because human eyes have limited visual acuity depending on the viewing distance, the high-frequency (HF) image automatically falls off the visible area of the contrast sensitivity function; there should be little to no problem viewing the hybrid image from far away, even when the hybrid image is synthesized from source images that contain different shapes. However, when one looks at the hybrid image closely, the overlapping part of the LF image is visible alongside the HF image. Therefore, the main challenge when synthesizing this type of hybrid image is to maintain the separation of the spatial frequencies when the hybrid image is viewed up close.
Ideally, an edge-alignment-free hybrid image is a hybrid image in which the LF image is perceived as noise or is completely disregarded when viewed closely. To achieve this, we need HF noise that makes the LF image less noticeable but does not degrade the perception of the HF image. However, the most challenging point is that this contradicts the findings of critical band masking research. For example, Solomon and Pelli [17] tried to identify the role of the human visual system in the perception of letters and gratings. In their work, they superimposed noise of various spatial frequencies on a letter image of fixed size. They found that noise at the same spatial frequency as the image, i.e., the letter, worsens its perception.
Konishi and Yamaguchi [18] tackled this problem by processing the HF and LF images separately before composing the hybrid image. They introduced noise in the HF image to cover parts of the LF image that were not aligned with the HF image, together with a contrast reduction method for the LF image. As noise in the HF image, they used the ringing artifacts produced as a byproduct of extracting the high spatial frequencies with a two-level highpass filter. With this method, noise was produced in a nonrandom manner to prevent an ambiguous perception of the HF image. However, ringing produced by this method has low contrast, especially far from the edges in the HF image. To increase the contrast of ringing throughout the image, ref. [18] introduced "local contrast adjustment". In this adjustment, an image was first separated into small rectangular blocks. Then, the contrast of each block was enhanced by histogram equalization.
To synthesize an edge-alignment-free hybrid image, it is necessary to make the LF image less noticeable and, at the same time, the HF image more noticeable. We proposed two methods, called the "noise-inserted method" and the "color-inserted method", in ref. [19]. The idea of using noise in a hybrid image came originally from the aforementioned work [18]. We successfully synthesized a hybrid image from unaligned source images and showed experimentally [20] that our proposed method achieved the best separation of the spatial frequencies when compared with the previous methods of refs. [1,18].
In this work, we employ an adapted version of our previously proposed method to synthesize a hybrid image from three different images. The new kind of hybrid image can be interpreted differently from three different distances: far, middle, and near viewing distances.
To present one image at each distance, we use three different frequency filters, each designed to allow different frequency bands (the low, middle, and high) to pass. In this paper, we discuss mainly the method of extracting frequencies regarding the image seen from the middle distance; appropriate cutoff frequencies for synthesizing a hybrid image of three contents are also investigated.

Methods
Showing three different contents at three distances is a challenging problem that extends the previous version of our proposed hybrid image. This time, we must consider the image to be seen at the middle distance, which should not be perceptible from up close or from far away. The proposed outline is based on our previous work, with the addition of a middle image extracted by a new type of frequency filter. For clarity, the frequency image to be seen from up close is named the "HF image", the frequency image to be seen from the middle distance is named the "middle-frequency (MF) image", and the frequency image to be seen from far away is named the "LF image".
Similarly to our previous work, we began with the preprocessing of all the source images to achieve the appropriate contrast and details. Then, we extracted each frequency band with different frequency filters in the frequency domain. Finally, we performed local histogram equalization using each frequency image's local frequency map. The overall process is illustrated in Fig. 1. Source images were taken from refs [21,22].

Preprocessing
Different types of preprocessing are performed on each source image depending on the distance at which it is to be perceived. For instance, we perform Gradient Domain image range Compression (GDC) [23] on the LF source image, $I_{LF}$, to reduce its dynamic range. This reduces the overall contrast so that the LF image does not stand out too much in the final synthesized image.
For the HF source image, $I_{HF}$, we perform detail enhancement (DE) [24] to enhance existing noise that is difficult to perceive with the naked eye, such as digital noise or ISO noise. This way, the HF image extracted from the source image will contain many details. The enhanced details of the HF image can then cover the presence of the LF image and the MF image.
Because the MF image is inserted between the HF and the LF images, we preprocess its source image using the methods applied to both. The source image for the MF image, $I_{MF}$, is first preprocessed using DE, and then its overall dynamic range is compressed using GDC.

Extraction of frequencies
The extraction of frequencies for all three images is performed in the frequency domain. We use a two-level highpass filter to extract the high frequencies from the detail-enhanced HF source image. The two-level highpass filter creates ringing noise along with the extraction of HF information. For the LF image, we use a Gaussian lowpass filter to extract the low frequencies from the source image whose dynamic range has been reduced by GDC. For the extraction of the MF image, we propose a special filter designed as shown in Fig. 2. Its magnitude is a function of $D$, the distance from the center of the filter (the zero-frequency point), and is parameterized by the filter cutoff values $D_{M1}$, $D_{M2}$, and $D_{M3}$.
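As a rough illustration of the filter shapes discussed above, the following sketch evaluates a Gaussian lowpass profile and one possible MF bandpass profile as functions of $D$. The MF shape used here is an assumption inferred from the roles the Results section gives to the three cutoffs: a sharp edge at $D_{M1}$, a flat passband up to $D_{M2}$, and a slope reaching zero at $D_{M3}$. The linear form of the slope in particular is hypothetical and may differ from the filter in Fig. 2; all function names are illustrative.

```python
import math


def gaussian_lowpass(D, sigma):
    """Gaussian lowpass magnitude at radial frequency D."""
    return math.exp(-(D * D) / (2.0 * sigma * sigma))


def mf_bandpass(D, d_m1, d_m2, d_m3):
    """Assumed MF filter magnitude (requires d_m1 <= d_m2 <= d_m3).

    Sharp cutoff at d_m1, flat passband up to d_m2,
    then a linear ramp (an assumption) down to 0 at d_m3.
    """
    if D < d_m1 or D >= d_m3:
        return 0.0
    if D <= d_m2:
        return 1.0
    return (d_m3 - D) / (d_m3 - d_m2)


def radial_mask(height, width, profile):
    """Centered 2-D mask: the profile evaluated at each sample's
    distance from the zero-frequency point (the mask center)."""
    cy, cx = height // 2, width // 2
    return [
        [profile(math.hypot(y - cy, x - cx)) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Tiny mask for inspection; real masks would match the image size.
    mask = radial_mask(9, 9, lambda D: mf_bandpass(D, 2, 3, 4))
    for row in mask:
        print(" ".join(f"{v:.1f}" for v in row))
```

In use, such a mask would be multiplied element-wise with the centered (zero-frequency-shifted) Fourier spectrum of the preprocessed source image before applying the inverse transform. The abrupt edge at $D_{M1}$ is what would be expected to produce ringing in the spatial domain.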

Local contrast enhancement
In our previous work [19], we relied on the ringing generated by the HF extraction as additional noise to cover the LF image when viewed from up close. However, ringing obtained by the two-level highpass filter had low contrast and gradually faded as the distance from the original edges in the image increased. Therefore, we proposed local histogram equalization according to the location of the frequency information of the image to be paired. The map that indicates the location of the frequency information is called the "local frequency map". In this work, we retain the use of a local frequency map for enhancing the contrast of the HF image, the HF map $M_{HF}(p)$, using the same technique as proposed in ref. [19]. We also propose a local frequency map for the MF image as follows.

Local frequency map for MF image
The local frequency map for the middle frequency image determines the location of high frequency in the middle frequency image (MF map), and the location of relatively HF information in the LF image (LF map).
To find the likely location of a particular range of frequencies in an image, it is necessary to isolate only the selected frequency band. We calculate the MF map by applying a bandpass filter to $GDC(DE(I_{MF}))$, and the LF map by applying a bandpass filter to $GDC(I_{LF})$. Although it is called a Gaussian bandpass filter, its shape is similar to that shown in Fig. 2. Both maps are obtained by calculating the power spectral density of the specific frequency band from the bandpass-filtered image. Finally, we smooth both maps with a Gaussian filter to avoid zero-crossing positions and to compute the local average. The smoothing parameter $\sigma$ is 7 for the HF map and the MF map; for the LF map, it is calculated from $1.2 d_{LF}$ with LF cycle = 6, where $d_{LF}$ is the distance for presenting the LF image.
The final local frequency map for the MF image is obtained by blending the LF map and the MF map. Here, $l(p)$ and $m(p)$ are the pixel values of the LF map and the MF map, respectively, and $k_L$ and $k_U$ ($k_L, k_U \in [0, 1]$) give the lower and upper bounds of the local frequency map when $m(p)$ is zero (provided that $l(p) \in [0, 1]$ and $m(p) \in [0, 1]$).

Local histogram equalization
A histogram-equalized image $E_f(p)$ at position $p$ is obtained from a filtered image $G_f(p)$, $f \in \{HF, MF\}$, using $T_w$, the transformation function of histogram equalization within a window $w$ around the pixel $p$, and $c(p)$, a contrast determined by the map value $M_f(p)$ together with the user-defined values $c_{min}$ and $c_{max}$, which stand for the minimum and maximum contrasts. In this work, we use the same values of $c_{min}$ and $c_{max}$ for both the MF and HF local histogram equalization.

Alpha compositing
The final hybrid image is obtained by combining the LF image, HF image, and MF image using alpha compositing. In this work, we define the opacity values as 0.35, 0.35, and 0.3 for the HF image, MF image, and LF image, respectively. Figures 3 and 4 show the results of our proposed algorithm. The figures were designed to be printed on A4-size paper and seen from three distances. The LF image was designed to be seen from a longer distance (about 500 cm, equivalent to 1.71° of visual angle), the MF image from a middle distance (around 200 cm, equivalent to 4.29° of visual angle), and the HF image from a shorter distance (less than 30 cm, equivalent to 26.6° of visual angle). To generate the MF image, the cutoff frequency parameters for designing the special bandpass filter, $D_{M1}$, $D_{M2}$, and $D_{M3}$, were 40, 56, and 120 for Fig. 3 and 40, 60, and 120 for Fig. 4, respectively. All of the input images were 2560 × 1920 pixels in size.
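A minimal sketch of the compositing step is given below. It assumes that alpha compositing here amounts to a per-pixel weighted sum of the three frequency images, which is suggested by the opacities summing to 1 but is not stated explicitly. The function name, the nested-list grayscale representation, and the clamping to [0, 1] are illustrative choices.

```python
def composite(hf, mf, lf, alphas=(0.35, 0.35, 0.30)):
    """Per-pixel weighted sum of three equally sized grayscale images.

    Images are nested lists of floats in [0, 1]; alphas are the
    opacities for the HF, MF, and LF images, in that order.
    The weighted-sum form is an assumption.
    """
    a_hf, a_mf, a_lf = alphas
    return [
        [
            # Clamp to the valid intensity range after blending.
            min(1.0, max(0.0, a_hf * h + a_mf * m + a_lf * l))
            for h, m, l in zip(row_hf, row_mf, row_lf)
        ]
        for row_hf, row_mf, row_lf in zip(hf, mf, lf)
    ]


if __name__ == "__main__":
    hf = [[1.0, 0.0], [0.5, 0.5]]
    mf = [[0.0, 1.0], [0.5, 0.5]]
    lf = [[0.0, 0.0], [1.0, 0.0]]
    print(composite(hf, mf, lf))
```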

Results
To design the special bandpass filter for the MF image, we needed to determine suitable cutoff frequencies. In this section, we explored a range of cutoff frequencies using the same set of source images, as shown in Fig. 1.
The LF image's cutoff frequency was fixed at $\sigma$ = 16 pixels for the design of the Gaussian lowpass filter. The HF image's cutoff frequency was fixed at 120 pixels for the design of the two-level highpass filter. The image size was 2560 × 1920 pixels. Accordingly, the HF and LF images were designed to be viewed from less than 30 cm and from more than 500 cm, respectively, when displayed on a monitor at a size smaller than A4 paper. For the MF image's cutoff frequencies, we divided the parameter exploration into two phases. The first phase tested as wide a range of $D_{M1}$ and $D_{M3}$ as possible, with $D_{M2}$ determined from them by a ratio, $r$. From visual inspection of all generated hybrid images, we found that a lower $D_{M1}$ was effective in making the MF image more noticeable at the middle distance, while the MF image remained less noticeable when viewed from up close and from far away, provided that appropriate ringing was generated. The suitable values of $D_{M1}$ were found to be related to the value of $\sigma$, which determines the cutoff frequency for the LF image. That is, $D_{M1}$ should be between $2\sigma$ and $3\sigma$.
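The parameterization can be illustrated with a short sketch. The rule used for $D_{M2}$ below, a linear interpolation between $D_{M1}$ and $D_{M3}$ controlled by $r$, is an assumption: it is consistent with the values reported for Figs. 3 and 4 (40, 56, 120 gives $r$ = 0.2; 40, 60, 120 gives $r$ = 0.25), but other definitions of $r$ are possible. The function names are hypothetical.

```python
def d_m2_from_ratio(d_m1, d_m3, r):
    """Assumed rule: place D_M2 a fraction r of the way from D_M1 to D_M3."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return d_m1 + r * (d_m3 - d_m1)


def d_m1_range(sigma):
    """Range suggested in the text for D_M1: between 2*sigma and 3*sigma."""
    return 2.0 * sigma, 3.0 * sigma


if __name__ == "__main__":
    low, high = d_m1_range(16)  # sigma of the LF Gaussian lowpass filter
    print(f"suggested D_M1 range: {low:.0f} to {high:.0f}")
    for r in (0.0, 0.2, 0.25, 0.3):
        print(f"r = {r:.2f} gives D_M2 = {d_m2_from_ratio(40, 120, r):.1f}")
```

With $\sigma$ = 16, the suggested range for $D_{M1}$ is 32 to 48, which contains the value of 40 used for Figs. 3 and 4.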
According to the filter design in Fig. 2, $D_{M1}$ determines the location of a sharp cutoff frequency that generates ringing for the MF image, while the location of $D_{M2}$ indicates the size of the bandpass filter. Meanwhile, $D_{M3}$ determines the lower base of the slope. We found that varying $D_{M3}$ resulted in little or no observable difference in the first experiment.
Therefore, we reduced the number of free parameters in the second experiment by fixing the value of $D_{M3}$ and varying $D_{M2}$ through $r$. The result is shown in Table 1, with the expressions and their meanings described in Table 2. From the table, we found that a $D_{M1}$ of around 40 to 48 pixels with $r$ between 0 and 0.3 generates a promising result. The MF image was noticeable at the middle distance, and the viewer's perception switched to the LF image when the viewer stepped away. At a closer distance, the HF image could be perceived, while the MF image appeared as a meaningless pattern.

Discussion
From the parameter tuning experiment, we found that the most critical parameter in controlling the noticeability of the MF image is $D_{M1}$, the cutoff frequency that affects the size of the ringing. If the generated ringing is too coarse (fewer CPD of visual angle), the MF image will be noticeable even when the viewer steps away from the hybrid image. Conversely, ringing with too much detail (more CPD of visual angle) makes the MF image difficult to perceive from the middle distance. In this manner, it is possible to achieve the separation of the spatial frequencies by manipulating the value of $D_{M1}$. The suitable ringing size may be related to the CPD of the peak sensitivity in Campbell's contrast sensitivity function [13].
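To make the link between $D_{M1}$ and CPD concrete, the sketch below converts a cutoff given in cycles per image into cycles per degree at a given viewing distance. It rests on several assumptions that are not established in the text: that the filter cutoffs can be read as cycles per image height, that the visual angle is $\arctan(h/d)$, and that the displayed image height is 15 cm. The function name is hypothetical.

```python
import math


def cycles_per_degree(cycles_per_image, image_height_cm, distance_cm):
    """Convert cycles per image to cycles per degree of visual angle.

    Assumes the visual angle is arctan(h / d), in degrees.
    """
    theta_deg = math.degrees(math.atan(image_height_cm / distance_cm))
    return cycles_per_image / theta_deg


if __name__ == "__main__":
    h = 15.0     # assumed image height in cm
    d_m1 = 40.0  # sharp cutoff of the MF filter, read as cycles per image
    for d in (30.0, 200.0, 500.0):
        cpd = cycles_per_degree(d_m1, h, d)
        print(f"d = {d:5.0f} cm gives {cpd:5.1f} CPD")
```

Under these assumptions, the same cutoff corresponds to more CPD as the viewer steps back, which matches the qualitative argument that finer ringing becomes harder to perceive from far away.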
We also noticed that when the viewer was closer to the image, the MF image could be perceived as a meaningless pattern if the edges were not continuous. Fig. 5a shows an example of edge continuity, while Fig. 5b shows edge discontinuity. These phenomena may depend on parameters such as $D_{M1}$, $D_{M2}$, and $D_{M3}$. However, we have not yet determined which parameters cause edge continuity. Further investigation is needed to determine suitable parameters, including $r$.

Conclusions
In this paper, we employ an adapted version of our previously proposed noise-inserted method [19] to synthesize a hybrid image from three different images. The new kind of hybrid image can be interpreted differently from three different distances: far, middle, and near viewing distances. To present one image at each distance, we use three different frequency filters, each designed to pass a different frequency band (low, middle, or high). We propose a special bandpass filter (MF filter) for extracting the frequencies to be seen from the middle distance. To determine suitable cutoff frequencies for designing the MF filter, we conducted a parameter tuning experiment. As a result, we found that a suitable value for $D_{M1}$ is linked to the $\sigma$ of the LF filter. Meanwhile, the determination of suitable values for the other parameters requires further investigation. In the future, we plan to conduct an experiment to measure the separation of the spatial frequencies when viewing the hybrid image from three different distances.