SE-402 RESEARCH REPORT
Experiments in Single-Sensor Acoustic Localization
Seth Pollen and Nathan Radtke
[email protected], [email protected]
Abstract—This research project explores the area of tangible acoustic interfaces as an alternative to traditional touch screens. Past research has shown the feasibility of single-sensor acoustic localization. The weakness of existing solutions is the high effort required for calibration. In this paper, we present our attempts to create a low-cost, single-sensor acoustic localization solution, which employs algorithms intended to reduce the user effort in calibration. We provide the results of our implementation and present future research opportunities for the area of tangible acoustic interfaces.

Index Terms—Tangible acoustic interfaces, touch screens, acoustics, neural networks, signal processing
1 PROBLEM STATEMENT
Touch interfaces to computer devices have been the focus of significant innovation in recent years, but mainstream touch screen technologies remain expensive and do not scale easily to large sizes or complex shapes. Thus, while touch surfaces could find numerous applications in the context of everyday human life, they are, for the most part, currently limited to traditional computing devices such as cell phones and PCs. One technology which promises to make touch surfaces more ubiquitous and economical relies on acoustic sensors to detect impacts (such as taps from a fingertip or solid object) on a surface by sensing the vibrations caused by those impacts. This technology, known as tangible acoustic interfaces (TAI), is much simpler, cheaper, and more power-efficient than existing technologies, and it scales very well to large and complex (non-planar) surfaces. This opens up the possibility of making nearly any solid object into a responsive touch interface [1].

The goal of our project is, first, to explore TAI solutions using a single, inexpensive sensor and, second, to reduce the calibration effort required for such systems without reducing the system's touch input resolution. This improvement will require the implementation of some kind of interpolation which allows the system to more precisely localize taps not coincident with given calibration points. To reduce the cost of our solution, we also hope to be able to use an ordinary laptop audio line-in to digitize the incoming signal, allowing the rest of signal processing to take place in software. Finally, we want to improve usability of the system when a projector display is overlaid onto the touch surface, allowing calibration to take place interactively.

(Seth Pollen and Nathan Radtke are undergraduate software engineering students at the Milwaukee School of Engineering in Milwaukee, Wisconsin.)
2 PREVIOUS RESEARCH
Significant research into tangible acoustic interfaces took place in 2005-6 as a part of the TAI-CHI project (www.taichi.cf.ac.uk). This project identified three major algorithmic approaches to the design of TAIs: time delay of arrival (TDOA), time reversal (TR), and acoustic holography [2]. Of these three, the time reversal approach has the special advantage of operating with as little as one acoustic sensor. A single-sensor solution has many advantages. The sensor can be positioned almost anywhere on the physical object being touched, eliminating the potential for human error in positioning multiple sensors relative to one another. Furthermore, a single sensor permits applications to use existing sound input hardware (available on most personal computers) to receive the incoming signal, allowing all signal processing to be done in software [3]. The time reversal technique relies on the assumption that impacts at different locations on the sensitive object will produce unique acoustic patterns in the object (through reverberations off the object’s boundaries as well as different wave propagation modes), which can be received by a single sensor and distinguished from one another in software. Calibration of a time-reversal system, which involves tapping various points on the surface to provide an initial data set to the time reversal algorithm, allows the system to store the particular acoustic characteristics of the medium and sensor placement. However, the research done on the TAI-CHI project left some problems unsolved. Most importantly, their time-reversal solutions are entirely discrete. Extant literature does not seem to provide methods for interpolating between calibration points, meaning a large number of calibration points is required to achieve a resolution that appears continuous to users. The literature also seems to lack significant inquiry into methods for distinguishing valid impacts on the sensitive area from extraneous noise or impacts to other areas on the object not intended
to be sensitive. In this paper, we discuss our efforts to integrate interpolation and noise exclusion into existing single-sensor acoustic localization techniques.
3 FOUR-POINT ALGORITHM

The first software technique we investigated deviates from existing single-sensor algorithms by making significant assumptions about the acoustic medium. Based on those assumptions, it attempts to model the acoustic transfer function across the whole medium. This algorithm is named "four-point" because of our initial hopes that it would achieve acceptable localization across the entire sensitive area based only on calibration data from the four corners of that area. While we did not find success with this algorithm, its outcomes informed our further research and are discussed here.

3.1 Assumptions and Theory

For this algorithm, the sensitive object, or medium, is assumed to be an ideal acoustic conductor with a rectangular shape and smooth, straight edges. Under these assumptions, a burst of sound produced by an impact propagates through the medium without being altered and reverberates off the edges of the medium until being fully attenuated. A sensor attached to the medium receives, in addition to the original sound pulse, a series of echoes of that pulse, corresponding to all the linear reflected paths between the impact point and the sensor. Part of the four-point algorithm reduces each received tap to a series of timestamps describing the delays between successive echoes. Assuming a constant speed of sound in the medium, these timestamps can be used to reason about the lengths of the linear paths followed by the sound waves, and from that reasoning the location of the impact may be derived. In addition to assumptions about the acoustic properties of the medium, this approach also assumes the impact itself to consist of a brief burst of sound followed by silence. The duration of this initial burst must be shorter than the time delay of echoes in the medium so that echoes do not overlap and obscure one another in the received signal.

3.2 Tap Identification

A threshold-detection algorithm is used to identify impacts in the signal coming in through the sound card. Each time the audio signal crosses a certain threshold h, a snapshot of the signal is taken with a number of samples L prior to the threshold-crossing sample and T samples following it; this snapshot is then sent to the next stage of processing. After one snapshot is created, new snapshots are not returned until a certain interval of P samples passes without the signal crossing the threshold. This ensures that an impact followed by a particularly long string of echoes is not interpreted as two separate impacts. Based on our experiments, the following set of values for these parameters was found to be the most adaptable and robust: h = 0.015, L = 25, T = 575, P = 3000. With these settings, the software is able to correctly identify taps on a variety of surfaces (chalkboards and tabletops) and with a variety of tap devices (rubberized bolts, human knuckles, and chalk). Note that these settings are dependent on the capabilities of our sound card; they are designed for operation with an amplitude range of ±1.0 and a sample rate of 44.1 kHz. These settings for T and L mean that each window of data prepared for further processing is 600 samples (13.6 ms) in length, and the minimum spacing between impacts is 3,025 samples (68.6 ms).
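As a rough illustration, the following C# sketch implements this threshold-detection scheme under the parameter values listed above. The class and method names are our own, and the audio is assumed to arrive as an array of samples normalized to ±1.0 at 44.1 kHz.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the threshold-based tap detector described above (names are illustrative).
class TapDetector
{
    const double H = 0.015;  // detection threshold
    const int L = 25;        // samples kept before the threshold crossing
    const int T = 575;       // samples kept after the threshold crossing
    const int P = 3000;      // quiet samples required before re-arming

    // Scans a block of audio samples and returns each 600-sample snapshot found.
    public static List<double[]> FindTaps(double[] samples)
    {
        var snapshots = new List<double[]>();
        int i = L;  // leave room for the pre-trigger samples
        while (i < samples.Length - T)
        {
            if (Math.Abs(samples[i]) >= H)
            {
                var window = new double[L + T];
                Array.Copy(samples, i - L, window, 0, L + T);
                snapshots.Add(window);

                // Re-arm only after P consecutive samples stay below the threshold.
                int quiet = 0;
                i += T;
                while (i < samples.Length && quiet < P)
                {
                    quiet = Math.Abs(samples[i]) < H ? quiet + 1 : 0;
                    i++;
                }
            }
            else
            {
                i++;
            }
        }
        return snapshots;
    }
}
```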
3.3 Quantization

The next step in the four-point algorithm is the quantizer, which converts incoming sound waves into a discrete time series of echoes. Several signal-processing approaches to quantization were considered, including conversion to the frequency domain. We rejected frequency-based techniques, however, because the overlaying of echoes into a single signal affects the amplitude of the signal far more than its frequency. While several variations were tried, all of our approaches to the quantizer consisted roughly of two steps, illustrated in Figure 1. First, to smooth out the oscillations in the incoming sound wave, the signal is squared and filtered using a low-pass FIR filter with a cutoff around 640 Hz, and the square root is then taken to restore the signal to its original scale. Note that the frequency envelope of a normal chalkboard tap peaks around 3 kHz. In the figure, the red line shows the original sound data captured by our microphone, and the blue line shows the curve produced by this smoothing technique. This smoothed curve provides a continuous approximation of the amplitude of the sound wave. In the second step of quantization, a median filter is used to detect peaks in the smoothed curve. When it encounters a peak of the correct width, the median filter outputs a flat plateau which can easily be identified by examining the differences between successive samples. The positions of these detected peaks (illustrated as black boxes in the figure) yield the timestamp series of echoes. It is possible, however, for this algorithm to detect waveform features not actually caused by echoes in the acoustic medium. The quantizer detects any fluctuation of the correct duration in the smoothed curve, no matter how slight the amplitude. Therefore, another algorithm step was added: it filters out all detected peaks except for a constant number of peaks having the highest
amplitudes; it was assumed that higher-amplitude peaks were more likely to represent actual echoes. A further refinement to this was also tried, which multiplies each peak's amplitude by a constant power (usually less than 1) of its time relative to the start of the window and then takes the highest peaks. This compensates for the fact that peaks naturally decrease in amplitude as time goes on, due to attenuation of the acoustic signal.

Fig. 1. Steps of echo quantization.

3.4 Geometric Calibration

While we spent significant time testing and tuning the quantizer, we were never able to move to validation of the four-point calibration algorithm, which uses the timestamp series produced by quantization to reason about the physical extent of the medium and the locations of impacts. Nevertheless, we did draft an initial solution to this problem. This proposal collects a set of calibration data, consisting of coordinate points on the touch area paired with the quantized time series produced when each point was tapped. This information, combined with the known position of the sensor, is used to derive the physical dimensions of the acoustic medium. To do this, the medium is assumed to be rectangular with edges parallel to the edges of the designated touch-sensitive area (which may or may not occupy the entire physical medium). The calibration process begins by proposing that the medium does not extend past the edges of the sensitive area. The available calibration data set is then searched for a timestamp series that would not have been possible given its impact location, the sensor location, and the currently proposed medium boundaries. If one such calibration tap is found, the appropriate proposed boundary for the medium is extended by a small amount, and the same process is run again. Ideally, this method of successive approximations eventually converges on the correct medium size and terminates. Once calibration has determined the physical extent of the medium, a simple algorithm can use this information to localize incoming taps by simulating the propagation of an impact at various points in the medium and selecting the point whose simulation results most closely match the observed timestamp series.

3.5 Problems with Four-Point

Although significant effort was spent tuning the four-point algorithm, it was eventually deemed unworkable. The biggest problem was the inconsistency of the quantizer. Running our quantizer on data recorded from different taps at the same point on the surface produced significantly different timestamp series. This instability in quantization was probably a result of our oversimplification of the acoustic problem. It is quite possible that the sound wave received by our sensor is the combination of several different wave propagation modes which differ from our assumed model of simple reflections [4]. In any case, unstable quantization proved disastrous for our geometric calibration technique. Because our calibration technique regards each calibration sample (that is, each series of timestamps) individually when negotiating the physical dimensions of the medium, any error in the quantizer output could cause the calibration algorithm to fail to converge or cause it to produce wildly incorrect estimates of the medium's size. While several workarounds were proposed for these problems, a lack of time and understanding led us to abandon the four-point algorithm and pursue localization techniques which required fewer assumptions. It may be possible, however, to improve the four-point algorithm by requiring the user to provide additional data during calibration. We could, for example, prompt the user to measure the physical dimensions of the medium so that our system does not have to derive this information from sound samples using such an unstable algorithm. Our calibration algorithm could also be improved to consider multiple taps simultaneously during its reasoning about the size and characteristics of the medium; this would make it less sensitive to variances in individual calibration taps.
4 NEURAL NET ALGORITHM

4.1 Choosing a Neural Network
As we pursued the four-point quantizer approach, we also researched the use of neural networks. Our goal was to produce continuous localization output. We discovered that neural networks might be a solution to this problem, as they are commonly used to approximate complicated functions.
Our research efforts led us to further investigate neural networks which would use a supervised learning mechanism. This meant that we would require users to collect data which, in turn, would be used to train the neural network. The two neural network types which seemed best suited to our problem were the feed-forward back-propagation network and the radial basis function network. Our goal in using the neural network was to be able to train the network and then use it as a black box for solving localization between calibration points. Our expected output was a coordinate pair corresponding to a location on the user's screen.
4.2 Implementation Approaches
Our implementation efforts started by looking for a C# API that implemented the feed-forward back-propagation network. We looked for this specific network type as it has many common applications. Many open source neural network APIs were found, but AForge.NET [5] and FANN [6] were found to have very good documentation and appeared to be suitable for our needs. Getting the neural network to work proved to be a difficult task. At first it was unable to train and reach our performance goal, which consisted of a maximum allowed error in the neural network output. Initially, we concluded that the allowed error in our performance goal was too small and thus unattainable. We experimented by raising the tolerance of the performance goal but found that the root of the issue was that the neural network was unable to make training progress in the time we were allowing it to train. The amount of time we allowed training to run varied from several seconds to several minutes. Allowing any more time for training, we felt, would have negatively impacted the user experience. Additionally, because each time the system is set up it may be positioned differently (based on sensor location, screen resolution, calibration density, etc.), we consider calibration data non-portable; calibrating for several minutes during each setup is therefore unrealistic and could discourage users from adopting the system. Our lack of results with the neural network libraries led us to reevaluate the feasibility of using a neural network for tap localization. Through additional research, we determined that this was an appropriate use of a neural network, so we continued to investigate their capabilities. We discovered MATLAB's neural network toolbox and employed it extensively to test and prototype neural networks.
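For illustration, the following is a minimal sketch of the kind of feed-forward back-propagation training loop we attempted with AForge.NET. The layer sizes, learning rate, error goal, and time budget shown here are placeholders rather than the exact values we used.

```csharp
using System;
using AForge.Neuro;
using AForge.Neuro.Learning;

// Illustrative training loop of the sort we attempted with AForge.NET.
// Layer sizes, learning rate, and the error goal below are placeholder values.
class NetworkTrainer
{
    public static ActivationNetwork Train(double[][] inputs, double[][] targets)
    {
        // One hidden layer; inputs are preprocessed tap waveforms,
        // outputs are normalized (x, y) screen coordinates.
        var network = new ActivationNetwork(
            new SigmoidFunction(), inputs[0].Length, 20, 2);

        var teacher = new BackPropagationLearning(network)
        {
            LearningRate = 0.1,
            Momentum = 0.0
        };

        // Stop when the epoch error falls below the goal or a time budget expires.
        double error = double.MaxValue;
        var deadline = DateTime.Now.AddMinutes(2);
        while (error > 0.01 && DateTime.Now < deadline)
        {
            error = teacher.RunEpoch(inputs, targets);
        }
        return network;
    }
}
```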
4.3 Testing and Prototyping Neural Networks
The testing and prototyping of neural networks became more rigorous with the transition to MATLAB. Aided by custom functions and scripts, we were able to record data in wave files and process them directly in the MATLAB environment, which made the testing process more efficient and automated. We developed a standard process
to test neural network features by adjusting parameters in the network script and running a validation test to see how well the configuration performed.

4.4 Processing Theories
By the time of testing in MATLAB, we had gained a more formal introduction to neural networks in an artificial intelligence course. With this knowledge, we began to discover some of the missteps taken during setup of the neural networks. This ultimately led to better and more thorough analysis. Moving to MATLAB, we were encouraged by training times that were quick relative to those we had experienced in the C# trials. Typical training times were on the order of 60 seconds. Some network configurations took vastly longer and hit the timeout threshold of 10 minutes; however, this was observed in only a few cases. A recurring problem we had was overfitting the network to the training data. When the training goals were reduced, no improvement was observed. At this point, we also changed our goal with the neural networks. Instead of training the network to produce a coordinate pair when given a waveform, we pursued a network that could generate a waveform from a coordinate pair, thereby simulating the acoustics of the medium. We understood that this was backwards from normal neural network operation, but we decided to investigate further. The idea was that a neural network would be able to generate additional calibration data to assist the time-reversal algorithm (see section 5), reducing the calibration effort required by the user. The generated calibration data would fill in the spaces and increase the resolution of the system.
4.5 Results
Overall, we experienced mixed results with neural networks, without gaining consistent improvement. Our best results running the neural network in the forward direction were with the radial basis function network. Training was very quick, and the calibration points stayed consistent in testing. Points distant from the calibrated points generally performed poorly. The radial basis function network in the reverse direction yielded interesting results. We found that smoothed wave generation was possible. However, from the inconsistent results, we formed several suspicions, rooted in inconsistent data and in evidence that our tapping device was the cause. Later we found that an inconsistent tap device and data may not have been the sole cause of poor performance. We found in an experiment that calibration data with tighter spacing yielded better results. The calibration point spacing used for testing the neural networks was 2-3 times greater than what we found to be optimal (see section 5.2). When we discovered this new information, however, we had already abandoned the neural network
solution in order to shift our focus to other parts of the project. Given time constraints, we were unable to revisit the neural network solution.
5 TIME REVERSAL ALGORITHM
In contrast to our four-point and neural net approaches, the time reversal algorithm has already been the subject of extensive research [7] and has even been proposed as part of a consumer product [8]. It makes only one assumption about the acoustic properties of the medium, namely, that impacts at different locations will produce distinct acoustic patterns. During calibration, a set of known points are tapped and the resulting sound waves are stored. Then, in order to localize a new tap, its sound wave is compared against the stored calibration waveforms using some matching function, and the calibration point whose waveform best matches the new tap is returned as the result of the localization algorithm. The name "time reversal" comes from the underlying theory that the waveform received by the sensor, if time-reversed and re-emitted into the medium at that point, would reproduce the original impact waveform at the original impact location.

5.1 Choice of Matching Function
The first task in developing a time reversal solution is choosing the function to use for evaluating the match between two waveforms. Prompted by previous research, we evaluated several candidate functions, including the Pearson correlation coefficient [9] and a technique based on cross-correlation [7]. The first technique uses the traditional formula for Pearson's r-value to measure the match between two vectors A and B of samples taken from the audio input hardware:

$$ r_1 = \frac{1}{n-1} \sum_{i=1}^{n} \frac{A_i - \bar{A}}{s_A} \cdot \frac{B_i - \bar{B}}{s_B} $$

where $A_i$ is the ith sample from vector A, $\bar{A}$ is the mean of A, $s_A$ is the standard deviation of A, and similar definitions hold for B. The second matching technique uses the maximum value achieved by the cross-correlation of the two input vectors:

$$ r_2 = \max_{t=-n}^{n} \left( \sum_{i=-\infty}^{\infty} A_i B_{i+t} \right) $$

where $A_i$ and $B_i$ are defined to be 0 for all indices $i \notin [1, n]$. This technique effectively searches along the time axis for a shift value (t) that brings the two signals into the closest agreement. This is an advantage over the Pearson technique used for $r_1$, which has no tolerance for time-shifted signals. However, unlike the Pearson technique, this cross-correlation is influenced by the loudness of the signal as well as its shape. Thus, louder signals will tend to produce higher $r_2$ values, even if they do not match well together.

Our final matching technique, which provides the best results, is based on cross-correlation but adds the statistical normalization used by Pearson's technique to compensate for varying loudness in the input signals:

$$ r_3 = \frac{1}{n-1} \max_{t=-n}^{n} \left( \sum_{i=-\infty}^{\infty} \frac{A_i - \bar{A}}{s_A} \cdot \frac{B_{i+t} - \bar{B}}{s_B} \right) $$

The value produced by this formula will always lie in the interval [0, 1], with 1 indicating a perfect match between the two vectors. Note that, because of the statistical normalization performed on the two vectors, this technique ignores their relative magnitudes. Matching functions were evaluated by running them on two waveforms produced by tapping the same point on a chalkboard and two waveforms produced by tapping different points. The function which best differentiated between taps at the same location and taps at different locations was considered the best. Our implementation of the time-reversal algorithm uses the same threshold-crossing algorithm as described in section 3.2 for detecting impacts. As noted in section 3.2, a window size of 600 samples (at a 44.1 kHz sample rate) is used to represent each impact. Larger window sizes (up to 1200 samples) were tried, but localization results did not improve significantly. Moreover, increasing the window size degrades runtime performance by requiring correlation to be computed over more samples. Various preprocessing functions were applied before computing the cross-correlation match for two waveforms, including the Fourier transform and a 640-Hz low-pass filter (to smooth out oscillations in the signal). None of these preprocessing functions improved correlation results, so they were eliminated.
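A direct C# transcription of $r_3$ is sketched below, assuming both windows contain the same number of samples n; the class and helper names are illustrative.

```csharp
using System;

// Sketch of the normalized cross-correlation match (r3); names are illustrative.
static class Matcher
{
    public static double Match(double[] a, double[] b)
    {
        int n = a.Length;                       // both windows assumed to hold n samples
        double meanA = Mean(a), meanB = Mean(b);
        double sA = StdDev(a, meanA), sB = StdDev(b, meanB);

        double best = double.MinValue;
        for (int t = -n; t <= n; t++)           // shift b relative to a
        {
            double sum = 0;
            for (int i = 0; i < n; i++)
            {
                int j = i + t;
                if (j < 0 || j >= n) continue;  // samples outside [0, n) count as zero
                sum += (a[i] - meanA) / sA * ((b[j] - meanB) / sB);
            }
            best = Math.Max(best, sum);
        }
        return best / (n - 1);                  // 1 indicates a perfect match
    }

    static double Mean(double[] x)
    {
        double s = 0;
        foreach (double v in x) s += v;
        return s / x.Length;
    }

    static double StdDev(double[] x, double mean)
    {
        double s = 0;
        foreach (double v in x) s += (v - mean) * (v - mean);
        return Math.Sqrt(s / (x.Length - 1));
    }
}
```

Because each signal is normalized by its own mean and standard deviation, doubling the loudness of one tap leaves its match score unchanged.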
5.2 Calibration
The time-reversal algorithm requires more calibration effort than would be necessary with more intelligent approaches, such as four-point. This is due to the fact that time reversal does not make any attempt to solve or model the acoustic properties of the medium; it must thus have calibration samples from points covering the entire sensitive area. All points in the designated sensitive area must be within a certain distance of a calibration point so that the time-reversal algorithm can use impact data gathered from calibration points to approximate the acoustic profile of all other possible impact locations. Thus, calibration points are usually arranged in a grid covering the area of interest. The most important aspect of the calibration grid is the physical spacing of these points on the sensitive area. Interestingly, it is possible to calculate an upper bound for this spacing from the acoustic properties of the medium, such as the wavelength of the waves produced by impacts; see [7]. We experimented with several different tap surfaces, including blackboards, solid wood tables, and composite
wood tables. We generally found an acceptable calibration point spacing to be between 3 and 6 inches; this ensures that our discrete localization algorithm (see section 5.4) always matches a tap on the sensitive area to one of the nearest calibration points. Wider spacing of calibration points tends to degrade both localization schemes (see sections 5.4 and 5.5).

5.2.1 Point Refinement

To improve the integrity of calibration data, a technique called point refinement was adopted. During calibration, each calibration point is tapped three times, yielding three sample signals. These samples are then matched to one another using our chosen matching function. The sample with the highest correlations to the other two samples is retained in the final calibration data set, while the other two samples are discarded. This makes calibration more robust by enabling the system to identify and discard bad data, which could be caused by extraneous noise or by some other impact on the sensing surface not intended as a calibration tap. It also improves runtime performance of localization by reducing the number of correlations that must be computed against the calibration data set.
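A sketch of this refinement step is shown below; the match delegate is assumed to be the normalized cross-correlation of section 5.1, and the class and method names are our own.

```csharp
using System;

static class Calibration
{
    // Of the three samples recorded for one calibration point, keep the sample
    // whose summed correlation with the other two is highest; discard the rest.
    // 'match' is assumed to be the normalized cross-correlation from section 5.1.
    public static double[] Refine(double[][] samples, Func<double[], double[], double> match)
    {
        double[] best = samples[0];
        double bestScore = double.MinValue;
        foreach (double[] candidate in samples)
        {
            double score = 0;
            foreach (double[] other in samples)
                if (!ReferenceEquals(candidate, other))
                    score += match(candidate, other);
            if (score > bestScore) { bestScore = score; best = candidate; }
        }
        return best;
    }
}
```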
5.3 Tap Exclusion
Once we had chosen an appropriate matching function, we applied it to two separate problems: tap exclusion and tap localization. The first of these problems, tap exclusion, has received less attention from the research community to date. The goal here is to ignore ambient noise and impacts on areas not designated as touch-sensitive. This was identified as an important feature if this technology is to be used in normal settings, since such filtering prevents errant or unintended computer interaction. We developed two levels of tap exclusion. The first level filters out input by monitoring the amplitude of sound directly from the input source; see section 3.2. The second level, correlation exclusion, filters out errant taps based on the tap's correlation to the calibration data. A linear function was found to express the needed threshold for the incoming tap's correlation to the stored library of calibration data. This function depends on the spacing of the calibration points: the wider the spacing between calibration points, the lower the threshold must be. Therefore a larger spacing will permit more taps than a tighter spacing.
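The sketch below illustrates the shape of such a spacing-dependent threshold; the intercept and slope shown are hypothetical placeholders, not the constants we actually fitted.

```csharp
using System;

static class Exclusion
{
    // Hypothetical sketch of the spacing-dependent correlation threshold: wider
    // calibration spacing lowers the required correlation, and a tap is accepted
    // only if its best match against the calibration set exceeds the threshold.
    // The intercept and slope are placeholders, not our fitted values.
    public static double Threshold(double spacingInches)
    {
        const double intercept = 0.9;
        const double slopePerInch = 0.05;
        double t = intercept - slopePerInch * spacingInches;
        return Math.Max(0.0, Math.Min(1.0, t));
    }
}
```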
5.4 Discrete Localization
The goal of tap localization is to determine where an impact originated, based on the received signal. Two approaches to this problem were tried: one discrete and the other interpolated. The discrete solution is much simpler; once a correlation function is selected, input
Fig. 2. Discrete matching results.
signals can be matched against the grid of calibration samples paired with coordinate locations on the sensitive surface. The calibration point which best matches the received signal is considered to be the location of the impact. In our experiments with grids of properly spaced calibration points, taps not coinciding with a calibration point almost always matched to one of the four nearest calibration points, with occasional errors. To simplify program operation when high resolution is not required, an alternative discrete solution was implemented, which matches taps to regions on the screen rather than to exact points. Each region must be calibrated with a grid of points that properly covers it, as described above in section 5.2. Incoming taps are then matched to the center of the best-matching region. Figure 2 shows the localization results produced by this technique. The grids of calibration points are not shown; the regions were calibrated on a classroom chalkboard with a 3-inch spacing between points. The calibration points did not coincide with the test points. The grid overlaid on the regions in the figure shows the test points that were tapped to verify proper operation of region-based discrete matching. As can be seen, points falling within the regions are matched to the proper region with almost perfect accuracy. The grid vertices not colored indicate test points whose taps were excluded. The tap excluder performed poorly in the lower-right corner of the area shown, permitting several taps not in a designated sensitive region. This may be due to the fact that the sensor was attached to the medium in this area. Another factor may be an improperly selected threshold for tap exclusion. Since creating this figure, we have implemented a tap exclusion scheme that dynamically selects this threshold based on the calibration point spacing, addressing part
of the tap exclusion problem illustrated here.

5.5 Interpolated Localization

With the time reversal algorithm, it may not be necessary to constrain localization to the discrete set of known calibration points. Investigating this possibility is the goal of interpolated localization. The four-point and neural net algorithms promise interpolated localization by solving the acoustic transfer function of the medium. Time reversal does not attempt to solve the transfer function, and therefore it must perform interpolated localization based solely on the correlation of the input tap with its set of calibration taps. If the calibration grid points are spaced closely enough, the correlation values for grid points provide sufficient sampling of a smooth correlation surface with its peak at the actual location of the impact [7]. Figure 3 shows an example of such a surface from one of our own experiments.

Fig. 3. Example correlation surface, using a 3-inch mesh spacing.

The next step is to find the maximum point of the (presumably) smooth surface of which each calibration point provides a discrete sample. If the spacing of calibration points is too wide, peaks in the correlation surface will not be properly sampled by the calibration grid, which could cause localization results to be grossly inaccurate. One possible technique for interpolated localization would be to fit piecewise-smooth regression curves to the correlation values in each row and column of the calibrated grid. The maxima of these horizontal and vertical curves could then be calculated and combined to produce a location within the grid. We did not examine this technique, however, due to a lack of time. The technique we did investigate performs weighted averaging of calibration point coordinates in the x- and y-dimensions (using correlation values as the weights) to yield the coordinates of the output point. Performing this weighted average over the entire set of calibration points is not a good solution, however, because it will always bias the output points toward the center of the mesh; poorly correlated calibration points still have a non-zero correlation and will thus be factored into the average. Therefore, some method must be introduced for choosing a set of representative points from the grid so that only those points are averaged together. To further reduce the effect of poorly correlated points, another step is added: once the representative set of points is chosen, the lowest correlation value in the set is subtracted from the correlation value for each point, with values dropping below zero being clamped to zero. We tested this weighted-averaging technique using several algorithms for selecting the representative set of points.
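The following sketch shows the weighted-averaging step itself, assuming a representative set of calibration points has already been chosen; the type and member names are illustrative.

```csharp
using System;
using System.Linq;

// Sketch of weighted-average interpolation over a representative set of
// calibration points (type and method names are illustrative).
class CalPoint
{
    public double X, Y;          // coordinates of the calibration point
    public double Correlation;   // correlation of the incoming tap with this point
}

static class Interpolator
{
    public static (double X, double Y) Locate(CalPoint[] representative)
    {
        // Subtract the lowest correlation in the set, clamping at zero, so that
        // poorly correlated members contribute little to the average.
        double baseline = representative.Min(p => p.Correlation);
        double sumW = 0, sumX = 0, sumY = 0;
        foreach (var p in representative)
        {
            double w = Math.Max(0, p.Correlation - baseline);
            sumW += w;
            sumX += w * p.X;
            sumY += w * p.Y;
        }
        // If every weight collapsed to zero, fall back to the best-correlated point.
        if (sumW == 0)
        {
            var best = representative.OrderByDescending(p => p.Correlation).First();
            return (best.X, best.Y);
        }
        return (sumX / sumW, sumY / sumW);
    }
}
```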
5.5.1 Selection by Cells
In this technique, we find the highest discrete point on the surface (that is, the best-correlated calibration point). From the four grid cells which have this point as a corner, we choose the cell with the highest average correlation across its corners. The four corners of this cell form the representative set which is used for weighted averaging. Selecting only a single cell, however, biases interpolation away from calibration points and towards the centers of cells, since the center of the representative set is always the center of a cell, and this is where the average of the corners' coordinates tends to fall. In order to avoid this, the eight points surrounding the best-correlated point (i.e. the corners of all four adjacent cells) were taken as the representative set. This technique, however, biases interpolation towards calibration points, since the set of representative points is always centered on a calibration point. In summary, neither of the cell-based techniques provides consistently good interpolated localization.
5.5.2 Individual Selection
Alternative methods of selecting the representative set of points include choosing all points with correlations above a certain threshold and choosing the best n correlated points for some constant n. These methods have the advantage of not depending on a single, maximal discrete point for their geometric arrangement (as the cell-based approaches do). This allows the representative set to vary more freely as the characteristics of the correlation surface change. Also, under the threshold-based approach, the shift by which all correlations are reduced before averaging can be a constant (that is, the threshold itself) instead of being determined by a single point sampled from the representative set. This makes localization less dependent on any single datum, improving stability. In our experiments, these approaches (one based on a correlation threshold and the other on a constant n number of points) provided better interpolation results than the cell-based approaches.
Fig. 4. Results of average-threshold interpolation, with a power of 25.
There are problems with these techniques, however. The threshold approach has the disadvantage that it requires manual selection of the threshold value, which may need to be re-tuned if the mesh spacing, tap device, or tap surface change. The "constant-n" approach also has a disadvantage: it relies on a single point (the nth-best correlated calibration point) to provide the shift by which all the other correlation values are reduced before being averaged. Reliance on a single point for anything introduces unwanted variability into the system. To address these concerns, we tried one last interpolation algorithm. Instead of manually specifying the threshold for selecting the representative set of calibration points, we calculate it each time by taking the mean of the correlation values of all the calibration points. To bias this mean upwards, however, we raise all the correlations to a power p > 1, take the mean, and then raise the mean to the power 1/p. We experimented with various values for p on a 7-by-7 calibration point mesh, finding the best interpolation results around p = 25. With this value for p, each representative set contained an average of 3.1 points. Figure 4 shows a vector plot of the results of this interpolation algorithm with p = 25. Each green circle is the location of an actual impact, either at a calibration point or at the middle of a grid cell, and the corresponding blue arrow points to the location to which the interpolation algorithm mapped that impact. Ideally, then, the blue vectors should all have length zero. From the figure it can be seen that this interpolation technique works well in some areas but has a consistent bias in other areas. This bias, which during experimentation seemed to persist with repeated taps, may be due to inaccuracies in our calibration data set or to acoustic features of the underlying medium. In any case, this was the best interpolation we achieved.
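A sketch of this average-threshold rule is given below; the default p = 25 matches the value reported above, but the helper name is our own.

```csharp
using System;
using System.Linq;

// Sketch of the average-threshold rule: raise all correlations to a power p,
// take the mean, then raise the mean to the power 1/p. This biases the mean
// upwards relative to the plain arithmetic mean.
static class AverageThreshold
{
    public static double Compute(double[] correlations, double p = 25)
    {
        double mean = correlations.Average(c => Math.Pow(c, p));
        return Math.Pow(mean, 1.0 / p);
    }
}
```

Calibration points whose correlation exceeds the returned threshold form the representative set passed to the weighted-averaging step described earlier in this section.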
5.6 Performance Concerns
Performing correlation of an input tap with all samples in the calibration set is a time-intensive task. Our goal was to keep the time taken by localization small enough that users perceive their input as registering instantaneously. To achieve this goal, we performed several optimizations. The cross-correlation of signals (which we use to calculate the correlation between two taps) is traditionally computed across its entire domain using Fourier transforms. We, however, do not usually need to compute the cross-correlation across its entire domain, since we are only interested in its maximum value, which usually occurs with a shift value near zero. Our edge detection algorithms ensure that sampled signals are at least somewhat aligned along the time axis, so the cross-correlation only needs to be computed over a small interval centered at zero in order to compensate for slight timing variations in the sampled signals. Thus, instead of taking the Fourier transform of both signals and then multiplying them together to find the cross-correlation, we calculate the cross-correlation over a window of ±120 samples (±2.7 ms at our 44.1 kHz sample rate) using a naive algorithm (that is, a simple sum of products for each possible offset):

$$ r_3' = \frac{1}{n-1} \max_{t=-120}^{120} \left( \sum_{i=-\infty}^{\infty} \frac{A_i - \bar{A}}{s_A} \cdot \frac{B_{i+t} - \bar{B}}{s_B} \right) $$

Finally, since the time reversal algorithm requires several independent correlations to be calculated, it adapts well to parallel computing. Running on a dual-core machine, we were able to halve the running time by using parallel tasks to calculate correlation values for our whole set of calibration data. As an example, we ran our parallelized discrete localization algorithm on a calibration set containing 60 taps; localization took an average of 101 ms each time. This timing is good enough to provide a seamless user experience.
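The two optimizations are sketched together below: the shift search is restricted to ±120 samples, and the correlation against every calibration sample is computed in parallel. Names are illustrative, and the normalization follows section 5.1.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of the runtime optimizations: a naive cross-correlation restricted to
// shifts of ±120 samples, computed against the whole calibration set in parallel.
static class FastMatcher
{
    const int MaxShift = 120;  // ±2.7 ms at 44.1 kHz

    public static double WindowedMatch(double[] a, double[] b)
    {
        int n = a.Length;
        double meanA = Mean(a), meanB = Mean(b);
        double sA = Std(a, meanA), sB = Std(b, meanB);

        double best = double.MinValue;
        for (int t = -MaxShift; t <= MaxShift; t++)
        {
            double sum = 0;
            for (int i = 0; i < n; i++)
            {
                int j = i + t;
                if (j < 0 || j >= n) continue;  // out-of-range samples count as zero
                sum += (a[i] - meanA) / sA * ((b[j] - meanB) / sB);
            }
            if (sum > best) best = sum;
        }
        return best / (n - 1);
    }

    // Correlate an incoming tap against every calibration sample in parallel.
    public static ConcurrentDictionary<int, double> MatchAll(double[] tap, double[][] calibration)
    {
        var results = new ConcurrentDictionary<int, double>();
        Parallel.For(0, calibration.Length,
            i => results[i] = WindowedMatch(tap, calibration[i]));
        return results;
    }

    static double Mean(double[] x) { double s = 0; foreach (var v in x) s += v; return s / x.Length; }
    static double Std(double[] x, double m) { double s = 0; foreach (var v in x) s += (v - m) * (v - m); return Math.Sqrt(s / (x.Length - 1)); }
}
```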
6 SENSING HARDWARE

6.1 Sensor
The sensor we chose was the Knowles accelerometer used by TAI-CHI researchers [4]. The Knowles device, part number BU-21771, is a high-sensitivity ceramic vibration transducer. This device was an attractive choice because it requires low voltage (we were able to power it by stepping down the voltage from a USB port), and its output voltage did not require amplification to be read by the sound card. Early in the project we pursued our goals using a standard desktop microphone as our sensor; Dr. Sverre Holm had previously used this type of sensor in his demonstration [3]. We discovered several weaknesses in the use of the microphone. The first weakness is that the microphone was highly susceptible to interference from ambient noise, whereas the same ambient noises did not
affect the Knowles vibration transducer. Additionally, we observed that the sound of the impact carried through the air has a significant influence on the desktop microphone. This is a particular problem because we want to monitor only the vibrations which propagate through the medium; the Knowles device performs far better in this respect. Attempts were made to modify the microphone to improve its performance, such as removing the casing and soundproofing it with added padding. However, these techniques did little to improve the behavior. Ultimately, we identified the Knowles vibration transducer as the sensor appropriate for our project. The transducer's cost was slightly higher than that of the desktop microphone, but this difference was not a deterrent given the performance gains.

6.2 Signal Capture
All results described in this paper were achieved using a normal laptop sound card to digitize the sensor signal; the hardware sample rate was 44.1 kHz. Extant literature supports the conclusion that this arrangement is sufficient for good localization [7], [10].
7 TAP DEVICE HARDWARE
In our early experiments and tests, one theory we had about the observed poor results was that our tap device was producing an inconsistent tap when struck against the surface. Originally, we used chalk against a slate blackboard, as demonstrated by Holm [3]. Our analysis and observations led us to believe that chalk, when hit against the chalkboard, can produce a different sound each time, causing our localization techniques to produce incorrect results. The goal then became to find a device that would produce a consistent tap. The proposed device was a spring-loaded punch with a repeatable mechanical action. Upon running the same tests as with the chalk, we were surprised to find similar results. The consistently tapping center punch did not improve our results. What these tests revealed to us was that the tap-generating device was not the sole cause of poor algorithm performance. After that, our investigations into the tap device were shelved while we refined our matching algorithms. During our tests with interpolation, we rediscovered the inconsistencies of tapping with chalk to be an issue. This time, we tried tapping with a rubber-coated bolt, which yielded much better results. We surmise that the rubber coating softens the impact, producing a smoother and more regular input to the surface's acoustic transfer function. The results we observed after changing to this tapping device were encouraging. In an experiment we observed that tap matching performance with the rubberized bolt was much improved over the chalk tests. Our last tap device test was to tap the medium with our fingers. Due to time constraints we were unable
to fully test finger tapping; however, in several trials we found that using a finger as the tap generator was possible. We have identified this as an area for future research.
8 SURFACE
The first medium, or surface, we tested was a slate chalkboard. This medium was chosen as the standard testing surface for consistency and also because of the proven demonstration by Holm [3]. While testing on the slate surface we tried a number of tapping techniques, as described in section 7. Overall, this medium produced good results when used with a consistent tap. The second medium we tested was a thin plastic sheet of the kind used as an inexpensive pane for framing. With this medium, the tap's intensity was revealed to have a significant impact on localization performance. Nevertheless, we experienced good localization and tap exclusion results. In order to accommodate tapping on this surface, we had to lower our impact detection threshold to accommodate taps with less intense peaks. This was also our first test that did not involve tapping with a specific device: we observed consistent matching by knocking a fingernail against the medium when both the calibrated location and intensity were matched. The last surfaces we tested our system on were tabletops. The material of the tables varied from solid wood to wood composite. We tested tapping this surface with our fingertips and with our tapping device. We were able to lower the impact detection threshold enough to accommodate both tapping techniques, and the localization results observed were acceptable. We observed that the properties of the tap were slightly altered in each medium tested. These differences required minor changes to the exclusion parameters of the processing algorithms. However, the parameters were finally set to values that accommodate all of the aforementioned surfaces. Additionally, in our GUI application we give the user the ability to modify these parameters to customize the algorithm performance for any medium.
9 CONCLUSIONS
In this project, we were not able to achieve our original goal of a robust interpolation algorithm. We did, however, make progress toward this goal and built a strong understanding of the problem domain, which would allow us to make real progress on interpolation if we had more time. We were able to implement a polished discrete time-reversal solution designed for operation with a projector display overlaid onto the touch surface. This software projects calibration point locations on the screen and walks the user through the process of calibrating each one. Our investigations into tap exclusion (see section 5.3) take, we believe, a new direction which has not received
significant attention from past researchers. The results of our tap exclusion algorithm are fairly good, and exclusion has been incorporated into our final software deliverables.
10 FURTHER RESEARCH OPPORTUNITIES

10.1 Alternative Surfaces
The great potential of this technology lies in its application to a wide variety of everyday surfaces. It would be interesting to test the sensing hardware and time-reversal software we have developed against surfaces like walls and tile floors. Some of our software parameters would need to be tuned to the new environment, but we surmise that our algorithms would still provide meaningful results in these situations.

10.2 Tap Devices
To reduce the hardware associated with this system, we began testing our solutions with the user's fingertip, rather than a dedicated tapping device, as the method of input. This is a challenge that we were unable to devote much time to; however, it is one improvement that we feel would make this technology ready for ubiquitous use. Challenges we observed using fingertips as the tap generator included variations in the intensity of the tap: if the intensity was not consistent, a tap would not register. This problem was compounded when multiple users tried to use the application. We found that it was difficult for a user who did not perform the system calibration to find the right intensity and easily use the application.

10.3 Transparent Calibration
The biggest obstacle to the use of time-reversal acoustic localization is the large amount of calibration data required from the user each time the system is set up with a different surface or sensor position. One possible solution is to collect calibration data transparently during normal usage of the system, so that the user does not realize that calibration is taking place. Meaningful calibration data must consist of a known location paired with a received acoustic signal, presumably caused by an impact at that location. To collect such paired data during normal usage of the system, the system must somehow guess where the user actually tapped to produce the signal it has received. If the user is interacting with a GUI composed of discrete sensitive regions (like buttons), one way to guess at this location is to assume that each time the user taps a button, he or she tried to tap the center of that button. Thus, though the system will require enough manual calibration to be able to correctly identify which button the user tapped, after that point, it can gradually refine itself further by using data transparently gathered from user operation. Other techniques for transparent calibration might include the development of calibration games which
deliver some entertainment value while still providing the system with paired locations and acoustic samples.
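As a hypothetical illustration, transparent calibration might accumulate data in a structure like the following, keyed by the assumed impact location (the center of the tapped button); all names here are our own.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of transparent calibration: whenever the user taps a GUI
// button, assume the intended location was the button's center and fold the
// received waveform into the calibration set for that location.
class TransparentCalibrator
{
    readonly Dictionary<(double X, double Y), List<double[]>> samples =
        new Dictionary<(double X, double Y), List<double[]>>();

    public void OnButtonTapped(double centerX, double centerY, double[] waveform)
    {
        var key = (centerX, centerY);
        if (!samples.TryGetValue(key, out var list))
            samples[key] = list = new List<double[]>();
        list.Add(waveform);  // gradually refines calibration during normal use
    }
}
```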
10.4 Other Interpolation Techniques
None of our interpolation techniques achieved satisfactory localization. This may be due to our calibration grids still being too widely spaced, but it may also be that interpolation techniques based on weighted averaging are not a good solution. It may be better to investigate a regression-based solution which fits piecewise smooth curves to the rows and columns of samples on the correlation surface and then solves those smooth curves for their maxima.
11 REFERENCES

For a video demonstration of this project's results, see [11].

[1] D. T. Pham, Z. Wang, Z. Ji, M. Yang, M. Al-Kutubi, and S. Catheline, "Acoustic pattern registration for a new type of human-computer interface," in IPROMS 2005 Virtual Conference, May 2005.
[2] W. Rolshofen, D. T. Pham, M. Yang, Z. Wang, Z. Ji, and M. Al-Kutubi, "New approaches in Computer-Human Interaction with tangible acoustic interfaces," in IPROMS 2005 Virtual Conference, May 2005.
[3] S. Holm, "Touch sensitive blackboard," Institutt for Informatikk, Universitetet i Oslo, 2008. [Online]. Available: http://www.youtube.com/watch?v=V4NwoiPGkVY
[4] "Technical Solutions for the TDOA Method," Dipartimento di Elettronica e Informazione, Politecnico di Milano, Tech. Rep., 2006.
[5] "AForge.NET," Andrew Kirillov et al. [Online]. Available: http://www.aforgenet.com/framework/
[6] "Fast Artificial Neural Network Library," Steffen Nissen et al. [Online]. Available: http://leenissen.dk/fann/wp/
[7] R. K. Ing, N. Quieffin, S. Catheline, and M. Fink, "In solid localization of finger impacts using acoustic time-reversal process," Applied Physics Letters, vol. 87, 2005.
[8] "ReverSys technology," Sensitive Object. [Online]. Available: http://sensitive-object.com/-ReverSys-R
[9] D. T. Pham, M. Al-Kutubi, Z. Ji, M. Yang, Z. Wang, and S. Catheline, "Tangible Acoustic Interface Approaches," in IPROMS 2005 Virtual Conference, May 2005.
[10] "Technical solutions and demonstration for acoustic pattern recognition using time reversal method," Laboratoire Ondes et Acoustique, Tech. Rep., 2006.
[11] S. Pollen and N. Radtke, "Acoustic touch screen demonstration," Milwaukee School of Engineering, 2011. [Online]. Available: http://www.youtube.com/watch?v=ZoAslMiukAQ