In this paper, we present a practical method for reconstructing the bidirectional reflectance distribution function (BRDF) from multiple images of a real object composed of a homogeneous material. The key idea is that the BRDF can be sampled after geometry estimation using multi-view stereo (MVS) techniques. Our contribution is the selection of reliable samples of lighting, surface normal, and viewing directions, which makes the method robust to MVS estimation errors. Our method is quantitatively evaluated using synthesized images and its effectiveness is shown via real-world experiments.
Segmentation is the act of partitioning an image into different regions by creating boundaries between regions. k-means is the simplest of the prevalent image segmentation approaches. However, the segmentation quality is contingent on the initial parameters (the cluster centers and their number). In this paper, a convolution-based modified adaptive k-means (MAKM) approach is proposed and evaluated using images collected from different sources (MATLAB, Berkeley image database, VOC2012, BGH, MIAS, and MRI). The evaluation shows that the proposed algorithm is superior to the k-means++, fuzzy c-means, histogram-based k-means, and subtractive k-means algorithms in terms of image segmentation quality (Q-value), computational cost, and RMSE. The proposed algorithm was also compared to state-of-the-art learning-based methods in terms of IoU and MIoU; it achieved a higher MIoU value.
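To make the dependence on the initial parameters concrete, the following is a minimal sketch of the plain k-means baseline applied to grayscale intensities. It is not the proposed MAKM algorithm; the function name `kmeans_1d`, the toy pixel values, and the initial centers are all hypothetical choices made for illustration.

```python
def kmeans_1d(values, centers, iters=20):
    """Plain Lloyd's k-means on scalar intensities.

    `centers` is the initial guess; both its values and its length (k)
    are chosen by the caller, which is the sensitivity noted above.
    """
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values
        ]
        # Update step: each center moves to the mean of its pixels.
        new_centers = []
        for j in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# A toy 2x4 grayscale "image", flattened row by row (hypothetical data).
pixels = [10, 12, 11, 200, 205, 198, 13, 202]
labels, centers = kmeans_1d(pixels, centers=[0.0, 255.0])
```

Here the two initial centers sit at the ends of the intensity range, so the dark and bright pixels separate in one pass. Passing a different number of centers, or centers placed close together, changes the resulting partition.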
This paper presents a novel mixed reality based navigation system for accurate respiratory liver tumor punctures in radiofrequency ablation (RFA). Our system contains an optical see-through head-mounted display (OST-HMD), the Microsoft HoloLens, for perfectly overlaying the virtual information on the patient, and an optical tracking system, the NDI Polaris, for calibrating the surgical tools in the surgical scene. Compared with traditional CT-based navigation, our system aligns the virtual guidance information with the real patient and updates the view of the virtual guidance in real time via a position tracking system. In addition, to alleviate the difficulty of needle placement caused by respiratory motion, we reconstruct the patient-specific respiratory liver motion with a statistical motion model to help doctors puncture liver tumors precisely. The proposed system has been experimentally validated in vivo on pigs, achieving real-time registration with a mean fiducial registration error (FRE) and target registration error (TRE) of approximately 5 mm, and has the potential to be applied in clinical RFA guidance.
In this paper, we propose a simple but effective framework for lane boundary detection, called SpinNet. Because cars and pedestrians often occlude lane boundaries and the local features of lane boundaries are not distinctive, analyzing and collecting global context information is crucial for lane boundary detection. To this end, we design a novel spinning convolution layer and a brand-new lane parameterization branch in our network to detect lane boundaries from a global perspective. To extract features in narrow strip-shaped fields, we adopt strip-shaped convolutions with 1×n or n×1 kernels in the spinning convolution layer. Because straight strip-shaped convolutions can only extract features in the vertical or horizontal direction, we introduce feature map rotation, which allows the convolutions to be applied in multiple directions so that more information can be collected along a whole lane boundary. Moreover, unlike most existing lane boundary detectors, which extract lane boundaries from segmentation masks, our lane boundary parameterization branch predicts a curve expression for the lane boundary at each pixel of the output feature map. The network uses this information to predict the weights of the curve and thereby better form the final lane boundaries. Our framework is easy to implement and end-to-end trainable. Experiments show that our proposed SpinNet outperforms state-of-the-art methods.
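The idea of combining a strip-shaped kernel with feature map rotation can be illustrated with a small sketch. It is only an assumption-laden toy, not the SpinNet layer: it works on a single-channel 2D list, uses a fixed 1×n kernel rather than learned weights, and rotates only by 90 degrees, whereas the abstract does not state which rotation angles the network uses. All function names are hypothetical.

```python
def strip_conv_1xn(fmap, kernel):
    """Cross-correlate every row with a 1×n kernel, zero-padded to keep the width."""
    pad = len(kernel) // 2
    out = []
    for row in fmap:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(k * padded[c + i] for i, k in enumerate(kernel))
            for c in range(len(row))
        ])
    return out


def rot90_cw(fmap):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(r) for r in zip(*fmap[::-1])]


def rot90_ccw(fmap):
    """Rotate a 2D list 90 degrees counter-clockwise (inverse of rot90_cw)."""
    return [list(r) for r in zip(*fmap)][::-1]


def rotated_strip_conv(fmap, kernel):
    """Rotate the map, apply the horizontal strip kernel, and rotate back.

    The net effect is that the same 1×n kernel aggregates along columns.
    """
    return rot90_ccw(strip_conv_1xn(rot90_cw(fmap), kernel))


# A vertical line, as a stand-in for a lane boundary (hypothetical data).
fmap = [[0, 1, 0],
        [0, 1, 0],
        [0, 1, 0]]
horizontal = strip_conv_1xn(fmap, [1, 1, 1])
vertical = rotated_strip_conv(fmap, [1, 1, 1])
```

Applied horizontally, the kernel sees only one line pixel per window, so every response is 1. After rotation, the same kernel runs along the line and accumulates up to three line pixels in the central row, which is the kind of direction-dependent context the abstract describes.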
In recent years, deep learning has achieved great success in the field of image processing. In the single image super-resolution (SISR) task, convolutional neural networks (CNNs) extract image features through deeper layers and have achieved impressive results. In this paper, we propose a single image super-resolution model based on Adaptive Deep Residual, named ADR-SR, which uses the Input Output Same Size (IOSS) structure and, unlike existing SR methods, does not depend on upsampling layers. Specifically, the key element of our model is the Adaptive Residual Block (ARB), which replaces the commonly used constant factor with an adaptive residual factor. Experiments demonstrate the effectiveness of our ADR-SR model, which not only reconstructs images with better visual quality but also achieves better objective performance.
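The difference between a constant and an adaptive residual factor can be sketched as follows. The abstract does not say how the adaptive factor is obtained, so this toy assumes a single trainable scalar per block, updated by gradient descent on a squared error. The `branch` function is a hypothetical stand-in for the convolutional branch of a block, and the constant 0.1 is only an example of a fixed scaling value.

```python
def residual_block(x, branch, factor):
    """Return x + factor * branch(x), elementwise over a list of floats."""
    return [xi + factor * fi for xi, fi in zip(x, branch(x))]


def loss(x, branch, factor, target):
    """Squared error between the block output and the target."""
    y = residual_block(x, branch, factor)
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def grad_factor(x, branch, factor, target):
    """Analytic derivative of the squared error with respect to the factor."""
    fx = branch(x)
    y = residual_block(x, branch, factor)
    return sum(2.0 * (yi - ti) * fi for yi, ti, fi in zip(y, target, fx))


def branch(v):
    # Hypothetical stand-in for the learned residual branch.
    return [2.0 * vi for vi in v]


x = [1.0, 2.0]
target = [2.0, 4.0]   # matched exactly when the factor equals 0.5
fixed = 0.1           # constant factor, kept as the baseline (assumed value)
adaptive = fixed      # assumed trainable scalar, initialized at the constant
for _ in range(50):
    adaptive -= 0.01 * grad_factor(x, branch, adaptive, target)
```

In this toy setting the trainable factor converges toward 0.5 and lowers the error relative to the fixed value. In a real network the factor would be trained jointly with the branch weights, and it might instead be predicted from the features.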
Role-event videos are rich in information but challenging to understand at the story level. The social roles and behavior patterns of characters largely depend on the interactions among characters and the background events. Understanding them requires analysis of the video contents over a long duration, which is beyond the ability of current algorithms designed for analyzing short-time dynamics. In this paper, we propose InSocialNet, an interactive video analytics tool for analyzing the contents of role-event videos. It automatically and dynamically constructs social networks from role-event videos using face and expression recognition, and provides a visual interface for interactive analysis of video contents. Together with social network analysis at the back end, InSocialNet helps users investigate characters, their relationships, social roles, factions, and events in the input video. We conduct case studies to demonstrate the effectiveness of InSocialNet in helping users harvest rich information from role-event videos. We believe the current prototype implementation can be extended to applications beyond movie analysis, e.g., social psychology experiments to help understand crowd social behaviors.
Reconstruction of man-made scenes from multi-view images is an important problem in computer vision and computer graphics. Observing that man-made scenes are usually composed of planar surfaces, we encode a planar shape prior when reconstructing man-made scenes. Recent approaches for single-view reconstruction employ multi-branch neural networks to simultaneously segment planes and recover 3D plane parameters. However, the scale of available annotated data heavily limits the generalizability and accuracy of these supervised methods. In this paper, we propose multi-view regularization to enhance the capability of piecewise planar reconstruction during the training phase, without demanding extra annotated data. Our multi-view regularization enforces consistency among multiple views by making the feature embedding more robust to view changes and lighting variations. Thus, a neural network trained with multi-view regularization performs better on a wide range of views and lighting conditions in the test phase. Based on the more consistent prediction results, we merge the recovered models from multiple views to reconstruct scenes. Our approach achieves state-of-the-art reconstruction performance compared to previous approaches on the public ScanNet dataset.