Report

Teaser, summary, work performed and final results

Periodic Reporting for period 2 - StillNoFace (Identity matching from still images without face information)

Teaser

Summary

Problem statement
In computer vision, human identity matching from images and/or video has been an active research topic for more than two decades, and its popularity has grown as computing power has increased.

The integration of soft biometrics such as gender, height, weight, age, and ethnicity into a primary biometric system (e.g., face) has been studied. Most existing methods approach human classification assisted by soft biometrics using facial information. However, in real-life scenarios, such information might not be available (e.g., the face might be covered or occluded). This has led to methods that use information from the human body to perform human identification and tracking based on soft biometrics.

Moreover, when we are interested in providing a description of an object or a human, we tend to use visual attributes to accomplish this task. For example, a laptop can have a wide screen, a silver color, and a brand logo, whereas a human can be tall, female, wearing a blue t-shirt and carrying a backpack. Visual attributes in computer vision are equivalent to the adjectives in our speech. We rely on visual attributes since they are a meaningful semantic representation of objects or humans that can be understood by both computers and humans. However, effectively predicting the corresponding visual attributes of a human given an image remains a challenging task.

Objectives
In this research project, we propose methods for predicting a person’s identity from images without facial information, based on soft biometrics and visual attributes. Given a still image or a video showing only the body of an individual in the wild, the overall objectives of the project are:
• Estimate the gender and soft biometrics of the individual, such as weight and height
• Retrieve images of individuals with specific visual attributes, such as "wearing a hat" or "sitting on a chair"

Benefit for the society
A major application of the project’s outcome is the automated recognition of individuals from images captured by standard cameras, allowing them to enter their house or office or to operate a car. Another prominent category of applications involves security and safety in public and private spaces (e.g., airports, train stations, concert halls). In these places, surveillance cameras generally do not provide facial information, as the individual’s image may be acquired from behind. Moreover, even when face information is available, it may be of low resolution, making it difficult to extract anything useful from it.

Work performed

1) Estimation of soft biometrics from still images
First, we investigated the principle of privileged information and proposed a new machine learning method that couples privileged information with conditional random fields [P1, P7]. We then proposed a novel method that performs binary gender classification from ratios of anthropometric measurements within the learning using privileged information (LUPI) paradigm [P2, P3]. Using the actual values of anthropometric measurements (e.g., limb lengths in mm) from an anthropometric database yields good gender classification accuracy. We argue, though, that such values cannot be obtained accurately by state-of-the-art computer vision algorithms without depth information (e.g., data from a Kinect RGB-D sensor). To address this limitation, we proposed using ratios of anthropometric measurements, which alleviates errors in the estimation of the actual values.
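One way to see why ratios can be more robust than absolute values is the following minimal sketch. It assumes that the estimation error is a single unknown global scale factor (as can happen without depth information); under that assumption the error cancels in every ratio. The measurement names, the values, and the helper `measurement_ratios` are hypothetical and are not taken from the project's data or code. Errors that differ per measurement would only be reduced, not removed.

```python
from itertools import combinations


def measurement_ratios(measurements):
    """Return all pairwise ratios a/b for a dict of positive lengths."""
    names = sorted(measurements)
    return {
        f"{a}/{b}": measurements[a] / measurements[b]
        for a, b in combinations(names, 2)
    }


# Hypothetical limb lengths in mm (illustrative values only).
true_lengths = {"forearm": 250.0, "shoulder_width": 400.0, "upper_arm": 300.0}

# Assumption: the image-based estimate is off by one unknown global scale.
scale_error = 1.3
estimated = {name: value * scale_error for name, value in true_lengths.items()}

true_ratios = measurement_ratios(true_lengths)
estimated_ratios = measurement_ratios(estimated)

for name in true_ratios:
    # The absolute values differ by 30%, but the ratios agree
    # up to floating-point rounding.
    print(name, round(true_ratios[name], 6), round(estimated_ratios[name], 6))
```

In this sketch the ratios would serve as the features passed to a classifier; the classifier itself is omitted because the summary does not specify it.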

2) Human identification by classification of visual attributes
We introduced a method to address the problem of visual attribute classification from images of standing humans [P5, P6]. Instead of using low-level representations, which would require extracting handcrafted features, we proposed a deep learning method to solve multiple binary classification tasks.
The groups of tasks are learned in a curriculum learning scenario, starting with the group that has the highest within-group cross-correlation and moving to the less correlated ones by transferring knowledge from the former to the latter. The tasks in each group are learned in a typical multi-task classification setup. We have also developed an effective method to obtain the groups of tasks using hierarchical agglomerative clustering, which can produce any number of groups, not just two (strongly/weakly correlated).

Publications
[P1] M. Vrigkas, C. Nikou and I. Kakadiaris. Exploiting privileged information for facial expression recognition. IAPR/IEEE International Conference on Biometrics (ICB’16), 13-16 June 2016, Halmstad, Sweden.
[P2] I. Kakadiaris, N. Sarafianos and C. Nikou. Show me your body: gender classification from still images. IEEE International Conference on Image Processing (ICIP’16), 25-28 September 2016, Phoenix, Arizona, USA.
[P3] N. Sarafianos, C. Nikou, and I. Kakadiaris. Predicting privileged information for height estimation. 23rd International Conference on Pattern Recognition (ICPR’16), 4-8 December 2016, Cancún, Mexico.
[P4] M. Vrigkas, E. Kazakos, C. Nikou and I.A. Kakadiaris. Inferring human activities using robust privileged probabilistic learning. 4th Workshop on Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV), in conjunction with the International Conference on Computer Vision (ICCV’17), 22-29 October 2017, Venice, Italy.
[P5] N. Sarafianos, Th. Giannakopoulos, C. Nikou and I. Kakadiaris. Curriculum learning for multi-task classification of visual attributes. 4th Workshop on Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV), in conjunction with the International Conference on Computer Vision (ICCV’17), 22-29 October 2017, Venice, Italy.
[P6] N. Sarafianos, Th. Giannakopoulos, C. Nikou and I. Kakadiaris. Curriculum learning of visual attributes clusters for multi-task classification. https://arxiv.org/abs/1709.06664
[P7] M. Vrigkas, E. Kazakos, C. Nikou and I. Kakadiaris. Human activity recognition using robust adaptive privileged probabilistic learning. https://arxiv.org/abs/1709.06447

Final results

Our work on gender estimation from anthropometric measurements employs a database to predict soft biometric attributes. It differs from the state of the art in two respects. First, since actual anthropometric measurements are highly unlikely to be obtained accurately by computer vision algorithms from images or videos captured by surveillance cameras, we opted to use ratios of anthropometric measurements. Second, we argue that several anthropometric measurements (e.g., circumferences of body parts) are relatively difficult to estimate automatically and that such information will not be available in automatically acquired data.

In visual attribute classification from images of humans, our proposed method efficiently finds the sequence in which clusters of visual attributes are learned and classifies the attributes with high performance. Given images of standing humans as input, we performed end-to-end learning by solving multiple binary classification problems simultaneously. Tasks were grouped into clusters by employing hierarchical agglomerative clustering based on their correlation. The sequence (i.e., the curriculum) in which clusters were learned was found by computing the average cross-correlation within each cluster and sorting the obtained values in descending order.
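The grouping and ordering steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the average-linkage rule, the distance `1 - |correlation|`, the use of absolute correlations, the score of 0.0 for single-task clusters, and the toy correlation matrix are all choices made here for the example. The function names are hypothetical.

```python
def average_within_correlation(corr, cluster):
    """Mean absolute correlation over all distinct task pairs in a cluster."""
    pairs = [(i, j) for i in cluster for j in cluster if i < j]
    if not pairs:
        return 0.0  # assumption: a single-task cluster scores zero
    return sum(abs(corr[i][j]) for i, j in pairs) / len(pairs)


def agglomerative_clusters(corr, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - |corr|."""
    clusters = [[i] for i in range(len(corr))]

    def linkage(a, b):
        total = sum(1.0 - abs(corr[i][j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters


def curriculum(corr, n_clusters):
    """Order clusters from most to least internally correlated."""
    clusters = agglomerative_clusters(corr, n_clusters)
    return sorted(
        clusters,
        key=lambda c: average_within_correlation(corr, c),
        reverse=True,
    )


# Toy correlation matrix for five binary attribute tasks (illustrative only).
corr = [
    [1.0, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5, 0.5],
    [0.1, 0.1, 0.5, 1.0, 0.5],
    [0.1, 0.1, 0.5, 0.5, 1.0],
]

order = curriculum(corr, n_clusters=2)
print(order)  # [[0, 1], [2, 3, 4]]
```

Here tasks 0 and 1 form the most strongly correlated cluster and would be learned first, with knowledge then transferred to the cluster of tasks 2 to 4. Because `n_clusters` is a free parameter, the same procedure yields any number of groups.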