
Post on 17-Oct-2020


On computer vision applied to humans: open problems and applications.

Jordi Vitrià

BCN Perceptual Computing Lab

Departament de Matemàtica Aplicada i Anàlisi, Facultat de Matemàtiques, Universitat de Barcelona,

Gran Via de les Corts Catalanes, 585, 08007 Barcelona

&

Centre de Visió per Computador

Edifici O, Campus de la UAB, Bellaterra, 08193 Barcelona

jordi.vitria@ub.edu

bcnpcl.wordpress.com

Human-robot interaction is not possible without rich, robust models for the

perception (in the broadest sense) of humans.

13/09/2010 Jordi Vitrià | Septiembre 2010 3


Humans are not a common object, such as cars,

trees or buildings:

Humans display rich behaviors with rich

information that is useful for predicting actions

and decisions.


Humans communicate by perceiving and

producing visual signals.


From David Marr's book: Vision, 1982.

Definition:

As a scientific discipline, computer vision is concerned with the theory and technology for building artificial systems that obtain information from images. The image data can take many forms, such as a video sequence, views from multiple cameras, or multi-dimensional data from a medical scanner.

obtain information from images =

physical world description

Object detection, recognition and tracking...


But, what about understanding people?

THE CANONICAL VIEW

1. There is a great need for computer programs that can describe and predict people's activities from video.

2. This is difficult to do because it is hard to detect, identify and track people in video sequences, because we have no common vocabulary for describing what people are doing, and because the interpretation of what people are doing depends very strongly on context.

That’s true, but this is not the whole truth: there is

also a lack of appropriate models for understanding

people and their social world.


Human sensing =

«bounding box» problem + pose problem + attributes problem +

interaction problem + gestures + social signals +…

The «bounding box» problem.

Face detection, full body detection, upper body detection

The «bounding box» problem: face detection


Basic idea: slide a (multiscale) window across image and

evaluate a face model at every location.

The «bounding box» problem: face detection

Templates: 20, 30, 40, 50, 60 px
Image: 640x480 px
Translation: 5 px
Speed: 10 fps
------------------------------------------
Total: 62135 searches -> 1.6 μs/search
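The arithmetic on this slide can be sketched as follows. This is a minimal sketch, assuming square templates, windows kept fully inside the image, and the same 5 px step in both directions; `count_windows` and `budget_us` are illustrative names, not part of the talk. The slide does not state its border convention, so the count obtained under these assumptions (53885) is of the same order as the slide's 62135 but not identical to it. The time budget per search is computed from the slide's own total.

```python
def count_windows(width, height, sizes, stride):
    """Count square sliding windows that fit fully inside the image."""
    total = 0
    for s in sizes:
        nx = (width - s) // stride + 1   # horizontal positions for this size
        ny = (height - s) // stride + 1  # vertical positions for this size
        total += nx * ny
    return total


def budget_us(n_windows, fps):
    """Time available per window, in microseconds, at a given frame rate."""
    return 1e6 / (fps * n_windows)


# Assumed border convention: window must lie entirely inside the frame.
n_assumed = count_windows(640, 480, [20, 30, 40, 50, 60], 5)

# Budget per search using the total quoted on the slide.
per_search = budget_us(62135, 10)
```

Whatever the exact count, the point stands: at 10 fps each window evaluation has a budget of under two microseconds, which is why cheap features matter.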


The «bounding box» problem: face detection

Fast Feature Computation: Integral Image
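As an illustration of why the integral image makes feature computation fast, here is a minimal pure-Python sketch. It assumes a grayscale image stored as a list of rows; the function names and the two-rectangle feature are illustrative choices (in the style of Haar-like features), not code from the talk. Once the table is built, the sum over any rectangle costs four lookups, regardless of the rectangle size.

```python
def integral_image(img):
    """Return a (rows+1) x (cols+1) summed-area table with a zero border."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            # Sum of everything above and to the left, inclusive.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Illustrative two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
```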

[Figure: detection windows from the smallest scale to larger scales]

The «bounding box» problem: face detection

Face detection solution: efficient features +

machine learning on very large datasets of

examples.


State of the art: 89%

The «bounding box» problem: face detection


“Large-scale Privacy Protection in Google Street View”, Andrea Frome, German Cheung, Ahmad Abdulkader, Marco Zennaro, Bo Wu, Alessandro Bissacco, Hartwig Adam, Hartmut Neven, Luc Vincent, IEEE International Conference on Computer Vision, 2009.


The «bounding box» problem: body detection


The «bounding box» problem: full body detection


Pedestrian detection using histograms of oriented gradients (Dalal and Triggs 2005)


The «bounding box» problem: upper body detection


Upper-body detector by Manuel J. Marín-Jiménez, Vittorio Ferrari and Andrew Zisserman

The «bounding box» problem: person detection


Part-based object detection (Felzenszwalb et al. 2008)


The «bounding box» problem: person detection


Lubomir Bourdev, Jitendra Malik, Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, ICCV 2009

The «bounding box» problem: person detection

• Detect poselets (SVM)
• Hough-vote for each torso location
• Score each cluster:

a_i(x): score of poselet i at location x

w_i: weight of poselet i, learned via M2HT [Maji/Malik CVPR09]
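The three steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the cluster score is assumed to be the weighted sum of w_i * a_i(x) over the poselets voting into a cluster (the form suggested by the symbol definitions), votes are grouped greedily around the strongest vote within a hypothetical `radius`, and the weights are simply supplied rather than learned with M2HT.

```python
from math import hypot


def score_torso_clusters(detections, weights, radius=20.0):
    """Group torso votes and score each group.

    detections: list of (poselet_id, activation, (x, y)) where (x, y) is the
                torso location voted for by that poselet detection.
    weights:    dict poselet_id -> weight (assumed given, not learned here).
    """
    clusters = []
    # Process strongest activations first so they seed the clusters.
    for pid, act, (x, y) in sorted(detections, key=lambda d: -d[1]):
        for c in clusters:
            cx, cy = c["center"]
            if hypot(x - cx, y - cy) <= radius:
                c["members"].append((pid, act))
                break
        else:
            clusters.append({"center": (x, y), "members": [(pid, act)]})
    for c in clusters:
        # Assumed score: weighted sum of poselet activations in the cluster.
        c["score"] = sum(weights[pid] * act for pid, act in c["members"])
    return sorted(clusters, key=lambda c: -c["score"])


detections = [(0, 0.9, (100, 100)), (1, 0.5, (105, 98)), (2, 0.8, (300, 50))]
weights = {0: 1.0, 1: 2.0, 2: 0.5}
ranked = score_torso_clusters(detections, weights)
```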

The «bounding box» problem: person detection


The «bounding box» problem: human layout


The PASCAL Visual Object Classes Challenge 2010

The «bounding box» problem: human layout

The head is detected by integrating several state-of-the-art part detectors:

• Face (frontal + lateral) detection
• Person detection using poselets
• Person detection using Pictorial Model
• Person detection using Discriminatively Trained Part-Based Models

The «bounding box» problem: human layout

EXAMPLE: PASCAL Human Layout Challenge 2010

Faces were detected with OpenCV 2.1.

Details of the implementation:

• We use the following cascades:
  • Frontal face (default, alt, alt2, alt_tree).
  • Lateral face (profile).
• Each cascade returns several (from 0 up to N) hypotheses about head position.
• To integrate the results we use hierarchical clustering.
• The final head box is the one with the maximum score given by hierarchical clustering.
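The integration step described above might look like the following sketch. The slides do not say which linkage, distance, or score was used, so everything here is an assumption: boxes are `(x, y, w, h)` tuples, clusters are merged by single linkage whenever two boxes overlap with IoU above a hypothetical `min_iou`, the cluster score is the number of hypotheses it contains, and the final box is the mean of the winning cluster.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def cluster_boxes(boxes, min_iou=0.5):
    """Single-linkage agglomerative clustering, cut at an IoU threshold."""
    clusters = [[b] for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                linked = any(iou(a, b) >= min_iou
                             for a in clusters[i] for b in clusters[j])
                if linked:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


def best_head_box(boxes, min_iou=0.5):
    """Return (mean box, score) of the largest cluster, or None if no boxes."""
    if not boxes:
        return None
    best = max(cluster_boxes(boxes, min_iou), key=len)
    n = len(best)
    mean_box = tuple(sum(b[k] for b in best) / n for k in range(4))
    return mean_box, n


# Hypotheses pooled from several cascades (made-up values for illustration).
hypotheses = [(10, 10, 20, 20), (12, 11, 20, 20), (11, 9, 20, 20),
              (100, 100, 20, 20)]
head = best_head_box(hypotheses)
```

Counting hypotheses as the score rewards locations on which several cascades agree, which is one plausible reading of "maximum score".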

References: Viola, Jones: Robust Real-time Object Detection, IJCV 2001

Precision-recall curve (subset: val, part: head): AP = 0.530

The «bounding box» problem: human layout

We use a person detection system proposed by

Felzenszwalb et al. to detect the body.

Details of the implementation:

• Software version: Discriminatively Trained

Deformable Part Models Version 4.

• Based on model aspect analysis we choose the 4 models which best detect the head position.
• For each model we choose the component related to the head position in order to fix the box.

References:

• P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with

Discriminatively Trained Part Based Models, PAMI 2009

Precision-recall curve (subset: val, part: head): AP = 0.459

The «bounding box» problem: human layout

We use the body detection system proposed by Bourdev

et al.

• Initially, we used the set of 1138 poselets trained from the H3D database.
• The poselets were trained to vote for the position and size of the head.
• In order to improve results, a hierarchical clustering per poselet was introduced.
• From the original poselet set, we selected the 239 poselets which give the most reliable votes for the head position. The selection criterion was the standard deviation (std) of the votes for the head.
• If the std was smaller than a defined threshold, the poselet was considered reliable.
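The selection rule above can be sketched as follows. This is a minimal sketch with assumed details: each vote is taken to be a 2D head offset `(dx, dy)` relative to the poselet, the spread of a poselet is the larger of the two per-axis population standard deviations, and the threshold value is hypothetical. The slides do not specify how the std was computed over 2D votes.

```python
from statistics import pstdev


def vote_spread(votes):
    """Spread of a poselet's head votes: the larger per-axis std (assumed)."""
    xs = [v[0] for v in votes]
    ys = [v[1] for v in votes]
    return max(pstdev(xs), pstdev(ys))


def select_reliable(poselet_votes, threshold):
    """Keep poselets whose head votes have a std below the threshold."""
    return sorted(pid for pid, votes in poselet_votes.items()
                  if len(votes) >= 2 and vote_spread(votes) < threshold)


# Made-up training votes for two poselets.
poselet_votes = {
    "a": [(0.0, 1.0), (0.1, 1.1), (0.0, 0.9)],   # consistent head offsets
    "b": [(0.0, 0.0), (2.0, 3.0), (-2.0, -3.0)],  # scattered head offsets
}
reliable = select_reliable(poselet_votes, threshold=0.5)
```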

Reference:

• Lubomir Bourdev, Jitendra Malik, Poselets: Body Part Detectors Trained Using 3D Human Pose

Annotations, ICCV 2009.

Precision-recall curve (subset: val, part: head): AP = 0.425

The «bounding box» problem: human layout

Confidence 0.5 Confidence 0.8 Confidence 1.6 Confidence 2.25

Precision-recall curves (subset: val): part: head, AP = 0.753; part: hand, AP = 0.000

The «bounding box» problem: human layout

[Figures: hand and foot bounding boxes]

Hand detection is a SEARCH problem.


Leonid Karlinsky, Michael Dinerstein, Daniel Harari, and Shimon Ullman. The chains model for detecting parts by their context, CVPR 2010.

[Figure: chains model diagram]

The attributes problem.

• Gender, ethnicity, age
• Facial attributes (hair, glasses)
• Facial traits (aggressiveness)
• Identity
• Facial expressions
• Affect
• Emblems
• Head pose

Automatic Point-based Facial

Trait Judgments Evaluation

• People are extremely efficient at making trait judgments (e.g., competent, trustworthy) from faces.
• Rapid, unreflective judgments of competence based solely on facial appearance predict election outcomes.

Physiognomy

Darwin was almost denied the chance to take the historic Beagle voyage on account of his nose.

Apparently, the Captain [a fan of Lavater] did not believe that a person with such a nose would “possess sufficient energy and determination.”

Evaluating faces = judging the book by its cover.

• 100 ms exposure is sufficient for a variety of person judgments:
  – Competence
  – Trustworthiness
  – Aggressiveness
  – Likeability
• Additional exposure time increases confidence in judgments
• Single glance impressions


Predicting Senate Elections


From: A. Vinciarelli, M. Pantic, H. Bourlard, Social signal processing: Survey of an emerging domain, Image and Vision Computing, Volume 27, Issue 12, November 2009, Pages 1743-1759

The pose problem.

Body pose/postures

http://www.vision.ee.ethz.ch/~hpedemo/

The interaction problem.

• Human2Human: proxemics
• Human2Object: manipulation

B. Yao and L. Fei-Fei. Modeling Mutual Context of Object and Human Pose in

Human-Object Interaction Activities. IEEE Computer Vision and Pattern Recognition

(CVPR). 2010.


B. Yao and L. Fei-Fei. Grouplet: a Structured Image Representation for Recognizing

Human and Object Interactions. IEEE Computer Vision and Pattern Recognition

(CVPR). 2010.


Context

We can use context! (From Andrew C. Gallagher, A Framework for Using Context to Understand Images of People, PhD Thesis, Carnegie Mellon University, 2009.)

• Pixel level: clothing, other people, relative pose, posture, ...
• Capture context: time, location, calibration, flash, ...
• Social context: first name, age, gender, social relationship, anthropometric data, personal calendar, ...

Contextual features that capture the structure of a group of people, and the position of individuals within the group.

[Figure panels: Minimum Spanning Tree, Nearest Neighbors]
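To make the two group-structure features concrete, here is a minimal sketch, assuming each person is represented by a 2D face center in image coordinates. The function names are illustrative, and Prim's algorithm is used here as one standard way to build a minimum spanning tree; the slides do not say how the structures were computed.

```python
from math import dist


def nearest_neighbors(points):
    """For each point, the index of its closest other point (None if alone)."""
    if len(points) < 2:
        return [None] * len(points)
    result = []
    for i, p in enumerate(points):
        others = (k for k in range(len(points)) if k != i)
        result.append(min(others, key=lambda k: dist(p, points[k])))
    return result


def minimum_spanning_tree(points):
    """Edges (i, j) of a Euclidean minimum spanning tree, via Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Made-up face centers: two pairs of people standing apart.
faces = [(0, 0), (1, 0), (5, 0), (6, 0)]
nn = nearest_neighbors(faces)
mst = minimum_spanning_tree(faces)
```

Features such as edge lengths or a person's degree in the tree can then be read off these structures to describe where an individual sits within the group.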

And all this knowledge can be used in real applications…

Agitation in ICU

Conclusion

To build “people perception models” is an Internet vision

problem (= visual feature extraction + machine learning + large

databases) that is still in its infancy.
