Thesis presentation 08022016

Visual attention and perception models for assessing quality in 2D and 3D stereoscopic video

Juan Pedro López Velasco - [email protected]
José Manuel Menéndez García - [email protected]

Universidad Politécnica de Madrid
Madrid, 8th February 2016

Index
• Introduction
• Objectives and Work Development
• Visual discomfort prediction in 3D stereoscopic video
• Visual Attention Model for Video Quality Assessment
• Conclusions and Future Work
• Merits

Introduction
• Quality of Experience (QoE) is defined as the degree of delight or annoyance of the user of an application or service, in this case multimedia services.

• QoE needs to be estimated at different stages of the video broadcasting data flow and for a variety of sources: 2D and 3D.

Scenarios

CONTENT CREATION PHASE: Visual comfort assessment (3D)

COMPRESSION PHASE: Visual attention and saliency models (2D)

The most important thing in video quality assessment is… the final user.

Objectives and Work Development

Objectives (I)

For visual comfort assessment (3D):
• Empirically identifying the main sources of visual discomfort in 3D stereoscopic video through subjective assessment.
• Quantifying the situations in sequences where visual discomfort is most likely to occur.
• Analyzing motion, parallax distribution and disparity change in pairs of sequences to develop tools that match human perception.

• Demonstrating with sequences that the results of the subjective assessment can be predicted by measuring objective parameters and characteristics.

Work Development (I)

• Preliminary subjective assessment
• Determination of visual discomfort sources
• Characterization of video sequences
• Statistical analysis, new subjective assessment, metric development and conclusions

Objectives (II)

For visual attention and saliency models (2D):
• Improving objective quality metrics by applying visual attention models, which weight regions of interest to obtain results closer to the response of the human eye.

• Determining accurate visual attention models, specific to each sequence, that predict the areas most likely to be observed by the user.

• Weighting the analyzed saliency factors through subjective assessment. These saliency factors are: motion, level of detail, face detection and pixel position.

• Demonstrating the improvement of objective metrics for measuring quality and artifacts in the sequence when the developed visual attention model is applied (Advanced Blur metric).

Work Development (II)

• Determining factors: motion, face detection, level of detail and position
• Subjective assessment with artificially impaired sequences
• Weighting these factors in order of importance
• Visual attention model generation
• Application of the model in objective metrics (Advanced Blur metric)

Visual discomfort prediction in 3D stereoscopic video

Introduction to Stereoscopy
• Stereoscopic 3D video perception relies on capturing two different but highly correlated video signals, one to feed each of the viewer's eyes.

• One signal is received by the left eye and the other by the right eye. The brain fuses the left and right views.

• 3D video imitates binocular human vision (natural view).
• The cyclopean eye is an imaginary eye located midway between the two eyes.

Disparity and Parallax

• Disparity is the difference between the angles that a pair of features subtends at each eye.

• Disparities create parallax, which can be positive, negative or zero depending on the position of the object with respect to the screen.
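To make the sign convention concrete, the sketch below classifies per-pixel parallax values and summarizes a frame. It assumes parallax is measured in pixels as x_right - x_left, so negative values place a point in front of the screen, zero on the screen plane and positive behind it; the function names and the bin width are illustrative, not taken from the thesis.

```python
from collections import Counter


def classify_parallax(p):
    """Parallax class of one value (pixels, x_right - x_left)."""
    if p < 0:
        return "negative"  # crossed: perceived in front of the screen
    if p > 0:
        return "positive"  # uncrossed: perceived behind the screen
    return "zero"          # perceived on the screen plane


def parallax_histogram(values, bin_width=1):
    """Count values per bin; the bin with key k covers [k, k + bin_width)."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def parallax_summary(values):
    """Fraction of pixels in each parallax class."""
    counts = Counter(classify_parallax(v) for v in values)
    n = len(values)
    return {c: counts.get(c, 0) / n for c in ("negative", "zero", "positive")}


if __name__ == "__main__":
    disparities = [-6, -3, 0, 0, 2, 4, 4, 7]
    print(parallax_summary(disparities))
    print(sorted(parallax_histogram(disparities, bin_width=2).items()))
```

A histogram of this kind, computed per frame, is one simple way to obtain the "distribution of parallax" used later to characterize sequences.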


Example of 3D disparity


Accommodation-Vergence conflict

• When viewing an object on a stereoscopic display:
– The eyes accommodate (focus) on the screen.
– But they rotate (verge) to fixate the apparent object.
– An inconsistency between accommodation and vergence therefore arises (derived from stereopsis).

• This effect is known as the accommodation-vergence conflict.
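A worked version of the conflict: with viewing distance D (where the eyes accommodate), interocular distance e and on-screen parallax p in the same unit, similar triangles give the vergence distance Z = D·e / (e - p). The mismatch in diopters is then |1/D - 1/Z|, which simplifies to |p| / (D·e). The sketch below evaluates both. The 2.5 m distance is the one used in the subjective tests of this work; e = 65 mm is a commonly assumed adult value, not a figure from the thesis.

```python
def vergence_distance(D, e, p):
    """Distance at which the eyes converge (same unit as D).

    D: viewing distance to the screen, where the eyes accommodate.
    e: interocular distance. p: on-screen parallax, same unit as e;
    p < 0 is negative (crossed) parallax, p > 0 positive (uncrossed).
    """
    if p >= e:
        raise ValueError("parallax >= interocular distance would force the eyes to diverge")
    return D * e / (e - p)


def av_mismatch_diopters(D, e, p):
    """Accommodation-vergence mismatch |1/D - 1/Z| in diopters (D, e, p in metres)."""
    return abs(1.0 / D - 1.0 / vergence_distance(D, e, p))


if __name__ == "__main__":
    D, e = 2.5, 0.065  # 2.5 m from the test set-up; 65 mm interocular distance is assumed
    for p in (-0.065, -0.02, 0.0, 0.02):
        print(f"p = {p:+.3f} m  Z = {vergence_distance(D, e, p):.3f} m  "
              f"mismatch = {av_mismatch_diopters(D, e, p):.3f} D")
```

For example, a crossed parallax equal to e (p = -65 mm) brings the vergence point to half the screen distance (1.25 m), a mismatch of 0.4 diopters, while zero parallax gives no mismatch.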


Problem description

• Disparity can offer an impressive experience, BUT with differences in 3D disparity the eyes may have difficulty focusing on objects, causing visual discomfort, annoyance and headache.

• To focus on objects, the eyes need enough time for accommodation to adapt to changes, so that 3D video is viewed correctly (hence the importance of motion).

• Common sources of visual discomfort:
– Excessive binocular parallax (especially negative)
– Accommodation-vergence mismatches (AVM)

Accommodation-Vergence Mismatches (AVM)

• AVM is one of the most frequent sources of visual discomfort in 3DTV.

• When the position of objects changes (parallax), accommodation remains constant but vergence changes.

• The crystalline lens must adapt quickly to the change.

[Figure: near-distance object vs. far-distance object]

Zone of Comfort

• Zone of Comfort (ZoC) is a term introduced by Percival (1892) to define the relationship between the vergence distance and the distance to the screen (accommodation distance).

• Existing studies focus on static images (Shibata, 2011).

Work methodology

1. Characterization of individual video sequences: motion, depth map and distribution of parallax.
2. Combination of video sequences into pairs (Sequence 1 → Sequence 2), covering a wide range of transitions.
3. Subjective assessment with pairs of sequences for transition analysis.
4. Analysis of when visual discomfort occurs.

Characterization of video sequences

• Tools for characterization:
– Depth maps: using SAD (Sum of Absolute Differences) techniques.
– Histograms of parallax information (based on depth map information).
– Diagrams of TI (Temporal Information) and SI (Spatial Information) variation.

[Figure: depth map computed with SAD]
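The slides do not detail the SAD configuration, so the following is a minimal block-matching sketch under stated assumptions: a rectified stereo pair given as 2D lists of grey levels, a square window, an exhaustive search over a symmetric parallax range, and clamped borders. It returns per-pixel parallax x_right - x_left, so negative values mean negative (crossed) parallax. `block` and `max_parallax` are hypothetical parameters.

```python
def sad_parallax_map(left, right, block=3, max_parallax=4):
    """Per-pixel parallax (x_right - x_left, in pixels) by SAD block matching.

    Negative values mean negative (crossed) parallax, i.e. in front of the screen.
    """
    h, w = len(left), len(left[0])
    r = block // 2

    def px(img, y, x):
        # Clamp coordinates so windows near the border stay inside the image.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Visit candidates by increasing |p| so ties (e.g. flat areas) keep the smallest parallax.
    candidates = sorted(range(-max_parallax, max_parallax + 1), key=abs)
    parallax = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_p, best_cost = 0, None
            for p in candidates:
                # Sum of Absolute Differences between the left window and the
                # right window shifted horizontally by p pixels.
                cost = sum(
                    abs(px(left, y + dy, x + dx) - px(right, y + dy, x + dx + p))
                    for dy in range(-r, r + 1)
                    for dx in range(-r, r + 1)
                )
                if best_cost is None or cost < best_cost:
                    best_p, best_cost = p, cost
            parallax[y][x] = best_p
    return parallax


if __name__ == "__main__":
    row = [0, 0, 0, 10, 50, 90, 30, 0, 0, 0, 0, 0]
    left = [row[:] for _ in range(3)]
    # Every feature appears 2 px further left in the right view (crossed parallax).
    right = [[row[min(x + 2, len(row) - 1)] for x in range(len(row))] for _ in range(3)]
    print(sad_parallax_map(left, right)[1][5])  # -2
```

A real depth-map pipeline would typically add sub-pixel refinement and consistency checks; the sketch only shows the SAD cost and the winner-takes-all choice.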


Case study: sequence Palco HD
• The separation of the virtual cameras exceeds the average interpupillary distance. The human eye adapts to the change produced by negative parallax, but… an abrupt transition generates discomfort.

[Figure: progressive temporal parallax variation]

Subjective Assessment

• Analysis of changes (transitions) between pairs of video sequences to determine a preliminary ZoC.

• Analysis of transitions between scenes:
– Selection of sequences with different values of SI (Spatial Information) and TI (Temporal Information): two-dimensional information.
– Selection of sequences with different values of spatial and temporal parallax variance (negative parallax): three-dimensional information.

• Test conditions (following Recommendations ITU-R BT.500 and ITU-T P.910):
– 74 observers
– 65-inch television
– Viewing distance: 2.5 m
– HD sequences
– 5-point annoyance scale
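SI and TI are defined in ITU-T P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. A plain-Python sketch of those definitions follows. Frames are 2D lists of luminance values; evaluating the Sobel filter on interior pixels only and using the population standard deviation are implementation choices of this sketch.

```python
import math
from statistics import pstdev


def sobel_magnitudes(frame):
    """Sobel gradient magnitude of every interior pixel of a 2D list."""
    f = frame
    h, w = len(f), len(f[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (f[y-1][x+1] + 2 * f[y][x+1] + f[y+1][x+1]) - (f[y-1][x-1] + 2 * f[y][x-1] + f[y+1][x-1])
            gy = (f[y+1][x-1] + 2 * f[y+1][x] + f[y+1][x+1]) - (f[y-1][x-1] + 2 * f[y-1][x] + f[y-1][x+1])
            mags.append(math.hypot(gx, gy))
    return mags


def spatial_information(frames):
    """SI = max over time of the spatial std of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitudes(f)) for f in frames)


def temporal_information(frames):
    """TI = max over time of the spatial std of F_n - F_(n-1); needs >= 2 frames."""
    return max(
        pstdev([c - p for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)])
        for prev, cur in zip(frames, frames[1:])
    )
```

High-SI sequences contain more spatial detail and high-TI sequences more motion, which is why selecting sequences across both axes covers the two-dimensional content space.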

MOS scale

Score | Annoyance derived from transition | Quality of Experience
5 | Very comfortable | Excellent experience
4 | Comfortable | Good experience
3 | Mildly uncomfortable | No visual discomfort
2 | Uncomfortable | Visual discomfort
1 | Extremely uncomfortable | High visual discomfort
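The MOS of a transition is the arithmetic mean of the observers' scores on this scale. Because scores of 2 and 1 correspond to visual discomfort, the share of such scores is a useful second summary (it is the kind of figure reported for the Angel-to-Ladder transition). A small sketch; the threshold of 2 follows from the scale, and the function name is illustrative.

```python
def summarize_scores(scores, discomfort_max=2):
    """Return (MOS, fraction of scores indicating visual discomfort)."""
    if not scores or any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("scores must be integers on the 1-5 scale")
    mos = sum(scores) / len(scores)
    discomfort = sum(1 for s in scores if s <= discomfort_max) / len(scores)
    return mos, discomfort


if __name__ == "__main__":
    print(summarize_scores([5, 4, 2, 1, 3]))  # (3.0, 0.4)
```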


Results of subjective assessment


Transition: Angel to Ladder (I)

40% of the observers gave a score indicating visual discomfort.

Transition: Angel to Ladder (II)

Parallax variation in pixels

Transition: “Spaceship” to “Astronaut”

From negative parallax on the right side of the first video to a combination of negative and positive parallax.

Transition: “Station” to “Itaca3d”

This is the worst-scored transition in the tests.

↑↑ Motion | ↑↑ Motion

Hyperstereoscopy!

Transition: “Boxers” to “Dance”

Negative parallax is located in different areas, causing less annoyance for observers.

Transition: “Hall” to “Laboratory”

Both videos have negative parallax and window violation → low scores.

Window violation!

Conclusions
• The subjective assessment results indicate the need to evaluate both the static disparity and the dynamic variation of the stereoscopic image, in terms of motion.

• ZoC is affected by motion in the scene. The state of the art must be updated with results from tests on dynamic sequences.

• Visual discomfort can be avoided by locating objects in positive parallax, BUT this implies a consequent decrease of QoE.

• Negative parallax must be controlled to generate smooth variations:
– Fast variation of negative parallax is usually the main source of visual discomfort, especially when the transition leads to content with a completely different disparity diagram.
– Hyperstereoscopy alone (i.e. pixels with negative parallax and disparities higher than 5) in a sequence is not enough to detect visual discomfort; it is the transition that causes the discomfort.

• Positive parallax is recommended for its tolerance to visual discomfort and the consequent.
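The conclusion that the transition, rather than hyperstereoscopy alone, drives discomfort suggests comparing parallax content across a cut. The sketch below does this for the last frame of one sequence and the first frame of the next. The threshold of 5 comes from the slide (read here as pixels); the sign convention (negative values = negative parallax), the use of the mean negative parallax and the `max_jump` threshold are hypothetical choices for illustration, not a validated detector.

```python
def negative_parallax_stats(parallax, hyper_threshold=5):
    """Return (fraction of hyperstereoscopic pixels, mean negative parallax).

    `parallax` is a flat list of per-pixel parallax values in pixels;
    negative values denote negative (crossed) parallax.
    """
    neg = [p for p in parallax if p < 0]
    hyper = sum(1 for p in neg if -p > hyper_threshold) / len(parallax)
    mean_neg = sum(neg) / len(neg) if neg else 0.0
    return hyper, mean_neg


def risky_transition(last_frame, first_frame, max_jump=3.0):
    """Flag a cut whose mean negative parallax changes by more than max_jump px.

    max_jump is a hypothetical value; the slides give no numeric limit.
    """
    _, before = negative_parallax_stats(last_frame)
    _, after = negative_parallax_stats(first_frame)
    return abs(after - before) > max_jump


if __name__ == "__main__":
    calm = [0, 1, 2, -1, 0, 3]
    deep = [-8, -7, -9, -2, 0, 1]
    print(negative_parallax_stats(deep))
    print(risky_transition(calm, deep), risky_transition(deep, deep))
```

Note that `deep` is strongly hyperstereoscopic in both calls, yet only the cut from `calm` is flagged, mirroring the observation that the abrupt change, not the static disparity, is the trigger.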


Future work

Building on the conclusions about the main sources of visual discomfort:

• Developing recommendations and guidelines for 3D content creators.

• Generating tools for automatic detection of discomfort in 3D videos.

Visual Attention Model for Video Quality Assessment

Contents

• Introduction: problem description
• Calibration of the visual attention model
– Generation of artificially impaired video sequences: analysis of video characteristics by region; creation of masks based on ROIs
• Results and examples with test sequences
• Advanced Blur metric
– Application to real video sequences (encoded in H.264 at different bitrates)
• Conclusions

Problem description (I)

• Assessing video quality is still a complex task.
• Video quality assessment needs to correspond to human perception.
• Visual attention is focused on specific regions (ROIs) of an image, as demonstrated with fixation maps and eye tracking.

[Figure: original image, fixation map and image with visual attention weights]

Problem description (II)

• Most pixel-based metrics do not show enough correlation between objective and subjective results.

• Algorithms need to correspond to human perception when analyzing quality in a video sequence.

• For example, these four frames have the same MSE.

• Video quality metrics should correlate with visual attention and psychovisual models adapted to specific artifacts and their visualization.

[Figure: four frames with the same MSE: high blocking, high blurring (defocus), salt-and-pepper noise, JPEG encoding]
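Why equal MSE can hide very different visual quality: MSE only averages squared pixel errors, so it ignores where an error is and what structure it has. The sketch computes MSE and PSNR (8-bit peak of 255) and builds two distortions of the same toy image with identical MSE: a small offset on every pixel and a single larger error concentrated in one pixel. The images are made-up illustration data.

```python
import math


def mse(ref, dist):
    """Mean squared error between two equally sized 2D lists."""
    n = len(ref) * len(ref[0])
    return sum((a - b) ** 2 for ra, rb in zip(ref, dist) for a, b in zip(ra, rb)) / n


def psnr(ref, dist, peak=255):
    """PSNR in dB; infinite for identical images."""
    e = mse(ref, dist)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)


if __name__ == "__main__":
    ref = [[100, 100], [100, 100]]
    uniform = [[102, 102], [102, 102]]    # every pixel off by 2
    localized = [[104, 100], [100, 100]]  # one pixel off by 4
    print(mse(ref, uniform), mse(ref, localized))  # 4.0 4.0
    print(psnr(ref, uniform) == psnr(ref, localized))  # True
```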


Visual Attention Features

• According to the context-aware saliency detection model proposed by Goferman et al. [GOFERMAN-1, 2012], image regions of interest are detected based on four principles of human attention supported by psychological evidence:
– Low-level characteristics affecting each individual pixel, such as color and contrast.
– Global considerations, which suppress frequently occurring features while maintaining features that deviate from the norm.
– Visual organization rules, which state that visual forms may possess one or several centers of gravity about which the form is organized.
– High-level factors, such as the recognition of human faces or specific objects.

This last factor could be content dependent, but human faces generate specific patterns in the human retina that increase their probability of being perceived, owing to psychological and cognitive features.

Example of artificially impaired sequences

• Impaired area (with blocking artifact) located in the human-face ROI.

• The effect is exaggerated in this example, but it is common in real life.

Work methodology
• Objectives:
– Calibration of the influence of features (ROIs) for determining the visual attention model.
– Creation of the Advanced Blur metric.
• Methodology for the visual attention model:
– Selection of ROIs: motion, faces, spatial detail and position.
– Creation of masks for artificially impaired sequences (adapted to a specific artifact: blurring).
– Subjective assessment: opinions of users (MOS scale).
– Search for inconsistencies between the subjective assessment (MOS obtained) and pixel-based objective metrics (PSNR), to weight the influence of each feature.

• Advanced Blur metric: loss of energy (blur) adapted to visual attention.
• Tests: once the visual attention model is generated, it is tested with real sequences (distorted by H.264 encoding).

Scheme of artificially impaired video sequence generation

[Diagram: original video sequence → distortion → impaired video sequence; the feature mask and the inverse feature mask combine both into an artificially impaired sequence]

(2 sequences for each distortion: one case and the opposite case, as seen in the next example)
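The combination step of the scheme can be read as a per-pixel selection: where the feature mask is 1 the pixel comes from the impaired sequence, elsewhere from the original; using the inverse mask gives the opposite case. A minimal sketch for one frame with binary masks as 2D lists (a soft, blended mask would be a possible variant, not something the slides describe):

```python
def apply_masked_impairment(original, impaired, mask):
    """Take impaired pixels where mask == 1 and original pixels elsewhere."""
    return [
        [imp if m else org for org, imp, m in zip(ro, ri, rm)]
        for ro, ri, rm in zip(original, impaired, mask)
    ]


def inverse_mask(mask):
    """Swap ROI and non-ROI pixels of a binary mask."""
    return [[0 if m else 1 for m in row] for row in mask]


if __name__ == "__main__":
    original = [[10, 20], [30, 40]]
    impaired = [[0, 0], [0, 0]]
    roi = [[1, 0], [0, 0]]
    print(apply_masked_impairment(original, impaired, roi))                # [[0, 20], [30, 40]]
    print(apply_masked_impairment(original, impaired, inverse_mask(roi)))  # [[10, 0], [0, 0]]
```

Applied frame by frame, the two calls produce the pair of test sequences per distortion: impairment inside the ROI and impairment outside it.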


Impairment and artifact insertion process

[Diagram: original video sequence → artifact distortion (blocking, blurring, ringing) → impaired video sequence]

• Blocking simulated with an 8x8 mosaic filter
• Blurring simulated with a Gaussian low-pass filter
• Ringing simulated with a JPEG encoding filter
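The first two simulations can be sketched in a few lines. The sketch assumes that the 8x8 mosaic replaces each 8x8 block by its mean value and that the Gaussian low-pass filter is separable with clamped borders; `sigma` and the kernel radius are illustrative parameters, and the JPEG-based ringing simulation is not reproduced here.

```python
import math


def mosaic(frame, block=8):
    """Blocking: replace every block x block tile by its mean value."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            avg = sum(frame[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out


def gaussian_kernel(sigma, radius=None):
    """1D Gaussian weights, normalised so that flat areas keep their value."""
    radius = radius if radius is not None else max(1, math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def gaussian_blur(frame, sigma=1.5):
    """Blurring: separable Gaussian low-pass filter (rows, then columns), clamped borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(frame), len(frame[0])
    tmp = [[sum(k[i + r] * frame[y][min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * tmp[min(max(y + i, 0), h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
```

Both filters remove high-frequency energy, which is why the blur-oriented metric discussed later is sensitive to them.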


Creation of masks based on ROIs (I)

• Types of regions of interest for masks: motion, spatial detail, faces, position and color.

[Diagram: original video sequence → feature detection → feature mask and inverse feature mask]

Motion mask

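• For motion detection, temporal information in consecutive frames is examined.

• Temporal information is analyzed: a pixel belongs to the motion mask of frame i when its value changes with respect to the previous frame: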

$$\mathrm{Pix}_i(x,y) \in \mathrm{Mask}_{\mathrm{frame}_i} \quad \text{if} \quad F_i(x,y) - F_{i-1}(x,y) \neq 0$$

[Figure: original frame and motion mask based on TI]
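The motion-mask rule above translates directly into code. In this sketch frames are 2D lists of luminance values; the optional `threshold` is a hypothetical parameter (default 0, so that any change counts, as in the rule) that could be raised to ignore sensor noise.

```python
def motion_mask(prev_frame, cur_frame, threshold=0):
    """1 where |F_i(x, y) - F_(i-1)(x, y)| > threshold, else 0."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(rp, rc)]
        for rp, rc in zip(prev_frame, cur_frame)
    ]


def motion_masks(frames, threshold=0):
    """Motion masks for frames 1..N-1 (frame 0 has no predecessor)."""
    return [motion_mask(p, c, threshold) for p, c in zip(frames, frames[1:])]
```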


Spatial Detail Mask
• Textures, edges and objects in motion can hide or highlight certain impairments, such as blocking or blurring artifacts.

• The Canny algorithm is used to create binary masks that separate homogeneous areas from high-frequency areas.

[Figure: original frame and spatial detail mask based on the Canny algorithm]
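A full Canny detector (smoothing, non-maximum suppression, hysteresis) is too long for a slide-sized example, so the sketch below swaps in a simpler technique: a thresholded Sobel gradient magnitude, followed by a small dilation that turns thin edges into regions. With OpenCV the edge step would instead be `cv2.Canny(image, threshold1, threshold2)`. The `threshold` and `dilate` values are hypothetical.

```python
import math


def detail_mask(frame, threshold=100.0, dilate=1):
    """Binary high-detail mask from Sobel gradient magnitude (Canny stand-in).

    Pixels whose gradient magnitude exceeds `threshold` are marked 1, then the
    result is dilated by `dilate` pixels. Border pixels start at 0.
    """
    f = frame
    h, w = len(f), len(f[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (f[y-1][x+1] + 2 * f[y][x+1] + f[y+1][x+1]) - (f[y-1][x-1] + 2 * f[y][x-1] + f[y+1][x-1])
            gy = (f[y+1][x-1] + 2 * f[y+1][x] + f[y+1][x+1]) - (f[y-1][x-1] + 2 * f[y-1][x] + f[y-1][x+1])
            if math.hypot(gx, gy) > threshold:
                edges[y][x] = 1
    # Dilation: mark every pixel within `dilate` pixels of an edge pixel.
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                for yy in range(max(0, y - dilate), min(h, y + dilate + 1)):
                    for xx in range(max(0, x - dilate), min(w, x + dilate + 1)):
                        mask[yy][xx] = 1
    return mask
```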


Pixel Position Masks
• The image is divided into 9 sections (Nojiri, 2009).
• Objective: analyzing the influence of pixel position by area.

• Three types of masks are created depending on the regions: corner mask, lateral mask and central mask.

[Figure: Nojiri's section distribution; corner, lateral and central masks]
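A sketch of the three position masks, assuming the nine sections form an equal 3x3 grid (Nojiri's exact section boundaries may differ) and that the lateral mask covers the four edge-centre sections that are neither corners nor the centre. Both are assumptions of this sketch.

```python
def section_of(y, x, h, w):
    """(row, col) of the 3x3 grid section containing pixel (y, x)."""
    return min(3 * y // h, 2), min(3 * x // w, 2)


def position_masks(h, w):
    """Return corner, lateral and central binary masks for an h x w frame."""
    corner = [[0] * w for _ in range(h)]
    lateral = [[0] * w for _ in range(h)]
    central = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r, c = section_of(y, x, h, w)
            if (r, c) == (1, 1):
                central[y][x] = 1
            elif r != 1 and c != 1:
                corner[y][x] = 1
            else:
                lateral[y][x] = 1
    return corner, lateral, central
```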


Facial Mask

• The Haar cascade algorithm included in the OpenCV library, based on a boosted cascade of simple features, is used for face detection.

[Figure: face detection and face mask]
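With OpenCV, face rectangles are typically obtained from `cv2.CascadeClassifier(...).detectMultiScale(gray)`, which returns `(x, y, w, h)` boxes. The step from those boxes to a binary face mask is sketched below in plain Python so that it runs without OpenCV; the `margin` that enlarges each box is a hypothetical parameter.

```python
def face_mask(height, width, boxes, margin=0):
    """Binary mask with 1 inside every (x, y, w, h) face box, clipped to the frame.

    With OpenCV (not required here) the boxes could come from, e.g.:
        cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        boxes = cascade.detectMultiScale(gray_frame)
    """
    mask = [[0] * width for _ in range(height)]
    for x, y, w, h in boxes:
        x0, y0 = max(0, x - margin), max(0, y - margin)
        x1, y1 = min(width, x + w + margin), min(height, y + h + margin)
        for yy in range(y0, y1):
            for xx in range(x0, x1):
                mask[yy][xx] = 1
    return mask
```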


Subjective assessment for calibration
• Subjective test results are analyzed to demonstrate the validity of the test sequences. Spatial detail is analyzed in these 3 sequences.

• The MOS scale is used: 5 (excellent) to 1 (poor).

Test sequences: “News Report”: faces; “Barrier”: motion; “Crowd”: pixel position.
D. = impairment located inside the ROI; Inv. = impairment located outside the ROI (inverse mask).

“News Report”: H.264 encodings and impairment located in the faces ROI
FR metric | H.264 75 Mbps | H.264 500 Kbps | Faces ROI D. | Faces ROI Inv.
PSNR | 47.93 | 37.58 | 46.82 | 34.52
Blur | 0.44 | 3.63 | 0.38 | 5.17
MSE | 0.67 | 1.93 | 0.10 | 2.30
MOS | 4.81 | 1.54 | 1.33 | 3.78

“Barrier”: H.264 encodings and impairment located in the motion ROI
FR metric | H.264 75 Mbps | H.264 500 Kbps | Motion ROI D. | Motion ROI Inv.
PSNR | 49.82 | 33.19 | 39.85 | 34.24
Blur | 0.27 | 8.36 | 1.97 | 6.24
MSE | 0.51 | 3.34 | 0.359 | 2.98
MOS | 4.77 | 1.33 | 3.11 | 3.89

“Crowd”: H.264 encodings and impairment located in the position ROIs
FR metric | H.264 75 Mbps | H.264 500 Kbps | Center D. | Center Inv. | Lateral D. | Lateral Inv. | Corner D. | Corner Inv.
PSNR | 34.33 | 25.34 | 30.74 | 26.82 | 33.87 | 26.00 | 35.95 | 25.88
Blur | 3.44 | 22.55 | 6.27 | 15.33 | 2.60 | 19.44 | 0.95 | 22.47
MSE | 3.55 | 8.76 | 2.30 | 6.21 | 1.21 | 7.30 | 0.64 | 7.87
MOS | 4.68 | 1.22 | 1.44 | 2.44 | 3.78 | 1.33 | 4.11 | 1.22

Calibration of Faces

• Distortion is located in the human-face ROI.
• Subjective MOS is lower (1.33) than when the distortion is located in the rest of the picture and faces appear sharp (3.78).
• This is inconsistent with the objective metrics, which rate the face-ROI impairment as better: PSNR (46.82 vs. 34.52) and MSE (0.10 vs. 2.30).


Calibration of Motion and Faces
• A similar situation occurs when analyzing motion in the “Barrier” sequence: the subjective results are inconsistent with the objective metrics.

• There are inconsistencies between MOS and objective metrics, such as PSNR, in the corner regions of the “Crowd” sequence.

• Inconsistencies in spatial detail areas, less


Relative influence of factors

• From the subjective assessment, the following order of influence was derived:

Faces > Central > Motion > Detail > Lateral > Corner
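The ordering gives ranks, not weights. One way to turn it into per-pixel coefficient maps for the Advanced Blur metric (coef_MOT, coef_DET, coef_POS, coef_FACES) is to assign decreasing weights along the chain. The values below (6 for faces down to 1 for corners) are hypothetical placeholders that only respect the order; they are not the calibrated coefficients of the thesis.

```python
# Hypothetical weights that only preserve the order
# Faces > Central > Motion > Detail > Lateral > Corner.
RANK_WEIGHTS = {"faces": 6, "central": 5, "motion": 4, "detail": 3, "lateral": 2, "corner": 1}


def coefficient_maps(faces, motion, detail, corner, lateral, central, weights=RANK_WEIGHTS):
    """Build MOT, DET, POS and FACES coefficient maps from binary masks.

    The three position masks are assumed to be mutually exclusive, so each
    pixel's POS coefficient is the weight of its own position class.
    """
    h, w = len(faces), len(faces[0])

    def scaled(mask, key):
        return [[weights[key] * mask[y][x] for x in range(w)] for y in range(h)]

    pos = [
        [weights["central"] * central[y][x] + weights["lateral"] * lateral[y][x]
         + weights["corner"] * corner[y][x] for x in range(w)]
        for y in range(h)
    ]
    return {"MOT": scaled(motion, "motion"), "DET": scaled(detail, "detail"),
            "POS": pos, "FACES": scaled(faces, "faces")}
```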


Example of the defined psychovisual model (I)

Frame from sequence “News Report”

Example of the defined psychovisual model (II)

[Figure: motion mask, spatial detail mask, pixel position mask and faces mask]

Advanced Blur metric

• The Blur metric calculates the loss of energy when a video sequence is compressed with transforms such as the DCT. Blur compares the gradient of the reference and distorted images.

• Advanced Blur includes the effect of the visual attention model.

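Blur:

$$\mathrm{Blur} = \frac{1}{H\,W}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}\Big[E\big(G(f_{ref}(i,j))\big) - E\big(G(f_{cod}(i,j))\big)\Big]$$

Advanced Blur:

$$\mathrm{psyBlur} = \sum_{i=0}^{H-1}\sum_{j=0}^{W-1}\mathrm{psy}(i,j)\Big[E\big(G(f_{ref}(i,j))\big) - E\big(G(f_{cod}(i,j))\big)\Big]$$

$$\mathrm{psy}(i,j) = \frac{\mathrm{coef}_{MOT}(i,j) + \mathrm{coef}_{DET}(i,j) + \mathrm{coef}_{POS}(i,j) + \mathrm{coef}_{FACES}(i,j)}{H\,W\sum_{c=0}^{3}\mathrm{MAX}(\mathrm{coef}_c)}$$

where H × W is the frame size, f_ref and f_cod are the reference and coded frames, G(·) is the gradient and E(·) its energy.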
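A plain-Python sketch of both formulas follows. The slides do not define E(G(·)) in detail, so the sketch assumes G is the forward-difference gradient and E its energy, i.e. the squared gradient magnitude at each pixel; MAX(coef_c) is read as the maximum of each coefficient map over the frame. The coefficient maps are supplied by the visual attention model as 2D lists keyed "MOT", "DET", "POS" and "FACES" (hypothetical key names).

```python
def gradient_energy(frame):
    """E(G(f)) per pixel, assumed here as squared forward-difference gradient magnitude."""
    h, w = len(frame), len(frame[0])
    return [
        [(frame[y][min(x + 1, w - 1)] - frame[y][x]) ** 2
         + (frame[min(y + 1, h - 1)][x] - frame[y][x]) ** 2 for x in range(w)]
        for y in range(h)
    ]


def blur(ref, cod):
    """Blur = 1/(H W) * sum of [E(G(f_ref)) - E(G(f_cod))]."""
    er, ec = gradient_energy(ref), gradient_energy(cod)
    h, w = len(ref), len(ref[0])
    return sum(er[i][j] - ec[i][j] for i in range(h) for j in range(w)) / (h * w)


def psy_weights(coefs):
    """psy(i, j) = sum_c coef_c(i, j) / (H W sum_c MAX(coef_c))."""
    maps = [coefs[k] for k in ("MOT", "DET", "POS", "FACES")]
    h, w = len(maps[0]), len(maps[0][0])
    norm = h * w * sum(max(max(row) for row in m) for m in maps)
    if norm == 0:
        raise ValueError("all coefficient maps are zero")
    return [[sum(m[i][j] for m in maps) / norm for j in range(w)] for i in range(h)]


def advanced_blur(ref, cod, coefs):
    """Advanced Blur = sum of psy(i, j) * [E(G(f_ref)) - E(G(f_cod))]."""
    er, ec, psy = gradient_energy(ref), gradient_energy(cod), psy_weights(coefs)
    return sum(psy[i][j] * (er[i][j] - ec[i][j])
               for i in range(len(ref)) for j in range(len(ref[0])))
```

With uniform coefficient maps, psy(i, j) reduces to 1/(H·W) and Advanced Blur equals the conventional Blur, which is a quick sanity check of the normalization.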


Test with real sequences

• Real sequences encoded at different bitrates:
– H.264: 6 Mbps to 500 Kbps (HD sequences)

[Figure: test sequences Umbrella, Boxers, Tree Branches and Phone Call]

Results (I)

• Results for the sequences compared with MOS (subjective opinion), PCC (Pearson Correlation Coefficient) and the improvement from the conventional Blur metric to the Advanced Blur metric.

Sequence | Metric | 6 Mbps | 4 Mbps | 1 Mbps | 500 Kbps | PCC | Δ|PCC| (Adv. Blur - Blur)
Boxers | Blur | 0.650 | 0.920 | 3.040 | 6.880 | -0.953 | 2.97%
Boxers | Adv. Blur | 1.340 | 1.480 | 2.000 | 2.660 | -0.983 |
Boxers | MOS | 4.778 | 4.111 | 2.444 | 1.333 | |
Hall | Blur | 0.790 | 3.280 | 14.180 | 27.230 | -0.982 | 1.40%
Hall | Adv. Blur | 2.440 | 3.490 | 6.880 | 9.670 | -0.996 |
Hall | MOS | 4.889 | 4.111 | 2.667 | 1.556 | |
Phone Call | Blur | 1.950 | 2.260 | 3.460 | 4.490 | -0.990 | 0.94%
Phone Call | Adv. Blur | 1.640 | 1.780 | 1.990 | 2.170 | -0.999 |
Phone Call | MOS | 4.889 | 4.000 | 2.444 | 1.333 | |
Tree Branches | Blur | 11.920 | 17.360 | 22.380 | 20.120 | -0.863 | 13.24%
Tree Branches | Adv. Blur | 6.150 | 8.030 | 9.790 | 12.090 | -0.996 |
Tree Branches | MOS | 4.889 | 3.778 | 2.556 | 1.550 | |
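The PCC column is the standard Pearson correlation between a metric's values at the four bitrates and the corresponding MOS (negative because higher blur means lower quality). The sketch recomputes it for the Boxers row and reads Δ as the gain in |PCC| expressed in percentage points; with the table values this reproduces the 2.97% reported for Boxers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Boxers row from the table above (6 Mbps, 4 Mbps, 1 Mbps, 500 Kbps).
    blur = [0.650, 0.920, 3.040, 6.880]
    adv_blur = [1.340, 1.480, 2.000, 2.660]
    mos = [4.778, 4.111, 2.444, 1.333]
    r_blur, r_adv = pearson(blur, mos), pearson(adv_blur, mos)
    delta = (abs(r_adv) - abs(r_blur)) * 100
    print(round(r_blur, 3), round(r_adv, 3), round(delta, 2))  # -0.953 -0.983 2.97
```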


Results (II)


Conclusions

• Conventional algorithms are not adapted to the subjective response of the human eye.
• Subjective tests revealed the importance of some specific regions.
• Quality metrics that apply visual attention models obtain better correlations when they weight regions of interest (ROIs) and are adapted to specific artifacts.

• The use of visual attention models improves the objective metric (Advanced Blur) by up to 13% (gain in correlation with MOS) compared with conventional methods.

Conclusions and Future Work

Conclusions
• ZoC is affected by motion in the scene. The state of the art must be updated with results from tests on dynamic sequences. Motion is a key factor in visual discomfort.

• Visual discomfort can be avoided by locating objects in positive parallax, BUT this implies a decrease of QoE:
– Negative parallax must be controlled to generate smooth variations.
– Positive parallax is recommended for its tolerance to visual discomfort and the consequent.
• Subjective tests revealed the importance of specific ROIs.
• Quality metrics that apply visual attention models obtain better correlations when they weight regions of interest (ROIs) and are adapted to specific artifacts.

• The use of visual attention models improves the objective metric (Advanced Blur) by up to 13% compared with conventional methods.

Future work

• Development and patenting of a system for automating Quality of Experience assessment in content generation (measuring visual discomfort).

• Developing recommendations and guidelines for 3D content creators.

• Improvement of the visual attention model with more low-, mid- and high-level features, such as color.

• Advanced metrics adapted to other artifacts, such as blocking.
• Development of no-reference metrics including visual attention models.

Merits

Publications (I)

Peer-reviewed international journal articles (1)
• López, J. P., Rodrigo, J. A., Jiménez, D., & Menéndez, J. M. (2013). Stereoscopic 3D video quality assessment based on depth maps and video motion. EURASIP Journal on Image and Video Processing, 2013(1), 1-14. December 2013. Impact Factor: 0.74. JCR indexed.

Peer-reviewed international conference papers (9)
• López, J. P., Rodrigo, J. A., Jiménez, D., & Menéndez, J. M. Subjective quality assessment in stereoscopic video based on analyzing parallax and disparity. In Consumer Electronics (ICCE), 2015 IEEE International Conference on. Las Vegas (U.S.A.), January 2015.

• López, J. P., Rodrigo, J. A., Jiménez, D., & Menéndez, J. M. Proposal for characterization of 3DTV video sequences describing parallax information. In Consumer Electronics (ICCE), 2015 IEEE International Conference on. Las Vegas (U.S.A.), January 2015.

• López, J. P., Slanina, M., Arnaiz, L., & Menéndez, J. M. Subjective quality assessment in scalable video for measuring impact over device adaptation. In EUROCON, 2013 IEEE (pp. 162-169). Zagreb (Croatia), July 2013.

• López, J. P., Rodrigo, J. A., Jiménez, D., & Menéndez, J. M. Insertion of Impairments in Test Video Sequences for Quality Assessment Based on Psychovisual Characteristics. In Artificial Intelligence, Modelling and Simulation, International Conference on. Madrid, November 2014.

• López, J. P., Rodrigo, J. A., Jiménez, D., & Menéndez, J. M. Definition of masks related to psychovisual features for Video Quality Assessment. In Consumer Electronics (ISCE), 2015 IEEE International Symposium on (pp. 1-2). Madrid, June 2015.

Publications (II)

• López, J. P., Jiménez, D., Cerezo, A., & Menéndez, J. M. No-reference algorithms for video quality assessment based on artifact evaluation in MPEG-2 and H.264 encoding standards. IFIP/IEEE International Symposium on. IEEE. Ghent (Belgium), May 2013.

• Rodrigo, J. A., López, J. P., Jiménez Bermejo, D., & Menéndez García, J. M. (2013). Automatic 3DTV Quality Assessment Based On Depth Perception Analysis. NEM Summit 2013 Proceedings, 69-74. Nantes (France), October 2013.

• López, J. P., Jiménez, D., Díaz, M., & Menéndez, J. M. Metrics for the objective quality assessment in high definition digital video. IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA). 2008.

• López, J. P., Díaz, M., Jiménez, D., & Menéndez, J. M. Tiling effect in quality assessment in high definition digital television. 12th IEEE International Symposium on Consumer Electronics (ISCE 2008), ISBN: 978-1-4244-2422-1, Vilamoura, April 2008.

Book chapters (1)
• López, J. P. Video Quality Assessment. In Video Compression, Ed. InTech, ISBN: 978-953-51-0422-3, March 2012.

Other peer-reviewed international conference papers (5)
Peer-reviewed national journal articles (1)

Research projects
• ACTIVA. Ministerio de Industria, Turismo y Comercio (FIT-330300-2007-42).
• BUSCAMEDIA: hacia una adaptación semántica de medios digitales multirred-multiterminal. [2009-2012].
• CIUDAD2020: Hacia un nuevo modelo de ciudad inteligente sostenible. [2011-2014].
• COST Action IC1105: 3D-ConTourNet 3D Content Creation, Coding and Transmission over Future Media Networks.
• EPSIS. Entretenimiento y publicidad segmentada en entornos inmersivos. Ministerio de Economía y Competitividad [2011-2013].
• FURIA 2009. Futura red integrada audiovisual. Ministerio de Industria, Turismo y Comercio (TSI-020301-2009-33) [2009-10].
• HBB4ALL Hybrid Broadcast Broadband TV For All. [2013-2016]
• HORFI-Radar MIMO de banda ultra ancha. TEC2012-38402-C04-01 HORFI.
• ICT 2020. Ministerio de Industria, Turismo y Comercio (TSI-020302-2011-23). [2011-2013]
• IMMERSIVE TV: Una aproximación a los medios inmersivos. Ministerio de Industria, Turismo y Comercio [2010-2012].
• ITACA 3D. Plataforma de creación, producción y distribución de video estereoscópico de entretenimiento para la visualización de televisión en 3D a través de broadcast. Ministerio de Industria, Turismo y Comercio (TSI-020110-2009-396).
• MELISMAS - Generación automática de mensajes en lengua de signos para aplicaciones sanitarias. Ministerio de Economía y Competitividad (RTC-2014-2762-1). [2014-16]
• Palco HD. Convergencia de plataformas digitales hacia la HD y medidas de calidad asociadas. Ministerio de Industria, Turismo y Comercio. [2007-2009]
• PALCO HD2. Ministerio de Industria, Turismo y Comercio. [2009-2011].
• PLEASE Plataforma de alta eficiencia avanzada para distribución de contenidos [2014-15].
• PRO-TVD-CM: Proyecto Integral de Investigación en Televisión Digital (S0505/TIC-0398). [2005-2009]
• S3D: Equipo servidor-editor de vídeo 3D realizado en colaboración con las empresas Overon y Aicox.
• SIRENA: SIstemas y tecnologías 3D Media sobre Internet del Futuro y REdes de difusión de NuevA generación. Ministerio de Economía y Competitividad (IPT-2011-1269-430000). [2011-2013]

Thank you for your attention!

For more information: [email protected]
