
[Paper Review] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

1. Background

1) 3D Representation

| Representation | Description | (+) Pros | (-) Cons |
| --- | --- | --- | --- |
| Depth Map | Distance between the camera and each pixel | Can utilize 2D images | Difficult to perform 3D tasks |
| Voxel Grid | Shape represented as a 3D grid | Conceptually easy to understand | Requires a lot of memory for detailed representation (3D kernel $\rightarrow$ 3D CNN usage) |
| Point Cloud | Shape represented as a “collection” of points (Volume: X, Location: O) | Can represent structures with a small number of points | Cannot represent surfaces; needs a new type of loss (because it is a “collection” of points) |
| Mesh | A “collection” of triangles made up of “vertices” (corners of the triangles) and “faces” (surfaces of the triangles) | Mainly used in computer graphics; adaptive representation is possible by using more faces where detail is needed; can also represent additional information such as color and texture using things like UV maps | Not easy to handle in neural networks (graph convolution) |
| Implicit Surface | A method of representing 3D shapes as functions | Can represent detailed surfaces | Requires understanding of the concept |
2) Previous Work



2. Proposal

1) Challenges

1) Unordered

The network must be invariant to all N! permutations of the N input points.

That is, the network must always predict the same output even when the order of the points in the cloud changes.
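
As a small illustration of this requirement, the sketch below compares an order-dependent function with an order-invariant one over every ordering of three points. Both functions and the sample points are hypothetical stand-ins, not anything from the paper.

```python
from itertools import permutations

points = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

def order_dependent(pts):
    # Weights each point by its position in the list, much like a layer
    # applied to the flattened coordinate array would.
    return sum((i + 1) * sum(p) for i, p in enumerate(pts))

def order_invariant(pts):
    # Per-axis maximum: the result depends only on the set of points.
    return tuple(max(p[d] for p in pts) for d in range(3))

orderings = list(permutations(points))
dependent_outputs = {order_dependent(o) for o in orderings}
invariant_outputs = {order_invariant(o) for o in orderings}

# Number of orderings, distinct order-dependent outputs, distinct invariant outputs.
print(len(orderings), len(dependent_outputs), len(invariant_outputs))
```

The order-dependent function gives different answers for the same shape, while the order-invariant one collapses every ordering to a single output, which is the behavior the network needs.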


2) Interaction among Points

The model needs to capture both local and global structures.

Even though a point cloud is a set, its points are not isolated: the model must be able to capture interactions between each point and its neighbors.


3) Invariance under transformations

The learned representation of the point set should be invariant to certain transformations.

That is, the network must always predict the same output even when a rigid transformation is applied to the point cloud.
※ Rigid transformation: rotation, translation
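
The following sketch shows why this is a real constraint. It applies a rigid transformation to a tiny point cloud and checks that the shape (pairwise distances) is unchanged while a naive coordinate-based feature is not. The helper names and the specific rotation and translation are hypothetical.

```python
import math

def rigid_transform(points, angle, translation):
    """Rotate points about the z-axis by `angle` (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]

def pairwise_distances(points):
    """Distances between all unordered pairs of points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def coordinate_max(points):
    """A naive feature: the per-axis maximum of the raw coordinates."""
    return tuple(max(p[d] for p in points) for d in range(3))

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moved = rigid_transform(points, math.pi / 2, (5.0, 0.0, 0.0))

same_shape = all(
    math.isclose(a, b) for a, b in zip(pairwise_distances(points), pairwise_distances(moved))
)
print(same_shape)
print(coordinate_max(points), coordinate_max(moved))
```

The object is geometrically identical after the transformation, yet a feature computed directly from raw coordinates changes, so the network has to learn or be given a representation that does not.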

2) Solution(1) - Symmetric Function

Target Challenges

| Unordered | Interaction among points | Invariance under transformations |
| --- | --- | --- |
| Symmetric Function | _ | _ |

Method 1: Sort into a canonical order (X)

We could sort the points into a canonical order before feeding them to the network.

$\rightarrow$ However, such an ordering is not stable under small perturbations of the points, so the network built on it is not stable either.
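
A tiny example of that instability, using lexicographic sorting as one hypothetical choice of canonical order:

```python
def canonical_order(points):
    # Lexicographic sort by (x, y, z): one possible canonical ordering.
    return sorted(points)

a = (0.5000, 0.0, 9.0)
b = (0.5001, 7.0, 0.0)

original = [a, b]
# Shift the x-coordinate of `a` by only 0.0002.
perturbed = [(a[0] + 0.0002, a[1], a[2]), b]

print(canonical_order(original))
print(canonical_order(perturbed))
```

A shift of 0.0002 in one coordinate swaps the two points, so the first input slot of the network suddenly receives a completely different point even though the shape has barely changed.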


Method 2: RNN with augmentation (X)

We could treat the point cloud as a sequence and train an RNN on randomly permuted orders, so that it learns to handle every permutation.

$\rightarrow$ However, this does not scale to inputs with thousands of points.
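
To get a feel for the scaling problem, the short sketch below counts how many orderings augmentation would have to cover for a few input sizes. The sizes are arbitrary examples.

```python
import math

for n in (5, 10, 1024):
    orderings = math.factorial(n)
    # Print the number of points and how many decimal digits n! has.
    print(n, len(str(orderings)))
```

Even 10 points already have millions of orderings, and for around a thousand points the count has thousands of digits, so training can only ever sample a vanishing fraction of them.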


Method 3: Use a symmetric function (O)

We can build the network so that it approximates a symmetric function of the input points.

Theory (Hausdorff Distance)

[Figure: Theory (Hausdorff Distance)]

Application

$f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n))$

where
$f: 2^{\mathbb{R}^N} \rightarrow \mathbb{R}$,
$h: \mathbb{R}^N \rightarrow \mathbb{R}^K$,
$g: \mathbb{R}^K \times \dots \times \mathbb{R}^K \rightarrow \mathbb{R}$,
and $g$ is a symmetric function.

\[\Downarrow\]

The authors report that max pooling works best as the symmetric function, because it lets the network learn to pick out the critical points of a point cloud.
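
The idea can be sketched in a few lines of plain Python. This is a toy stand-in, not the authors' implementation: `h` is a single random linear layer with ReLU instead of a trained shared MLP, the feature size `K = 8` is an arbitrary choice, and the final network applied after pooling is omitted.

```python
import random

random.seed(0)
K = 8  # feature dimension (hypothetical choice)

# Random weights stand in for a trained shared per-point network h: R^3 -> R^K.
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(K)]
b = [random.uniform(-1, 1) for _ in range(K)]

def h(point):
    """Shared per-point feature map: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * c for w, c in zip(row, point)) + bias)
            for row, bias in zip(W, b)]

def global_feature(points):
    """Symmetric function g: max pooling over points, per feature dimension."""
    features = [h(p) for p in points]
    return [max(f[k] for f in features) for k in range(K)]

def critical_indices(points):
    """Indices of points that attain the maximum in at least one dimension."""
    features = [h(p) for p in points]
    return {max(range(len(points)), key=lambda i: features[i][k]) for k in range(K)}

points = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(50)]
shuffled = points[:]
random.shuffle(shuffled)
critical = [points[i] for i in sorted(critical_indices(points))]

print(global_feature(points) == global_feature(shuffled))  # order does not matter
print(global_feature(critical) == global_feature(points))  # a few points suffice
print(len(critical), "of", len(points), "points determine the global feature")
```

Because the maximum is taken per feature dimension, at most `K` points can contribute to the global feature. Those are the critical points, and the rest of the cloud can be shuffled or dropped without changing the output.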


3) Solution(2) - Aggregation

Target Challenges

| Unordered | Interaction among points | Invariance under transformations |
| --- | --- | --- |
| Symmetric Function | Aggregation | _ |
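
This post does not spell out the aggregation step. As I understand the paper, the segmentation branch concatenates the max-pooled global feature onto each per-point feature, so that per-point predictions can use both local and global information. A minimal sketch under that assumption, with a hypothetical `aggregate` helper and made-up feature values:

```python
def aggregate(point_features):
    """Concatenate the max-pooled global feature onto every per-point feature."""
    dims = len(point_features[0])
    global_feature = [max(f[k] for f in point_features) for k in range(dims)]
    return [f + global_feature for f in point_features]

# Three points with 2-dimensional local features (made-up values).
point_features = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.4]]
combined = aggregate(point_features)

for row in combined:
    print(row)
```

Each output row holds the point's own feature followed by the shared global feature, so a per-point classifier applied to these rows sees both scales at once.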

4) Architecture


3. Experiment

1) 3D Object Part Segmentation

2) Semantic Segmentation in Scenes

3) 3D Object Detection

4) Robustness Test

This post is licensed under CC BY 4.0 by the author.