[Paper Review] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Posted 2024/10/04

By UI-JIN KIM, 2 min read

1. Background

1) 3D Representation

Depth Map	Voxel Grid	Point Cloud

Distance between the camera and each pixel	Represented as a 3D grid	Represented as a “collection” of points (Volume: X, Location: O)
(+) Can utilize 2D images	(+) Conceptually easy to understand	(+) Can represent structures with a small number of points
(-) Difficult to perform 3D tasks	(-) Requires a lot of memory for detailed representation (3D Kernel $\rightarrow$ 3D CNN usage)	(-) Cannot represent surfaces (-) Needs a new type of loss (because it’s a “collection” of points)

Mesh	Implicit Surface

A “collection” of triangles made up of “vertices” and “faces” ● Vertices: corners of the triangles ● Faces: surfaces of the triangles	A method of representing 3D shapes as functions
(+) Mainly used in computer graphics (+) Adaptive representation is possible by using more faces where detail is needed (+) Can also represent additional information such as color and texture using things like UV maps	(+) Can represent detailed surfaces
(-) Not easy to handle in neural networks (Graph Convolution)	(-) Requires understanding of the concept
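
To make the memory trade-off between voxel grids and point clouds concrete, here is a minimal sketch. It assumes the points lie in the unit cube, and `voxelize` is a hypothetical helper written for this illustration, not something from the paper. A dense grid at resolution R has to store R^3 cells no matter how few of them are occupied, while the point cloud only stores its points.

```python
import random

def voxelize(points, resolution):
    """Map points in the unit cube [0, 1)^3 to the set of occupied voxel indices."""
    occupied = set()
    for x, y, z in points:
        occupied.add((int(x * resolution), int(y * resolution), int(z * resolution)))
    return occupied

random.seed(0)
# A toy point cloud of 1000 random points (illustrative data only).
cloud = [(random.random(), random.random(), random.random()) for _ in range(1000)]

for resolution in (16, 32, 64):
    occupied = voxelize(cloud, resolution)
    dense_cells = resolution ** 3  # cells a dense grid must store
    print(f"R={resolution}: {dense_cells} cells in the grid, {len(occupied)} occupied")
```

Doubling the resolution multiplies the number of grid cells by eight, while the number of occupied cells can never exceed the number of points.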

2) Previous Work

2. Proposal

1) Challenges

1) Unordered
The network must be invariant to all N! permutations of the input.
That is, the network must predict the same output even when the order of the points in the cloud changes.
2) Interaction among Points
The model needs to capture both local and global structures.
Although a point cloud is a set, each point must be able to interact with its neighboring points.
3) Invariance under transformations
The learned representation of the point set should be invariant to certain transformations.
That is, the network must predict the same output even when a rigid transformation is applied to the point cloud.
※ Rigid transformation: rotation, translation
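
As a small illustration of what a rigid transformation does, the sketch below rotates and translates a toy point cloud and checks that all pairwise distances are preserved. The helper names (`rotate_z`, `translate`, `pairwise_distances`) and the sample points are assumptions made for this example, not part of the paper.

```python
import math

def rotate_z(points, angle):
    """Rotate each (x, y, z) point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, offset):
    """Shift every point by the same (dx, dy, dz) offset."""
    dx, dy, dz = offset
    return [(x + dx, y + dy, z + dz) for x, y, z in points]

def pairwise_distances(points):
    """Return the sorted distances between all pairs of points."""
    return sorted(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = translate(rotate_z(cloud, math.pi / 3), (5.0, -2.0, 1.0))

# The coordinates change, but the shape (all pairwise distances) does not.
same_shape = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(cloud), pairwise_distances(moved))
)
print(same_shape)
```

The raw coordinates fed to the network differ between `cloud` and `moved`, yet both describe the same object, which is why the prediction should not change.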

2) Solution(1) - Symmetric Function

Target Challenges

Unordered	Interaction among points	Invariance under transformations
Symmetric Function	_	_

Method1: Sort by Canonical order (X)
We can sort the point cloud into a canonical order before feeding it into the network,
$\rightarrow$ but the resulting order is not stable under small point perturbations, so the network is not robust.
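
The instability can be seen in a toy case. The sketch below assumes lexicographic sorting by (x, y, z) as the canonical order, which is just one possible choice, and uses made-up points. Moving a single point by 0.002 swaps two rows of the sorted input, so an order-dependent network sees a large change in its input slots.

```python
def canonical_order(points):
    """Sort points lexicographically by (x, y, z): one possible canonical order."""
    return sorted(points)

def flatten(points):
    """Concatenate the coordinates in order, as an order-dependent network would see them."""
    return [v for p in points for v in p]

a = (0.500, 0.0, 0.0)
b = (0.501, 1.0, 0.0)
c = (0.900, 0.5, 0.5)

original = canonical_order([a, b, c])

# Nudge point `a` by 0.002 along x: a tiny perturbation.
a_perturbed = (0.502, 0.0, 0.0)
perturbed = canonical_order([a_perturbed, b, c])

# Largest change in any single input slot after the perturbation.
max_slot_change = max(
    abs(u - v) for u, v in zip(flatten(original), flatten(perturbed))
)
print(original[0] == a, perturbed[0] == b, max_slot_change)
```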
Method2: RNN with augmentation (X)
We can train an RNN with augmentation over permuted input orders so that it learns to handle every permutation,
$\rightarrow$ but this does not scale to inputs with thousands of points.
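
To get a feel for why covering all orderings by augmentation becomes impractical, the sketch below estimates the order of magnitude of N! for a few input sizes. The function name `log10_num_orderings` and the chosen sizes are assumptions for this illustration.

```python
import math

def log10_num_orderings(n):
    """Return log10(n!), the order of magnitude of the number of orderings of n points."""
    return math.lgamma(n + 1) / math.log(10)

for n in (10, 100, 1000):
    print(f"{n} points -> about 10^{log10_num_orderings(n):.0f} orderings")
```

Even a modest point cloud has far more orderings than any training run could sample.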
Method3: Use Symmetric Function (O)
We can build the network so that it approximates a symmetric function, i.e. a function whose output does not depend on the order of its inputs.
Theory (Hausdorff Distance)	Application
	$f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n))$
	$f: 2^{\mathbb{R}^N} \rightarrow \mathbb{R}$,
	$h: \mathbb{R}^N \rightarrow \mathbb{R}^K$,
	$g: \mathbb{R}^K \times \dots \times \mathbb{R}^K \rightarrow \mathbb{R}$
	$g$ is a symmetric function
\[\Downarrow\]
The authors state that max pooling is the best choice of symmetric function, because it lets the network learn the critical points of the point cloud.
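
The idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `h` is a single random linear layer with ReLU standing in for the shared per-point network, the feature size `K = 8` is arbitrary, and the aggregation is an element-wise max over all points. The helper `critical_points` is hypothetical and simply collects the points that supply the max in at least one feature channel.

```python
import random

random.seed(0)
K = 8  # feature dimension (illustrative choice)

# Hypothetical weights for a shared per-point map h: R^3 -> R^K.
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(K)]
B = [random.uniform(-1, 1) for _ in range(K)]

def h(point):
    """Shared per-point feature map: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(W, B)]

def global_feature(points):
    """Symmetric aggregation: element-wise max over all per-point features."""
    features = [h(p) for p in points]
    return [max(column) for column in zip(*features)]

def critical_points(points):
    """Indices of points that supply the max in some channel (ties go to the first index)."""
    features = [h(p) for p in points]
    return {max(range(len(points)), key=lambda i: features[i][k])
            for k in range(K)}

cloud = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(32)]
shuffled = cloud[:]
random.shuffle(shuffled)

# Reordering the points does not change the global feature.
assert global_feature(cloud) == global_feature(shuffled)

# Keeping only the critical points reproduces the same global feature.
subset = [cloud[i] for i in critical_points(cloud)]
assert global_feature(subset) == global_feature(cloud)
```

Because each channel of the max-pooled feature is decided by a single point, at most K points determine the output here. This is one way to read the claim that max pooling picks out the critical points of the cloud.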

3) Solution(2) - Aggregation

Target Challenges

Unordered	Interaction among points	Invariance under transformations
Symmetric Function	Aggregation	_
