Imagine being able to scan around an object with your smartphone and get a realistic, fully editable 3D model that you could view from any angle – this is quickly becoming a reality thanks to advances in AI.
Researchers at Simon Fraser University (SFU) in Canada have unveiled new AI techniques that aim to do just that: in the near future, consumers will be able to go beyond simply taking 2D photos and instead take 3D captures of real-world objects, freely editing their shape and appearance as easily as they do with regular 2D photos today.
In a new paper published on the arXiv preprint server and presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans, Louisiana, the researchers demonstrated a technique called Proximity Attention Point Rendering (PAPR) that converts a set of 2D photos of an object into a cloud of 3D points representing the object’s shape and appearance.
Each point gives the user a knob for controlling the object: dragging a point changes the object’s shape, and editing the point’s properties changes the object’s appearance. The 3D point cloud can then be viewed from any angle and, in a process called “rendering,” converted into a 2D image that shows the edited object as if it had been photographed from that angle.
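To make the idea of rendering a point cloud from a chosen viewpoint concrete, here is a minimal sketch of projecting 3D points onto a 2D image plane with a simple pinhole camera. The point coordinates and camera parameters are made up for illustration; this is not the rendering model used in the paper.

```python
import numpy as np

# A toy 3D point cloud (coordinates are made up for illustration).
points = np.array([
    [0.0, 0.0, 4.0],
    [0.5, 0.2, 4.5],
    [-0.3, 0.4, 5.0],
])

focal_length = 800.0      # pinhole focal length in pixels (assumed value)
cx, cy = 320.0, 240.0     # principal point, i.e. the image center (assumed)

# Perspective projection: each 3D point (x, y, z) lands at the pixel
# (f * x / z + cx, f * y / z + cy). Dragging a point in 3D therefore
# changes where it appears in every rendered view.
pixels = np.stack([
    focal_length * points[:, 0] / points[:, 2] + cx,
    focal_length * points[:, 1] / points[:, 2] + cy,
], axis=1)

print(pixels)
```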
The researchers demonstrated how they could bring a statue to life: the technique automatically converts a set of photos of the statue into a 3D point cloud and animates it, producing a video in which the statue shakes its head from side to side as the viewer is guided around it.
“AI and machine learning are driving a true paradigm shift in reconstructing 3D objects from 2D images. The remarkable success of machine learning in areas such as computer vision and natural language is motivating researchers to investigate how traditional 3D graphics pipelines can be redesigned using the deep learning-based building blocks that have enabled the rapid recent success of AI,” said Dr. Ke Li, Assistant Professor of Computer Science at Simon Fraser University (SFU), Director of the APEX Lab, and lead author of the paper.
“This turned out to be much harder to pull off than we expected, and several technical challenges needed to be overcome. What excites me most are the many possibilities this brings to consumer technology. 3D may become as prevalent a medium of visual communication and expression as 2D is today.”
One of the biggest challenges in 3D modeling is how to represent 3D shapes in a way that is easy and intuitive for users to edit. An earlier approach called Neural Radiance Fields (NeRF) represents a scene as a function that must be described at every continuous coordinate, which makes shape editing difficult. A more recent approach called 3D Gaussian Splatting (3DGS) is also poorly suited to shape editing, because the surface of the shape can shatter or fall apart after editing.
A key insight came when the researchers realized that rather than treating each 3D point in the point cloud as an individual splat, they could treat it as a control point in a continuous interpolator: as the points move, the shape changes automatically and intuitively. This is similar to how an animator defines the movement of an object in an animated video: by specifying the object’s position at a few points in time, the object’s movement at every point in time is generated automatically by the interpolator.
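The animator analogy can be illustrated with a minimal one-dimensional sketch. The keyframe times and positions below are made up, and PAPR’s actual interpolation operates on 3D points with learned features rather than a scalar trajectory; the sketch only shows how moving a single control point reshapes everything in between.

```python
import numpy as np

# Keyframe analogy: the animator specifies an object's position at a few
# moments in time (the control points), and the interpolator fills in the
# position at every moment in between.
key_times = np.array([0.0, 1.0, 2.0])        # seconds (made-up keyframes)
key_positions = np.array([0.0, 2.0, 1.0])    # position at each keyframe

query_times = np.linspace(0.0, 2.0, 5)
print(np.interp(query_times, key_times, key_positions))
# Smooth motion derived from just three control points.

# Moving one control point changes the whole interpolated trajectory,
# much as dragging a 3D point in PAPR changes the surface around it.
key_positions[1] = 3.0
print(np.interp(query_times, key_times, key_positions))
```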
However, mathematically defining the interpolation between an arbitrary set of 3D points is not straightforward. Using a novel mechanism called proximity attention, the researchers created a machine learning model that learns the interpolation end-to-end.
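As a rough sketch of the general idea behind distance-based attention, the snippet below weights nearby control points more heavily when interpolating a feature at a query location. The control points, features, and temperature parameter are all made up, and this is not the exact formulation learned end-to-end in the PAPR paper.

```python
import numpy as np

def proximity_weights(query, control_points, temperature=0.1):
    """Attention-style weights that favor control points near the query.

    Illustrative only: the temperature is a made-up hyperparameter, and the
    paper's proximity attention uses learned point features, not raw distances.
    """
    sq_dists = np.sum((control_points - query) ** 2, axis=1)
    logits = -sq_dists / temperature
    logits -= logits.max()                  # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Made-up control points, each carrying a scalar feature (e.g., a color value).
control_points = np.array([[0.0, 0.0, 0.0],
                           [1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])
features = np.array([0.2, 0.8, 0.5])

query = np.array([0.1, 0.1, 0.0])
w = proximity_weights(query, control_points)
print(w @ features)   # interpolated feature, dominated by the nearest point
```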
In recognition of this technological breakthrough, the paper was spotlighted at the NeurIPS conference, an honor given to the top 3.6% of papers submitted to the conference.
The research team is excited about what’s to come: “This opens the door to many different applications beyond what we’ve demonstrated so far,” says Dr. Li. “We’re already exploring different ways to leverage PAPR to model moving 3D scenes, and the results so far are very promising.”
The paper’s authors are Yanshu Zhang, Shichong Peng, Alireza Moazeni and Ke Li. Zhang and Peng are co-first authors. Zhang, Peng and Moazeni are doctoral students in the School of Computing Science and are all members of Simon Fraser University’s (SFU) APEX Lab.
For more information:
Yanshu Zhang et al., “PAPR: Proximity Attention Point Rendering,” arXiv (2023). DOI: 10.48550/arXiv.2307.11086
Provided by Simon Fraser University
Citation: New AI technology enables 3D capture and editing of real-world objects (March 12, 2024). Retrieved July 4, 2024 from https://techxplore.com/news/2024-03-ai-technology-enables-3d-capture.html