Imagine sweeping your smartphone around an object and getting a fully editable, realistic 3D model that you can view from any angle. This is rapidly becoming a reality thanks to advances in AI.
Researchers at Simon Fraser University (SFU) in Canada have announced new AI technology that does just that. Soon, everyday consumers will be able to capture a real object in 3D as easily as they take an ordinary 2D photo today, and then freely view and edit its shape and appearance.
In a new paper presented at the Conference on Neural Information Processing Systems (NeurIPS), the annual flagship international conference on AI research held in New Orleans, Louisiana, the researchers describe a new technology called Proximity Attention Point Rendering (PAPR). PAPR converts a set of 2D photos of an object into a cloud of 3D points that represents the object's shape and appearance. Each point gives the user a knob for controlling the object: dragging a point changes the object's shape, while editing a point's properties changes the object's appearance. Then, in a process known as "rendering," the 3D point cloud can be viewed from any angle and turned into a 2D image that shows the edited object as if it had actually been photographed from that angle.
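The workflow described here can be pictured as a small edit-and-render loop over a point cloud. The sketch below is only an illustration of that loop, not the PAPR renderer itself: `render_orthographic` is a hypothetical stand-in that naively splats points into an image, and the "edits" are simply direct changes to point positions and colours.

```python
import numpy as np

def render_orthographic(points, colors, size=64):
    """Toy stand-in renderer: orthographically splat 3D points into a 2D image.
    (Illustrative only; PAPR's actual renderer is a learned model.)"""
    img = np.zeros((size, size, 3))
    # Map x/y coordinates in [-1, 1] to pixel indices.
    px = ((points[:, 0] + 1) / 2 * (size - 1)).astype(int).clip(0, size - 1)
    py = ((points[:, 1] + 1) / 2 * (size - 1)).astype(int).clip(0, size - 1)
    img[py, px] = colors
    return img

# A point cloud recovered from photos: each point carries a position (the
# "knob" for shape) and an appearance attribute (here, just an RGB colour).
points = np.random.uniform(-0.5, 0.5, size=(500, 3))
colors = np.tile([0.8, 0.6, 0.4], (500, 1))

before = render_orthographic(points, colors)

# Editing the shape: drag a subset of points upward.
points[:100, 1] += 0.3
# Editing the appearance: recolour the same points.
colors[:100] = [0.2, 0.4, 0.9]

after = render_orthographic(points, colors)  # re-render from the same viewpoint
```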
The researchers have shown how the new AI technology can bring statues to life. The technology automatically converts a series of photos of a statue into a 3D point cloud and animates it. The end result is a video of the statue turning its head from side to side as the viewer is guided along a path around it.
"AI and machine learning are really driving a paradigm shift in the reconstruction of 3D objects from 2D images. The remarkable success of machine learning in areas such as computer vision and natural language is encouraging researchers to investigate how traditional 3D graphics pipelines can be re-engineered with the same deep learning-based building blocks that were responsible for the recent runaway successes of AI," said Dr. Ke Li, assistant professor of computer science at Simon Fraser University (SFU), director of the APEX Lab, and senior author of the paper. "It turns out this is much harder to pull off than expected and requires overcoming several technical challenges. What excites me most are the many possibilities this brings for consumer technology: 3D may become as common a medium for visual communication and expression as 2D is today."
One of the biggest challenges in 3D is how to represent shapes in a way that lets users edit them easily and intuitively. An earlier approach, known as neural radiance fields (NeRF), requires the user to describe what happens at every continuous coordinate and so does not allow easy shape editing. A more recent approach, known as 3D Gaussian splatting (3DGS), is also not well suited to shape editing, because the surface of the shape can become shattered or disjointed after editing.
A key insight came when the researchers realized that instead of treating each 3D point in the point cloud as a separate splat, they could treat each one as a control point in a continuous interpolator. Then, as a point is moved, the shape changes automatically and intuitively. This is similar to how animators define the motion of objects in animated videos: by specifying an object's position at a few points in time, they let the interpolator automatically generate its position at every moment in between.
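The keyframe analogy can be made concrete with ordinary one-dimensional interpolation. The toy example below (illustrative only, using `numpy.interp` rather than anything from the paper) shows how moving a single control value automatically changes the whole interpolated motion between the surrounding keyframes.

```python
import numpy as np

# Keyframe times and the object's height at each keyframe (the control points).
key_times = np.array([0.0, 1.0, 2.0, 3.0])
key_height = np.array([0.0, 1.0, 0.5, 0.0])

# The interpolator fills in the motion at every intermediate time.
frame_times = np.linspace(0.0, 3.0, 7)
print(np.interp(frame_times, key_times, key_height))
# [0.   0.5  1.   0.75 0.5  0.25 0.  ]

# Dragging one control point changes the in-between frames automatically,
# just as dragging one of PAPR's points changes the surrounding surface.
key_height[2] = 1.5
print(np.interp(frame_times, key_times, key_height))
# [0.   0.5  1.   1.25 1.5  0.75 0.  ]
```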
However, mathematically defining an interpolator between an arbitrary set of 3D points is not straightforward. The researchers formulated a machine learning model that can learn the interpolator end-to-end using a novel mechanism known as proximity attention.
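As a rough intuition only (the paper's actual proximity-attention formulation differs and is learned end-to-end rather than hand-written), one can think of each query location attending to nearby control points, with weights that fall off with distance, and blending their values accordingly:

```python
import numpy as np

def proximity_weighted_blend(query, ctrl_points, ctrl_values, temperature=0.1):
    """Toy distance-based attention: closer control points receive larger
    softmax weights, and the query's value is their weighted average.
    (A hand-written illustration, not PAPR's learned attention.)"""
    dists = np.linalg.norm(ctrl_points - query, axis=1)  # proximity scores
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()                              # softmax over -dists
    return weights @ ctrl_values

ctrl_points = np.array([[0.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
ctrl_values = np.array([0.0, 1.0, 2.0])  # e.g. per-point appearance features

# A query close to the first control point ends up dominated by its value.
print(proximity_weighted_blend(np.array([0.1, 0.1, 0.0]), ctrl_points, ctrl_values))
```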
In recognition of this technological leap forward, the paper was given a spotlight at the NeurIPS conference, an honour reserved for the top 3.6% of papers submitted to the conference.
The research team is excited about what lies ahead. "This opens the door to many applications beyond what we have demonstrated," Dr. Li said. "We are already exploring different ways to leverage PAPR to model moving 3D scenes, and the results so far are very promising."
The paper's authors are Yanshu Zhang, Shichong Peng, Alireza Moazeni, and Ke Li. Zhang and Peng are co-first authors; Zhang, Peng, and Moazeni are doctoral students in the Department of Computer Science, and all are members of the APEX Lab at Simon Fraser University (SFU).