When it comes to AI in 3D modeling, the key challenges lie in the heavy domain knowledge required to post-process the shapes delivered by artificial intelligence.
While neural networks in 2D image recognition are currently one of the leading ML applications, 3D processing is still waiting for its big moment. The key challenges lie in the heavyweight data that comes with every 3D shape and in the need to post-process the generated shapes to make them usable. So in fact, despite existing automation, a good deal of manual human work is still required to deliver a good-looking and usable 3D shape, be that a plane, a chair, or a car.
This imposes strict limitations on the use of neural networks in 3D analysis and processing – while Artificial Intelligence (AI) may be able to detect cancer on medical scans with superhuman accuracy, delivering the same effect with a 3D model remains a matter of the future.
In recent research, scientists from the Wrocław University of Science and Technology, together with our team, delivered a neural network that automates the creation of a 3D object (a mesh) from input data. Contrary to popular approaches, this neural network delivers a fully explainable result.
How is it done today?
Previous approaches to processing 3D data with AI used the supervised learning paradigm, which required data scientists to preprocess and prepare the dataset for the machine to learn on. Considering the limited availability of data and the immense amount of work required, the approach is troublesome to say the least.
Also, doing all this work manually is tedious and expensive, requiring extensive domain knowledge, many hours of human labor, and heavy computing power. Automating it therefore appears to be a promising and not yet fully explored field of research.
The cutting-edge approaches used before were still a black box – although results were available, neither the user nor the researcher knew how they had been delivered. And an unexplainable AI process always bears a degree of risk: an uncommon combination of factors can produce a surprising glitch.
Our research delivers two neural networks. The first is the encoder, which transforms the 3D object into a multidimensional vector. By doing so, the shape becomes comprehensible to a neural network, which is, at its core, an extremely complex tool for performing numerical computations – numbers are the only type of input it accepts.
The encoder network can be considered the machine learning equivalent of a retina: just as the retina transforms light into neural impulses, the encoder encodes the 3D shape into a vector. A positive side effect of the encoder network is the extreme reduction of the input size, from a full-fledged 3D shape to a short string of numbers. Thanks to this transformation, any further transmission and processing can take place much more efficiently.
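To make this idea concrete, here is a minimal sketch (not the architecture from the paper) of a toy permutation-invariant encoder: it maps a point cloud of any size to a fixed-length vector by computing per-point features and max-pooling over them. The weights are fixed at random purely for illustration – a real encoder learns them.

```python
import numpy as np

def encode_point_cloud(points: np.ndarray, dim: int = 8, seed: int = 0) -> np.ndarray:
    """Toy encoder: per-point linear features + ReLU, then max-pooling
    over all points -- the result does not depend on point order."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((3, dim))    # fixed stand-in for learned weights
    feats = np.maximum(points @ weights, 0.0)  # per-point features, shape (N, dim)
    return feats.max(axis=0)                   # one fixed-size vector per shape

# A cube's 8 corners as a tiny point cloud
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
code = encode_point_cloud(cube)
print(code.shape)  # (8,)

# Shuffling the points leaves the code unchanged (permutation invariance)
assert np.allclose(code, encode_point_cloud(cube[::-1].copy()))
```

Whatever the point count, the shape is compressed into the same small vector – which is exactly the size reduction described above.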
The second step is to process the input in the decoder network.
Back to the primitive
Using the input delivered by the encoder network, the decoder produces a decision tree that shows how the shape was assembled from a set of primitives.
A “primitive” is a term used in 3D geometry to describe the most basic 3D figures, ones that cannot be reduced further, like a cube, a cuboid, an ellipsoid, or a sphere. The neural network delivered by the Tooploox research team treats primitives as a basic set of building blocks used to reconstruct the shape decoded by the decoder network.
The network manipulates the sizes and positions of primitives to deliver more sophisticated and complex shapes. From the network’s point of view, the most basic representation of a house is constructed of a cube, a pyramid, and a cuboid that is inserted into the “roof” to represent a chimney.
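To give a flavor of how primitives combine into a composite shape, here is a hand-written sketch using signed distance functions, a common way to represent such primitives. The boxes, sizes, and positions below are made up (and the roof pyramid is omitted for brevity); a point lies inside the combined shape wherever the minimum of the primitives' distances is negative.

```python
import math

def sdf_box(p, center, half):
    """Signed distance from point p to an axis-aligned box:
    negative inside, positive outside."""
    q = [abs(p[i] - center[i]) - half[i] for i in range(3)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q), 0.0)
    return outside + inside

def union(*dists):
    """CSG union: a point is inside the union if it is inside any part."""
    return min(dists)

def house(p):
    body    = sdf_box(p, (0, 0.5, 0), (1, 0.5, 1))          # the main block
    chimney = sdf_box(p, (0.5, 1.5, 0.5), (0.1, 0.5, 0.1))  # cuboid on the roof
    return union(body, chimney)

print(house((0, 0.5, 0)) < 0)     # True  -- inside the body
print(house((0.5, 1.8, 0.5)) < 0) # True  -- inside the chimney
print(house((5, 5, 5)) < 0)       # False -- far outside
```

Moving a primitive only changes its `center`, and resizing it only changes `half` – which is precisely the kind of parameter manipulation the network performs.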
Detailed information on how the network constructs a particular shape can be seen in the graph below:
During the project the network was tasked with putting together the shapes of a plane, a couch with a set of pillows, a ship, a TV, a bench, a table, and an armchair.
Using a set of primitives as building blocks not only reduces the computational effort required to deliver a 3D mesh of an object – it also brings a value not to be overlooked: explainability.
Unboxing the black box
A common assumption about neural networks is that the creator has little to no clue about the process behind the output. This is only part of the truth. The creator always knows the source dataset and the paradigms used when designing the network.
However, the processing done inside the network usually remains in shadow. It is not always necessary to know precisely why the network made this or that decision, as long as the result is satisfying.
On the other hand, the lack of explainability can be dangerous when 100% certainty is required, for example in critical operations or healthcare. The neural network described in the paper delivers not only the 3D shape of an object but also instructions on how it was built: which primitives were used, where, and how they were modified, including operations such as resizing, folding, or cutting particular shapes.
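The kind of output this enables can be pictured as a small construction tree that both a program and a human can read. The sketch below is a hypothetical illustration – the node names, sizes, and positions are made up, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    kind: str        # e.g. "cube", "pyramid", "cuboid"
    size: tuple      # extents along x, y, z
    position: tuple  # where the primitive is placed

@dataclass
class Combine:
    op: str          # operation joining two sub-shapes, e.g. "union"
    left: object     # a Primitive or another Combine
    right: object

def explain(node, depth=0):
    """Render the construction tree as indented, human-readable steps."""
    pad = "  " * depth
    if isinstance(node, Primitive):
        return f"{pad}{node.kind} size={node.size} at {node.position}"
    return "\n".join([f"{pad}{node.op}:",
                      explain(node.left, depth + 1),
                      explain(node.right, depth + 1)])

# The "house" example: body plus roof, with a chimney cuboid added on top
house = Combine("union",
                Combine("union",
                        Primitive("cube", (2, 1, 2), (0, 0.5, 0)),
                        Primitive("pyramid", (2, 1, 2), (0, 1.5, 0))),
                Primitive("cuboid", (0.2, 1, 0.2), (0.5, 1.8, 0.5)))
print(explain(house))
```

Unlike a raw mesh, such a tree can be inspected, audited, and edited step by step – which is what "explainable" means here.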
Possible usage in the real world
This approach is currently in the research stage, with a limited application range. But undoubtedly there is a plentiful number of possible applications, especially considering the rising role of 3D modeling and AI-generated 3D models.
The encoder network can be tuned to accept various types of 3D input data – for example, lidar point clouds, medical scanners such as CT or MRI machines, or more sophisticated pipelines that reconstruct a 3D object from 2D images. In such a setup, the network does all the tedious work of delivering a basic shape and lets the 3D artist focus on details – in effect, delivering automatic 3D models from photos. Objects captured in this way can be inserted into video games to create a more immersive experience or better visuals.
Using primitives instead of a “solid” 3D shape makes engineering applications easier. The engineer gets step-by-step instructions on the blocks used and can modify them further at will, replacing particular parts or inserting new ones.
In healthcare, this can be used to deliver scanned models for prosthetic devices, printed after a 3D scan of the patient’s own body. For example, a limb that needs a joint replacement.
Finally, the model can be re-tuned to use additional types of building blocks, delivering both a 3D visualization of an object and instructions for building it. So, with a good deal of engineering, one could build objects using a larger catalog of parts and shapes. This application could find usage in furniture and design, as a producer can experiment with its own sets of prefabricated shapes to deliver new products – or maybe even in AI-generated architecture.
Assuming one wishes to have a lot of fun – the model can be re-tuned to deliver instructions on how to build a 3D object using Lego bricks. Unfortunately, it would only be a shape, without all the cool gears, pumps, levers, and all the stuff we love Legos for.
The research paper was delivered by a team consisting of Kacper Kania, Maciej Zięba and Tomasz Kajdanowicz from the Wrocław University of Science and Technology. Maciej Zięba is also a part of the Tooploox Research and Development team.
The results provided in the paper were presented during the NeurIPS 2020 conference in the main track. More details can be found in the publication.