Researchers from Tooploox and the Group of Machine Learning Research at Jagiellonian University have delivered a neural network-based solution that boosts the efficiency of point cloud data processing and transforms the data into a 3D object called a mesh.
The International Conference on Machine Learning is one of the most renowned academic conferences on Artificial Intelligence and Machine Learning research. Researchers and scientists from around the world submit their papers to be published during the conference, with only 21.8% among thousands being awarded an opportunity to present their findings to a wide audience.
The event has a long and venerable tradition, with the first edition being held in 1980 in Pittsburgh. The following editions have been held in cities like Haifa, Israel (2010); Edinburgh, Scotland (2012); Beijing, China (2014); and Lille, France (2015).
This time, members of the Group of Machine Learning Research at Jagiellonian University in Krakow, teaming up with our AI experts at Tooploox, will present their research on the Hypernetwork approach to generating point clouds.
What is a point cloud?
To put it as straightforwardly as possible, a point cloud is a cloud of points: a 3D swarm of dots that can appear random when viewed separately but, when seen together, represent a particular shape in space, be that a cube, a sphere, a table, a chair, or a car. Point cloud data is usually produced by various kinds of 3D scanners.
This can appear confusing when described, yet point cloud data finds multiple use cases in the real world. Placing motion tracking dots on actors’ faces enables FX artists to track their expressions and transfer them onto computer-generated beings. Point cloud software gave us the (slightly disturbing) transformation of the spandex-clad Benedict Cumberbatch into the great dragon Smaug himself.
Another real-life use of point cloud data is seen in LIDAR (light detection and ranging, in effect a laser-based radar), which scans the surroundings of an autonomous or semi-autonomous vehicle to deliver accurate, real-time data about obstacles and nearby objects. LIDAR point cloud data is also used in 3D mapping done with drones and similar devices to deliver more accurate geospatial data.
The LIDAR itself delivers the information in the form of a swarm of points. Again – it is not an object per se, rather a specific type of 3D data representing an object.
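To make the idea concrete, here is a minimal sketch of what such data looks like in code, assuming the common convention of storing a cloud as an array with one row of x, y, z coordinates per point. The function name `sample_sphere_cloud` and the choice of 2048 points are illustrative assumptions, and the points are generated synthetically rather than scanned.

```python
import numpy as np


def sample_sphere_cloud(n_points: int, radius: float = 1.0, seed: int = 0) -> np.ndarray:
    """Return an (n_points, 3) array of points lying on the surface of a sphere."""
    rng = np.random.default_rng(seed)
    # Normalising Gaussian samples gives directions spread uniformly over the sphere.
    directions = rng.normal(size=(n_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return radius * directions


cloud = sample_sphere_cloud(2048)
print(cloud.shape)  # (2048, 3): one row per point, one column per coordinate
print(cloud[:3])    # three individual points; on their own they look random
```

Nothing in the array says "sphere": the shape only emerges when all the points are considered together, which is exactly what makes this kind of data awkward to process.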
Point cloud processing today
Point cloud data is useful in multiple applications, yet there are several limitations to the technology. First and foremost, it is heavy, because it requires working on raw 3D data. Thus, rendering and processing require a good deal of computing power.
Also, the size and complexity of the computations make the process time-consuming. When it comes to rendering the facial expressions of a dragon, that is not too painful. Yet when it comes to building autonomous vehicles that need to use LIDAR to avoid collisions, it can be a matter of milliseconds.
That’s why Tooploox has looked for a way to optimize the processing of point cloud data using neural networks.
What does our research change?
In this particular case, the role of the neural networks was to reforge the heavy, computationally challenging data into a simplified form that is still good enough to work with, while requiring less time and power to process.
Extreme compression of point cloud data
The research team delivered a neural network that encodes the 3D point cloud data into a multidimensional vector representation. The vector itself is a form of feature extraction that is good enough to deliver accurate information about an object, yet cuts out the information that is seen as irrelevant. This changes the overall paradigm of point cloud processing software dramatically.
The nature of a neural network makes the outcome incomprehensible for a human, yet good enough for the machines to process efficiently. The whole process is shown in the image below:
The encoding network analyzes the input data and delivers the vector. The vector can then be compared with a database of known objects to interpret what entity it represents.
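The comparison against a database of known objects could look roughly like the sketch below. It assumes the comparison is a nearest-neighbor lookup using cosine similarity, which the article does not specify; the function `nearest_label`, the labels, and the tiny 4-dimensional vectors are all hypothetical stand-ins for real encoder outputs.

```python
import numpy as np


def nearest_label(query: np.ndarray, database: np.ndarray, labels: list[str]) -> str:
    """Return the label whose stored vector has the highest cosine similarity to the query."""
    db_unit = database / np.linalg.norm(database, axis=1, keepdims=True)
    q_unit = query / np.linalg.norm(query)
    similarities = db_unit @ q_unit
    return labels[int(np.argmax(similarities))]


# Hypothetical vectors for three known objects (real ones would come from the encoder).
labels = ["chair", "table", "car"]
database = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [0.0, 0.2, 0.9, 0.4],
])

query = np.array([0.8, 0.2, 0.1, 0.1])  # vector of a newly scanned object
print(nearest_label(query, database, labels))  # chair
```

The point of the design is that the lookup touches only short vectors, never the raw clouds themselves.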
Applying the method described in the research makes a 48-fold compression possible. The approach builds on earlier research on Hypernetworks by a team consisting of David Ha, Andrew Dai, and Quoc V. Le.
The key application of the research can be found in the swift processing of this type of data. A good example can be found in autonomous vehicles, be that a car or a drone. Transforming the input into a vector and then processing only the vectors significantly reduces the computing power and memory required to use this type of data. Thus, more processing can be done on the edge device and the need for data transfer can be significantly reduced.
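The article does not state the cloud size or vector length behind the 48-fold figure, so the numbers below are purely an assumption: one hypothetical combination (2048 points with three coordinates each, encoded into a 128-value vector) that happens to give the same ratio. The helper `compression_ratio` is illustrative and simply counts stored values.

```python
def compression_ratio(n_points: int, vector_dim: int, coords_per_point: int = 3) -> float:
    """Ratio between the number of raw coordinate values and the vector length."""
    raw_values = n_points * coords_per_point
    return raw_values / vector_dim


# Hypothetical sizes, chosen only to illustrate the arithmetic.
print(compression_ratio(n_points=2048, vector_dim=128))  # 48.0
```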
Transforming point cloud into mesh
The research also covered a way to transform the point cloud into a 3D object without the need to perform heavy computations. The core of the solution is again a neural network processing the point cloud data. But this time, there is an additional network involved in the process.
The encoder network transforms the data into a multidimensional vector representation in the same way as in the example above. The decoder network then uses this vector representation to produce the weights for the target network, a new entity in the process, which delivers a mesh from a point cloud.
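The division of labor between the three networks can be sketched in a few lines of NumPy. This is an untrained toy under stated assumptions, not the architecture from the paper: the layer sizes, the max-pooling encoder, the single linear decoder, and the names `encode`, `decode_to_weights`, and `target_network` are all illustrative. The starting points are sampled from a unit sphere, standing in for the 3D ball the target network molds into shape.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 16  # assumed length of the vector representation
HIDDEN = 8       # assumed hidden width of the target network

# Parameter shapes of the target network: a small 3 -> HIDDEN -> 3 perceptron.
TARGET_SHAPES = [(3, HIDDEN), (HIDDEN,), (HIDDEN, 3), (3,)]
N_TARGET_PARAMS = sum(int(np.prod(shape)) for shape in TARGET_SHAPES)

# Random (untrained) parameters for the encoder and the decoder.
enc_w = rng.normal(scale=0.5, size=(3, LATENT_DIM))
dec_w = rng.normal(scale=0.1, size=(LATENT_DIM, N_TARGET_PARAMS))


def encode(cloud: np.ndarray) -> np.ndarray:
    """Per-point features followed by max pooling, so the point order does not matter."""
    return np.tanh(cloud @ enc_w).max(axis=0)


def decode_to_weights(z: np.ndarray) -> list:
    """Hypernetwork step: turn the vector into the target network's parameters."""
    flat = z @ dec_w
    params, start = [], 0
    for shape in TARGET_SHAPES:
        size = int(np.prod(shape))
        params.append(flat[start:start + size].reshape(shape))
        start += size
    return params


def target_network(points: np.ndarray, params: list) -> np.ndarray:
    """Map 3D input points to 3D output points using the generated weights."""
    w1, b1, w2, b2 = params
    return np.tanh(points @ w1 + b1) @ w2 + b2


cloud = rng.normal(size=(500, 3))   # stand-in for a scanned object
z = encode(cloud)                   # vector representation
params = decode_to_weights(z)       # weights produced for this particular object

sphere = rng.normal(size=(1000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
surface = target_network(sphere, params)
print(z.shape, [p.shape for p in params], surface.shape)
```

Note that the target network has no trainable parameters of its own in this sketch: everything it knows about the object arrives through the weights the decoder generates for it.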
Meshing used to be a computationally heavy process with a solid phase of post-processing required. In this approach, the mesh is delivered by transforming a model of a 3D ball which is then further molded by the target network into the proper shape. The process is shown in the image below:
It can be compared to clay modeling – transforming a smooth ball of clay into a chair, a car, or a mug – you name it. Meshes are widely adopted in civil engineering to show CAD models and work on them during the design phase.
It is worth noting that the neural network has not been trained on meshes. So it is one of those rare situations in machine learning when the network delivers an output of a different kind than the data it was trained on.
This time the process delivers an outcome which is comprehensible to a human. As this is a research paper, we are still eager to discover ways to put it to practical use.
Yet it is relatively easy to imagine the next Martian rover, comparable to Curiosity, which will use a LIDAR system to scan the surface, mountains, and caves. The next step would be to encode the message and send it back to Earth – due to the reduced size, it would be easier and more affordable than sending full point cloud data.
Finally, the Earth-based team would decode the message and get a 3D mesh of the surface of Mars to examine it further or merge with photo images to deliver a 3D model of the landscape. In the end, maybe 3D laser scanning will enable us to take a short VR-powered walkabout around Isidis Planitia – because, why not?
Also, the delivered mesh can be of any definition one wishes, depending only on the input mesh of the target network. So it can be used to deliver an HD mesh of nearly any surface scanned. Imagine delivering a giant 3D mesh of the human digestive system, so a medical student could travel through it for educational purposes. Or even to deliver an HD mesh of molecular-level structures – because, again, why not?
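Why the resolution depends only on the input mesh can be shown with a short sketch. It assumes the target network acts on vertex positions alone, so the triangles of the input sphere are reused unchanged. The latitude/longitude sphere builder `uv_sphere` and the ellipsoid-stretching `target_function`, which stands in for a trained target network, are both illustrative assumptions.

```python
import numpy as np


def uv_sphere(n_lat: int, n_lon: int):
    """Build a unit-sphere triangle mesh; returns (vertices, faces) as arrays."""
    verts = [(0.0, 0.0, 1.0)]  # north pole
    for i in range(1, n_lat):
        theta = np.pi * i / n_lat
        for j in range(n_lon):
            phi = 2.0 * np.pi * j / n_lon
            verts.append((np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)))
    verts.append((0.0, 0.0, -1.0))  # south pole
    south = len(verts) - 1

    def ring(i: int, j: int) -> int:
        """Index of vertex j (wrapping around) on latitude ring i, rings counted from 1."""
        return 1 + (i - 1) * n_lon + (j % n_lon)

    faces = []
    for j in range(n_lon):  # triangle fans around both poles
        faces.append((0, ring(1, j), ring(1, j + 1)))
        faces.append((south, ring(n_lat - 1, j + 1), ring(n_lat - 1, j)))
    for i in range(1, n_lat - 1):  # two triangles per quad between neighboring rings
        for j in range(n_lon):
            a, b = ring(i, j), ring(i, j + 1)
            c, d = ring(i + 1, j), ring(i + 1, j + 1)
            faces.append((a, c, d))
            faces.append((a, d, b))
    return np.array(verts), np.array(faces)


def target_function(points: np.ndarray) -> np.ndarray:
    """Stand-in for a trained target network: stretches the sphere into an ellipsoid."""
    return points * np.array([2.0, 1.0, 0.5])


for n_lat, n_lon in [(4, 8), (32, 64)]:
    verts, faces = uv_sphere(n_lat, n_lon)
    shaped = target_function(verts)  # only the vertices move; the faces are reused as-is
    print(len(shaped), "vertices,", len(faces), "faces")
```

The same function produces a 26-vertex mesh or a 1986-vertex mesh depending solely on how finely the input sphere was divided, with no retraining involved.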
The work is the joint effort of Przemysław Spurek, Sebastian Winczowski, and Jacek Tabor from the Group of Machine Learning Research at Jagiellonian University in Krakow, Poland, in cooperation with Maciej Zamorski and Maciej Zięba from the Wrocław University of Science and Technology and Tomasz Trzciński from the Warsaw University of Technology, all of whom also work as AI experts at Tooploox.