'AI for Code', the generic name for technologies that use artificial intelligence to partially or fully automate software development, is, in the words of IBM Research researchers, "on the verge of shifting from a proof of concept to a widely adopted technology."
To boost this trend, these researchers have launched a project that they hope will serve as a turning point: the CodeNet Project.
The objective of the project is to provide a valuable dataset to the whole community of researchers in the field of AI applied to programming; the dataset would allow them to apply machine learning techniques to build translators between programming languages, as well as code generators and analyzers.
To this end, the dataset (distributed under a free license) brings together a total of 14 million code examples containing solutions to 4,053 common programming problems ... posed in 55 different programming languages, from currently popular languages (like Go, Python, and Java) to classics (like COBOL, FORTRAN, and Pascal).
But these source code samples do not come from applications in production or under development, as this would not only have prevented their free use, but would also have left out a very important element in the training of AI models: examples of badly programmed code, properly labeled so that artificial intelligence can 'learn' the difference between functional and failed code.
Instead, the researchers turned to two popular programming competitions in Japan (Aizu and AtCoder) as the source of the code, whose participants were tasked with writing code that would convert a given input X into an output Y.
The goals set for CodeNet
Now, IBM's goal is for CodeNet to follow in the footsteps of ImageNet, the labeled image database that serves as the reference for training computer vision applications, and to become the leader in the field of AI applied to programming.
Last December, Ruchir Puri, chief scientist in IBM's research division, explained in a podcast how they were using this kind of artificial intelligence to modernize legacy software, helping to migrate monolithic applications to microservices for IBM business customers. Now, IBM says that CodeNet will allow other scientific teams to equip themselves with the same technology.
These kinds of advances will make it possible to leave behind software that has been inherited for decades and that, due to its characteristics, is difficult to translate, keeping alive in corporate and bureaucratic environments certain languages that would otherwise have already been abandoned:
"If the translation of programming languages were easy and rule-based systems worked, the first programming languages like COBOL would have already been translated [into more modern ones].
But programming languages have context, and identifying it to be able to do the translation, as with human languages, is complicated and time-consuming. […] Context is challenging for AI."
Via | IBM