The AI "Hundred Model War" has begun: who will be king?

"Based on current feedback, no large model, including ChatGPT, has met every standard in task performance tests." This is broadly the industry consensus as large AI models mushroom.

Since the launch of ChatGPT, similar products built on large language model technology have kept emerging at an accelerating pace. Since April, Internet giants, A-share listed companies, a host of startups, and universities and research institutes have released their own large models, more than 30 in total.

With so many models appearing in just a few months, how strong is each one? Is China producing a surplus of large models? Will the industry's endgame be a hundred flowers blooming, or winner takes all? And in the discussion of this technological shift, which of the new capabilities brought by large models are the most critical and most likely to have a lasting impact?

Large AI models are mushrooming, but none has yet delivered a "perfect answer" in task performance testing

According to statistics from Minsheng Securities, China now has more than 30 large models, and the industry resembles a "Hundred Model War".

AI industry players interviewed by Science and Technology Board Daily said the industry has not yet reached a definitive verdict on any specific model, but they offered several dimensions the outside world can use as a reference.

In an interview with Science and Technology Board Daily, Yu Kai, co-founder and chief scientist of AISpeech and a professor at Shanghai Jiao Tong University, said one fact must be acknowledged: among current large models, only ChatGPT has passed the test of ubiquity (surpassing 100 million users), and domestic large models still lag behind it.

Yu Kai told reporters that the first way to measure a large model's strength is task-based performance testing: define a set of tasks, then compare how completely each large model handles every task. These tests are benchmarked against human abilities such as comprehension, reasoning, and judgment. Based on current feedback, no large model, including ChatGPT, meets every standard in these tests.

The second is the safety perspective, which is mainly about how closely large models are aligned with human values.

The third is the operational perspective, judged by engineering characteristics such as how much text a large model can accept as input, how quickly it responds, and how efficiently it runs. "This is a particularly important capability," Yu Kai stressed.

Yu Kai's criteria focus on technical metrics. Others, of course, judge from the level of resource endowment.

Wang Jun (a pseudonym), a senior expert in the large-model field, told reporters that the bar for a large-model team is very high: it cannot have weak spots in funding, technology, engineering, product, commercialization, or other areas. The ultimate test is whether the core members have truly thought through the overall direction and pace, whether they can secure enough resources and support, whether they can attract key talent from every area, and whether, after bringing together a group of top people from different backgrounds, they can work well together.

"The scarcest are core algorithm researchers and platform engineering talent, and there are not many of them in the entire Chinese-speaking world," Wang Jun stressed.

The battle for talent has been plainly visible as the large-model market takes off.

At Baidu, which got off to an early start, CTO Wang Haifeng is at the helm; the entrepreneurial teams include Zhou Ming of Lanzhou Technology, Zhou Bowen, and others, whose influence in the AI industry goes without saying. Earlier, when Wang Huiwen made a high-profile announcement of his AI venture, his first move was to post a call for talent on his personal social media, offering a generous stake (75% of the new company's shares) to recruit top R&D talent.

"Whether a model is good can't be judged by each company's own promotion. Some industry-recognized evaluation benchmarks can certainly serve as a reference, but the most important thing is user recognition, especially from the most frequent or paying users," Wang Jun said.

As models approach the core of AGI, "generalization" in industrial applications is key

For various commercial reasons, outsiders can hardly know the actual data, test metrics, resources invested, or even user data behind each company's large model, which makes a scientific judgment of its strength difficult.

However, the reporter noticed that every interviewee mentioned one obvious evaluation angle: "user feedback" on response speed, accuracy, usability, contextual logic, and so on. That is why, whenever a new large model is launched, users first check whether its answers "flop".

Science and Technology Board Daily reporters have tried several representative large models in China first-hand. Combining that experience with feedback from many users, the models' overall characteristics are as follows:

GPT-4 is a multimodal large language model that accepts image and text input and produces text output.

Domestic large models, by contrast, are diverse and vary in capability, and they currently focus more on exploring industrial applications and overcoming industry-specific technical barriers.

On Chinese semantics, domestic models, including Wenxin Yiyan and Tongyi Qianwen, each have strengths and weaknesses in comprehension, with no large gap between them. In understanding tricky Chinese phrasing, domestic large models do better, because their training data comes mainly from Chinese corpora, whereas ChatGPT's comes mainly from English corpora.

There are exceptions, however. MOSS, the first ChatGPT-like model in China, released by Professor Qiu Xipeng's team at Fudan University, answers better in English than in Chinese. English, as the mainstream language of scientific research, is widely used in academia and industry and has accumulated a large amount of high-quality corpus data, and far more English data is open source than Chinese data.

In addition, MOSS was designed with human ethical guidelines in mind so that it does not produce biased or potentially harmful responses, which helps avoid some potential legal risks and business-ethics issues. ChatGPT does not explicitly address this point.

Calls to benchmark large models keep growing louder. Yet Yu Kai admits that now is not the right time to judge each model's abilities or rank them.

In his view, what has changed with the large models released so far is that they have basically achieved the emergence of chain-of-thought reasoning, approaching the core of AGI (artificial general intelligence). The industry is now more concerned with whether large models have enough "generalization", that is, widespread adoption.

But from the industry's point of view, the user base of domestic large models has not yet reached ubiquity. "Until widespread ubiquity is reached, we still need to be cautious about judging by the standard of ubiquity," Yu Kai emphasized.

The first mover is not necessarily the "king" of large models

With so many models flooding the market in a short time, some question whether this many large models are needed. In other words, is there now a surplus of large models?

The prevailing view in the industry is that, despite the current number of large models, there is far from a surplus.

Wang Jun believes that, given the technology, capital, and capabilities that large models demand of vendors, current products can only be described as barely usable.

Yu Kai said the paradigm for industrial AI applications will shift from a single general model to a cluster of general models, and large models will become differentiated: by domain, by function, and by combination with specific scenarios in specific industries. Highly specialized, in-depth large models have not yet appeared, and they will keep emerging in the future.

Chen Yunwen, founder and CEO of Datagrand, also told Science and Technology Board Daily that China's large-model sector is still in an exploratory, catch-up stage and the technology itself is immature, so there is plenty of room for growth. "It's like beverage brands today: stretch the timeline out, and the number of large models today is actually not that large."

So will the industry's endgame be a hundred flowers blooming, or winner takes all?

Yu Kai and Chen Yunwen both expect a hundred flowers to bloom. The industry is still in its early stage; the product forms that large models grow into will differ from industry to industry, and vertical application models will emerge for different industries.

Wang Jun said the high barrier to entry means only a few players can build large models, so the future will not be a hundred flowers blooming. But whether the market will end up like search engines, like operating systems with only two or three players, or like the development of cloud computing "is impossible to judge yet; there are still some variables."

A number of senior industry insiders told Science and Technology Board Daily that OpenAI is only a temporary "first mover" in large AI models, and it is still extremely hard to tell which company's large model will one day achieve a global market position like Apple's today. There are plenty of cases where the first mover did not end up king. In China's Internet history, for example, Sina, Sohu, and NetEase started first, but Tencent, Alibaba, and ByteDance later took the biggest slice of the pie.

Pay more attention to, and think deeply about, AI application scenarios

Whether marveling at ChatGPT's performance or pondering the "Hundred Model War", today's discussions about large models all come down to the opportunities and challenges this technological revolution brings to humanity.

However, Zhou Feng, CEO of NetEase Youdao and a PhD in computer science, believes one question in this technology trend has not been fully discussed: which of the new capabilities brought by large models are the most critical and most likely to have a long-term impact.

According to Zhou Feng, compared with earlier natural language processing technologies, large language models have at least three fundamentally new capabilities: emergent abilities, the ability to serve as a base model for many applications, and the ability to act as a unified conversational entry point.

Zhou Feng says emergent abilities matter not only because they are new, appearing only with large models, but also because many of the abilities that emerge are very important ones. Common-sense reasoning, for example, has long been a major challenge in AI, and large models have brought significant progress on it. And once reasoning abilities emerge, chain-of-thought prompting can be used to tackle multi-step reasoning problems. "Thus, emergent abilities are a fundamental change brought about by large models."

On base models, Zhou Feng said large models can not only shorten the development cycle of each specific application and reduce the manpower required, but also deliver better results by drawing on their reasoning, common-sense, and writing abilities. Large models can therefore become a grand unified base model for AI application development, which kills two birds with one stone and is a new paradigm worth promoting vigorously.

What truly ignited this round of large language models was ChatGPT, a conversational chat product. Zhou Feng said that although earlier chatbots had various problems, large language models allow the chatbot, as an interaction model, to be reimagined. In the future we may see many similar projects in which assistants perform specific tasks through conversation.

"These three capabilities have been widely discussed in academia and are even considered common knowledge there, but they receive too little attention from industry and product teams," Zhou Feng said. "These characteristics of large-model technology have changed the way we think about business and product planning, and will change the economics of many products. Product managers and business leaders therefore need to pay more attention to, and think deeply about, the application scenarios for these new capabilities."

Yu Kai also believes that in this technological shift, beyond discussions of parameter scale at the technical level and of algorithms, computing power, data, talent, and funding at the resource level, more attention should be paid to aspects of AI systems other than the large language model itself.

These include the understanding of language, of conversational AI, and of multimodal AI. They are not just questions of algorithms, but also of business, training strategy, and an understanding of AI's technical trajectory.
