"Wenxin Yiyin looks like it was rushed on, I think this thing is not at all to make money, is to be able to catch the ChatGPT craze, the industry big model is what can really generate business value." Shortly after the release of Baidu Wenxin Yiyin, a former Baidu employee told Titanium Media, "Last year when OpenAI was not so hot, Mr. Wang (Baidu CTO Wang Haifeng) led the team to engage in 10 big models, including the industry big model, when there was not much attention outside the industry, but if you look at the layout of Baidu now, the industry big model is actually a forward-looking layout, earlier than OpenAI and Microsoft even earlier."
Now that the clamor over general-purpose large models has died down, industry models are gradually taking over the conversation, which confirms this reality: foundation models such as ChatGPT mostly earn attention rather than revenue. To a large extent their role is to educate the market and shape perceptions. For artificial intelligence to be deployed in practice and make money today, the industry large model is where to look.
Even in overseas markets, ChatGPT's popularity as a consumer-facing (to-C) product has gradually waned. According to SimilarWeb data, growth in ChatGPT visits was remarkable early on, at 131.6% month over month in January, 62.5% in February, and 55.8% in March. It slowed sharply to 12.6% in April and 2.8% in May, and is expected to turn negative in June.
"I believe many of us have tried ChatGPT, and many of us have put it aside after trying it, because for now it is still largely disconnected from our work, so we use it and then put it down. But I still hope people don't get up early only to arrive late, because this is a paradigm revolution that will bring disruptive change," Wei Qing, Chief Technology Officer (CTO) of Microsoft (China), said previously.
Enterprise-facing (to-B) solutions built on ChatGPT or other large models are the remedy for this disconnect between large models and real scenarios.
Internationally, Microsoft, Amazon and other major vendors have begun pursuing commercialization through enterprise services and exploring multiple industries; domestically, Baidu, Alibaba, Tencent and Huawei are all accelerating their investment in industry large models. Many global industry leaders and startups are also exploring the prospects of industry large models. Recently, the Beijing Municipal Science and Technology Commission and the Zhongguancun Management Committee released the first batch of 10 application cases for AI industry large models in Beijing. In addition, the value of mergers and acquisitions involving companies in related technologies has climbed to new highs.
Upgrade: the thousand-model war
If foundation models are fighting a "hundred-model war," industry large models are fighting a "thousand-model war." Just as a tree trunk grows branches, each foundation model vendor can incubate several industry large models, and the major vendors are moving in tacit unison.
"Although people have high expectations for general-purpose large models, they are not necessarily the optimal solution for industry scenarios," said Tang Daosheng, senior executive vice president of Tencent Group and CEO of its Cloud and Intelligent Industry Group, at the launch of Tencent Cloud's industry large models on June 19.
Before releasing its Hunyuan assistant externally, Tencent first released its industry large models, relying on a curated store of industry large models built on the Tencent Cloud TI platform to provide customers with one-stop MaaS (Model-as-a-Service) offerings that help enterprise customers build their own dedicated large models and intelligent applications. According to Tencent, official news about its consumer-facing general-purpose large model will follow later.
These moves can be read as follows: regardless, for now, of the performance and progress of the Hunyuan foundation model, prioritizing the release of industry large models is a necessary move for Tencent to secure its voice and win customers while customer demand is urgent.
Earlier, Tian Qi, chief scientist for artificial intelligence at Huawei Cloud, said that Huawei divides large models into three layers: L0, L1 and L2. L0 is the general-purpose foundation model, like GPT-3. Mixing industry data into training on top of the L0 foundation model yields the industry large model, L1.
L1 is then adapted to specific downstream scenarios across thousands of industries, yielding scenario task models, L2. To cut production costs and improve efficiency, two questions become very important: how to quickly produce L2 models from an L1 industry model, and how to deploy L2 models on devices, at the edge and in the cloud.
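The three-layer split can be pictured as a simple lineage, where each model records the model it was derived from. The following is only an illustrative sketch: the `Model` class, the `derive` helper and all model names are hypothetical and are not part of any Huawei product or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    """A model in the L0 / L1 / L2 hierarchy (hypothetical illustration)."""
    name: str
    layer: str                        # "L0", "L1" or "L2"
    parent: Optional["Model"] = None  # the model this one was derived from

    def lineage(self) -> list:
        """Return the layer labels from the foundation model down to this one."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.layer)
            node = node.parent
        return list(reversed(chain))


def derive(parent: Model, name: str) -> Model:
    """Derive the next layer down: L0 -> L1 (industry), L1 -> L2 (scenario)."""
    next_layer = {"L0": "L1", "L1": "L2"}
    if parent.layer not in next_layer:
        raise ValueError(f"cannot derive a model from layer {parent.layer}")
    return Model(name=name, layer=next_layer[parent.layer], parent=parent)


# Hypothetical names, for illustration only.
foundation = Model("general-foundation", "L0")
finance = derive(foundation, "finance-industry")    # L0 plus industry data
claims_qa = derive(finance, "insurance-claims-qa")  # L1 adapted to one scenario

print(claims_qa.lineage())
```

In this picture, one L0 model fans out into several L1 models, and each L1 model into many L2 models, which is why the speed of producing L2 models from L1 matters so much.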
According to the agenda of the upcoming Huawei Developer Conference in July, Huawei Cloud will present a series of explanations and releases on how the Pangu large model is refined from foundation model to industry large model.
At this year's Alibaba Cloud Summit, Alibaba Cloud CTO Zhou Jingren also said: "Today, not every enterprise needs to start training from scratch. We don't need everyone to assemble all kinds of corpora and large amounts of computing resources from scratch in order to customize a large model. We hope that, on top of the Tongyi Qianwen model, enterprises can combine their own scenarios, knowledge systems and industry-specific needs to produce a model exclusive to the enterprise."
Microsoft is also building its own industry offerings. In April, Microsoft's Azure OpenAI Service (international edition) released in China, for Chinese enterprises expanding overseas, its first three Azure industry scenario solutions, covering retail e-commerce, manufacturing and digital-native businesses. They integrate five large model services, GPT-3, GPT-4, Codex, DALL-E and enterprise ChatGPT, to help Chinese enterprises going overseas accelerate their global market expansion.
The "thousand-model war" is about to begin, but it is still too early to call it a big wave. Overall, large models remain at a relatively early stage of development, and although industry large models are appearing in clusters, there is clearly plenty of room left in this track.
Take large models for the financial industry as an example: the industry divides into fields such as brokerage, insurance, banking and new finance, and the downstream tasks in each field break down into dozens or even hundreds of sub-tasks.
"The more important moment comes next, when a foundation model can be efficiently adapted to downstream tasks through mechanisms such as SFT (supervised fine-tuning), and this can be built out and scaled up in financial or other industry models." In the view of Chen Haiqing, head of the Innovation Business Center at Alibaba DAMO Academy, industry large models and scenarios that merely continue training on some general unstructured data are only the beginning.
A sensible and realistic choice
If an enterprise wants to build a foundation model with 100 billion parameters, it needs a single cluster with the computing power of more than 10,000 cards, and it needs not just the GPU cards themselves but the ability to make use of GPU cluster resources, which most companies cannot manage.
Industry large models are clearly easier to achieve and also have broader application prospects.
"Large models can empower thousands of industries, but the scenarios of those industries have to be understood very well. You can't expect to train a model with hundreds of billions or trillions of parameters that business users can simply pick up and use," said Zhou Ming, founder of Langboat Technology. "Going from the general model to the industry model means covering the last mile for the user's scenario."
After evaluating the investment a foundation model requires and weighing the costs against the benefits, enterprise customers quickly turned to industry large models, and vendors put more energy into them as well.
Tang Daosheng acknowledged that today's general-purpose large models are generally trained on extensive public literature and web information. Online information may contain errors, rumors and bias, and much specialized knowledge and industry data is underrepresented, so the models are not specialized or accurate enough for industry use and the data is too "noisy."
In many industry scenarios, however, users expect highly professional service from enterprises and tolerate few errors. If a company provides wrong information, it may face serious legal liability or a public relations crisis. Large models used by enterprises must therefore be controllable, traceable and correctable, and must be tested repeatedly and thoroughly before going live.
"We believe customers need industry large models that are more specific to their industry, further trained or fine-tuned on the enterprise's own data, in order to create intelligent services that are highly practical. What enterprises need is a real solution to a problem in an actual scenario, not a solution to 70-80% of the problems in 100 scenarios," Tang Daosheng said.
Zhu Yong, vice president of Baidu Intelligent Cloud, made a similar point: "Looking at the situation at home and abroad, not that many players are really building general-purpose models, and some vendors on the market are actually building smaller models. Domain models, by contrast, are particularly important, because general-purpose models have only general knowledge, while domain models can be aligned with the expectations of specific industries and domain tasks to solve practical business problems. This process is very important, yet it requires far less cost and far fewer resources than building the underlying general-purpose model from scratch."
He also predicted that only a few foundation models (the underlying general-purpose models) may remain in the future, but that, combined with data and know-how from specialized fields, many different kinds of domain models will grow on top of them. These domain models will flourish and support a thriving layer of domain applications above them.
Take the energy industry large model "State Grid - Baidu - Wenxin," built jointly by Baidu Intelligent Cloud and State Grid, as an example. Experts from both sides introduced the sample data and domain knowledge that State Grid has accumulated in its electric power business into the general-purpose Wenxin model, and during training combined both sides' experience in pre-training algorithms and in power-sector business and algorithms. Together they designed pre-training tasks such as power-domain entity discrimination and power-domain document discrimination, so that the Wenxin model can learn electric power expertise in depth, solve real business problems in the energy field, and reduce costs and increase efficiency.
Zhu Yong compared the difference between a general-purpose model and a domain model this way: the general-purpose model is like a university graduate with broad knowledge, who may know some medicine but cannot diagnose a patient and is not a professional doctor. The domain model is the professional doctor, who builds on strong general ability by studying medicine in depth and can therefore contribute value in the medical field.
Going from a general-purpose model with broad knowledge to a professional medical model costs far fewer resources than building a large general-purpose model from scratch, but it depends on the availability of specialized data and on specialized domain tasks to drive the model to develop that capability.
How to build an industry large model
The large model is itself a new thing that changes the previous software development paradigm, and vendors need a new toolchain and platform to help customers refine industry large models sooner and faster.
With the arrival of the large-model era, the efficiency of the last mile will improve significantly. Zhou Ming said that a new software development paradigm is taking shape: enterprises provide many functional engines, driven mainly by prompts; users can already use them as assistants to improve efficiency; and on that basis, once they have thought through and designed their own user experience, it is easy to build a new application.
Take the Wenxin Qianfan large model platform as an example. It is a one-stop platform for enterprise developers to develop large models and run them as services. It provides not only the model underlying Wenxin Yiyan (ERNIE-Bot) and third-party open-source large models, but also a range of AI development tools and a complete development environment, making it easy for customers to use large models and develop applications on them.
With capabilities such as data management, automated model SFT, and cloud deployment of inference services, vendors hope to deliver one-stop large model customization. Different vendors' model-building platforms offer broadly similar capabilities and differ in ease of use, quality of results, and supported software and hardware.
"It's true that building large models is not cheap, but ultimately only two things determine whether large model services can be promoted. The first is that the model must perform well; if it performs badly, nothing else needs to be said. The second is cost," said Xin Zhou, general manager of Baidu Intelligent Cloud's AI and big data platform.
In practice, industry models rely on general-purpose large models. Take BloombergGPT, launched by Bloomberg and Johns Hopkins, as an example of how training data is distributed: about half is general-purpose data of the kind used for foundation models, about half is public data from the financial industry, and 0.6% is Bloomberg's own data.
"For any model to reach a good level of intelligence or basic ability, the foundation model must first be trained at a sufficiently large parameter count, and industry-specific data is then incorporated on top of the foundation model to make the industry model," Xin Zhou said.
Baidu's approach is to first launch a "big guy" (Wenxin Yiyan) and a comprehensive tool platform (Wenxin Qianfan), and then provide differentiated model services based on customers' actual needs, helping them make the most cost-effective choice. Baidu believes price will not become a bottleneck for enterprises embracing large models.
Beyond model call and training costs, Baidu is also helping enterprises reduce costs further. For enterprises that focus only on a relatively narrow area, Baidu also offers versions with fewer parameters, so that the cost of using or training a model drops significantly while model performance is maintained.
In fact, there is no universal standard for what it costs to build an industry large model.
First, foundation models differ in parameter scale, and hardware and software investment has to vary with a model's parameter count and capabilities. A model with 10 billion parameters can run on a single A100 card and start handling downstream tasks.
Most of the application demand today falls into this category, such as intelligent question answering, intelligent writing and intelligent content creation in knowledge management, as well as internet marketing scenarios and code generation.
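The 10-billion-parameter, single-A100 example above can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions, not a sizing tool: it assumes 16-bit weights (2 bytes per parameter) and a hypothetical 20% allowance for runtime overhead, and it covers inference only. Fine-tuning needs additional memory for gradients and optimizer state, which is not modeled here. The A100 is sold in 40 GB and 80 GB memory configurations.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_on_card(num_params: float, card_memory_gb: float,
                 bytes_per_param: int = 2, overhead: float = 1.2) -> bool:
    """Check whether weights plus an assumed overhead fit on one card.

    `overhead` is a hypothetical allowance for activations and caches,
    not a measured figure.
    """
    needed = weight_memory_gb(num_params, bytes_per_param) * overhead
    return needed <= card_memory_gb


ten_billion = 10e9
hundred_billion = 100e9

print(weight_memory_gb(ten_billion))       # GB of 16-bit weights for 10B parameters
print(fits_on_card(ten_billion, 40))       # one 40 GB A100
print(fits_on_card(hundred_billion, 80))   # one 80 GB A100
```

Under these assumptions a 10-billion-parameter model needs about 20 GB for its weights and fits on one card, while a 100-billion-parameter model needs about 200 GB and has to be split across several cards.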
Second, cost depends on data volume and application direction. Large model pricing worldwide is currently billed per 1,000 tokens. If an enterprise's downstream task is simple and needs only tens of thousands of tokens to do well, its cost is very low and it requires very few GPU cards. By contrast, the data needed to build an industry large model is usually measured in gigabytes or even terabytes, so its offline training cost will be very high.
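To make the difference in scale concrete, here is a minimal sketch of per-1,000-token billing next to a rough token count for a gigabyte-scale corpus. The price and the bytes-per-token ratio are hypothetical placeholders chosen for illustration, not figures from any vendor, and the sketch compares data volume only; it does not model GPU training cost.

```python
def call_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of processing `tokens` tokens when billing is per 1,000 tokens."""
    return tokens / 1000 * price_per_1k


def estimate_tokens(corpus_bytes: int, bytes_per_token: int = 4) -> int:
    """Rough token count for a text corpus of the given size in bytes.

    The default of 4 bytes per token is an assumption for illustration.
    """
    return corpus_bytes // bytes_per_token


PRICE_PER_1K = 0.012  # hypothetical price per 1,000 tokens, not a real quote

small_task_tokens = 50_000              # a task needing "tens of thousands of tokens"
corpus_tokens = estimate_tokens(10**9)  # a 1 GB industry corpus

print(f"small task cost: {call_cost(small_task_tokens, PRICE_PER_1K):.2f}")
print(f"1 GB corpus is roughly {corpus_tokens:,} tokens")
print(f"scale ratio: {corpus_tokens / small_task_tokens:,.0f}x")
```

Even with these placeholder numbers, a single gigabyte of text is thousands of times larger than a simple downstream task, which is why the two cases land in very different cost brackets.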
"Wenxin Yiyin looks like it was rushed on, I think this thing is not at all to make money, is to be able to catch the ChatGPT craze, the industry big model is what can really generate business value." Shortly after the release of Baidu Wenxin Yiyin, a former Baidu employee told Titanium Media, "Last year when OpenAI was not so hot, Mr. Wang (Baidu CTO Wang Haifeng) led the team to engage in 10 big models, including the industry big model, when there was not much attention outside the industry, but if you look at the layout of Baidu now, the industry big model is actually a forward-looking layout, earlier than OpenAI and Microsoft even earlier."
Now, after the general big model clamor, the industry model is gradually occupying the voice, is also proof of this reality: similar to ChatGPT and other basic big model earn is "yell", to a large extent is to play a role in educating the market, shaping the cognition, artificial intelligence really want to land, to earn money now, but also look at the industry big model.
Even in overseas markets, ChatGPT's attributes as part of a C-tier product have gradually waned in popularity - according to SimilarWeb data, the growth rate of ChatGPT's visits in the early period was amazing, with a 131.6% YoY growth rate in January, 62.5% in February, and 55.8% in March. 55.8%, slowed significantly in April with a 12.6% YoY growth rate, and by May, the number had changed to 2.8% and is expected to have a negative YoY growth rate in June.
"I believe many of us have tried ChatGPT, and I believe many of us have put it aside after trying it, because it is still largely disconnected from our work at the moment, so we use it and put it down. But I still hope people don't 'get up early and catch a late start,' because this is a paradigm revolution that will bring disruptive change." Microsoft (China) Chief Technology Officer (CTO) Wei Qing said previously.
The B-side solution built based on ChatGPT or the big model is a recipe for solving the fragmentation between the big model and the scenario.
Internationally, Microsoft, Amazon and other large manufacturers have also begun to seek commercialization paths to enterprise-class services and started to explore multiple industries; domestically, such as Baidu, Ali, Tencent, Huawei are all accelerating the industry big model investment at a fast pace. In addition, many global industry leaders and startups are also exploring the prospect of industry big models. Recently, Beijing Municipal Science Commission and Zhongguancun Management Committee also released the first batch of 10 AI industry big model application cases in Beijing. In addition, the amount of mergers and acquisitions of companies in related technology lines has climbed to new highs ......
Upgrade: Thousand-mode war
If the basic model is the "hundred model war", the industry big model is the "thousand model war", just like the trunk of the tree grows branches, each basic big model manufacturer can incubate several industry big models, and the action of the big manufacturers is tacit and consistent.
"Although people have high expectations for the universal big model, it is not necessarily the optimal solution to meet the needs of industry scenarios." On June 19, at the launch of Tencent Cloud Industry Big Model, Tang Daosheng, senior executive vice president of Tencent Group and CEO of Cloud and Intelligent Industry Group, said.
With no external release of the hybrid assistant, Tencent took the lead in releasing the industry big model, relying on the industry big model selection store built on the Tencent Cloud TI platform to provide customers with MaaS one-stop services to help enterprise customers build exclusive big models and intelligent applications. It is learned from Tencent that Tencent will release official news about the universal big model on the C side in the follow-up.
This series of initiatives may be interpreted as, for the time being, regardless of the effect and progress of the hybrid basic big models, prioritizing the release of industry big models is a necessary move for Tencent to ensure its own voice and seize market customers in the case of urgent customer needs.
Earlier, Huawei cloud artificial intelligence field chief scientist Tian Qi mentioned that Huawei divided the big model into three layers, L0, L1, L2, L0 is what we call the basic general model, like GPT-3, on the basis of the basic model L0, plus industry data, mixed training to get the industry big model is L1.
Then L1 is deployed for specific downstream scenarios of thousands of industries, and the task model L2 of the scenarios is obtained. In order to reduce production costs and improve efficiency as soon as possible, how to quickly produce L2 models from the industry large model L1, and how to deploy L2 models to the end-side, side-side and cloud-side, which is a very important issue.
As you can see on the agenda of the upcoming Huawei Developer Conference in July, Huawei Cloud will make a series of interpretations and releases on how the Pangu big model is refined from the basic big model to the industry big model.
In this year's Ali cloud summit, Ali cloud CTO Zhou Jingren also said, "today not all enterprises need to start training from scratch, nor do we need everyone to start from scratch to do a variety of corpus, including a large number of computing resources, from scratch to do a series of customization of the big model, we hope that today on top of the Tongyi thousand questions model, combined with the enterprise's scenario, the enterprise's knowledge system, the enterprise's industry-specific needs to produce an exclusive model of the enterprise."
Microsoft is also making its own industry big models. in April, in China, for local overseas enterprise users, Microsoft Azure OpenAI Service International Edition released the first three sets of Azure global innovative industry scenarios for retail e-commerce, manufacturing and digital native fields, integrating GPT-3, GPT-4, Codex, DALL-E and enterprise ChatGPT and other five large model services to help Chinese seafaring enterprise customers accelerate their global market expansion.
The "Thousand Model War" is about to start, but it's still too early to enter the big wave stage - overall, big models are still in a relatively early stage of development, and despite the concentration of big models in the industry, there is obviously more space in this track.
Take the big model of financial industry as an example, it is divided into different fields such as brokerage, insurance, banking and new finance, and the downstream tasks of each field are divided into dozens of hundreds of sub-tasks.
"The more important moment is the next time when the downstream tasks based on the basic model can be efficiently adapted to the downstream tasks through mechanisms such as SFT and built and scale up in the financial industry or other industry models." In the opinion of Chen Haiqing, head of the Innovation Business Center of Alibaba Dharma Institute, just the big industry models and scenarios that do continue training through some pervasive unstructured data are just the beginning.
A sensible and realistic choice
If an enterprise wants to do a large model with a base of 100 billion parameters, it needs more than 10,000 cards of arithmetic power in a single cluster, and not only GPU cards, but also cluster resources of GPUs to be utilized, which most companies cannot do.
And the industry big model is obviously easier to achieve, while also both broader application prospects.
"Big models empower a thousand industries, but the scenarios of a thousand industries should be very well understood, and you can't expect to train a hundred billion or trillion big models that business users take and use", said Zhou Ming, founder of Lanzhou Technology. "From the generic model to the industry model, to do the last mile for the user's scenario."
After evaluating the investment required for the basic big model and weighing the pros and cons against the gains and losses, enterprise customers quickly turned to the industry big model, and vendors put more energy into it.
Dawson Tang admits that the current general-purpose big model is generally based on extensive public literature and network information to training, online information may have errors, rumors, bias, many professional knowledge and industry data accumulation is insufficient, resulting in the model's industry-specific and accurate enough, the data "noise" is too large.
However, in many industrial scenarios, users demand high professional services from enterprises and low fault tolerance. Once a company provides wrong information, it may cause huge legal liability or public relations crisis. Therefore, the large models used by enterprises must be controllable, traceable and correctable, and must be repeatedly and fully tested before going online.
"We believe that customers need more industry-specific industry big models, plus the enterprise's own data for training or fine-tuning, in order to create intelligent services with high practicality. What enterprises need is a real solution to a problem in an actual scenario, not a solution to 70-80% of the problems in 100 scenarios." Dawson Tang said.
Zhu Yong, vice president of Baidu Intelligent Cloud, also said, "As you can see from the situation at home and abroad, there are not that many really doing generic models, and there are some vendors on the market that are actually doing smaller models. On the contrary, domain models are particularly important because generic models only have general knowledge capabilities, and domain models can be aligned with industry-specific and domain task expectations to solve practical problems in business, a process that is very important, but one that requires far less cost and resources than making the underlying generic model from scratch."
At the same time, he also judged that the future of the basic model (the underlying generic model) may be just a few, but combined with the data of professional fields, industry know how, the above will grow out of many different types of domain models, these domain models will be very prosperous in the future, supporting the upper layer of prosperous domain applications.
Take the energy industry big model "State Grid - Baidu - Wenxin" built by Baidu Intelligent Cloud and State Grid as an example, Baidu Intelligent Cloud and State Grid experts work together to introduce the sample data and unique knowledge accumulated by State Grid in the electric power business in the general big model line, and in the training, combine the experience of both sides in pre-training algorithms and business and algorithms in the electric power field. Baidu Intelligent Cloud and State Grid's experts work together to design algorithms such as entity discrimination in the electric power field and document discrimination in the electric power field as pre-training tasks, so that Wenxin's big model can learn electric power expertise in depth, thus truly solving actual business problems in the energy field and achieving the purpose of cost reduction and efficiency increase.
Zhu Yong said, the difference between generic model and domain model, you can compare the generic model to a person who went to university with a wide range of knowledge, he may know some medical knowledge, but can not make a diagnosis for the patient, not a professional doctor. The domain model, on the other hand, is a professional doctor who can contribute value in the medical field by learning medical knowledge in depth on the basis of strong generic ability.
The cost of resources required to go from a generic model with broad knowledge to a professional medical model is much less than building a large generic model from scratch, but it emphasizes the availability of professional data and the need to have professional domain tasks to drive it to generate such capability.
How to do the industry big model
The Big Model itself is a new thing that changes the previous software development paradigm, and vendors need a new set of tool chain and platform to help customers polish the industry Big Model earlier and faster.
With the arrival of the era of big models, the efficiency of the last mile will be significantly improved. Zhou Ming mentioned that a new generation of software development paradigm is taking shape, mainly based on the enterprise prompt to provide a lot of functional engines, users are now assistants can improve efficiency, and on this basis to think clearly and design their own user experience, it is easy to construct a new application.
Take Wenxin Qianfan big model platform as an example, is a one-stop big model development and service running platform for enterprise developers. It not only provides including Wenxin Yiyin underlying model (ERNIE-Bot) and third-party open source big model, but also provides various AI development tools and a whole set of development environment to facilitate customers to easily use and develop big model applications.
Such as data management, automated model SFT, and inference service cloud deployment, vendors hope to achieve one-stop big model customization services. Different vendors' big model building platform capabilities are basically similar, differing in terms of ease of use, good or bad results, and supported software and hardware.
"It's true that it's not cheap to make big models, but ultimately there are only two reasons why big model services can be promoted: the first is that the model effect should be good, and if the model effect is bad, nothing else needs to be said, and the second is the cost." Baidu Intelligent Cloud AI and big data platform general manager Xin Zhou said.
In effect, the industry model to rely on the general large model. The Bloomberg GPT, launched by Bloomberg and Johns Hopkins, is an example of the distribution of its data, with half of the generic base model data, half of the public data of the financial industry, and 0.6% of Bloomberg's own data.
"Any model to be able to reach a better level of intelligence or basic ability, must have to train the basic model in a better number of parameters, and then incorporate some industry professional data on the basic model to do the industry model." Xinzhou said.
Baidu's idea is to first launch a "big guy" (Wenxin Yiyin), a very complete tool platform (Wenxin Qianfan), and then provide differentiated model services based on the actual needs of customers, to help customers make the most cost-effective choice, they believe that the price will not become a bottleneck for enterprises to embrace the big model.
In addition to model call cost and training cost, Baidu is also helping enterprises to do further cost reduction. If enterprises just focus on their relatively narrow areas, Baidu also has relatively low parameter versions, so that the cost of using or training models will drop significantly while ensuring the model effect.
In fact, there is no universal standard for the cost of building a large industry model.
First of all, different basic big models have different parameter specifications, and the hardware and software inputs have to change dynamically according to the basic parameters and capabilities of the model. If it is 10 billion parameters, an A100 card can also run and it can start the downstream task.
The current more concentrated application scenario demands belong to this category, such as intelligent question and answer, intelligent writing, intelligent creation in the knowledge management category, as well as the demand for pan-internet marketing scenarios and code generation.
Secondly, the cost is related to the data volume and application direction. The current global large model pricing is billed with 1000 Token as the base unit. If an enterprise's downstream task is simple and only needs tens of thousands of tokens to do well, then its cost is very low and requires very few GPU cards. Whereas the amount of data needed to build a big industry model is usually measured in G or even in T, then its offline training cost will be very high.