Take one example of the impact on artists:
... If this continues, human made art, and the skills associated, will eventually no longer exist. Once this happens, AI image generators will no longer improve as well as machine learning will not have new data to train on, thus repeating the same styles and images over and over again.
Actually, the machines would likely continue to learn from previously AI-generated content, creating new combinations of styles and images; that cycle can go on forever, since the number of combinations is mathematically unlimited.
So, without human artists and musicians, the next generation of children will absorb algorithm-generated art and music, along with AI-generated children's books, and later, at school age, be taught by AI tutors. In other words: today, we (humans) provide data to train AI; in the future, AI will train humans.
For other public comments on the advancement of AI, check out my summary page covering a subset of the NTIA RFC's public submissions.
With the traditional search process, a user types in a question and the search engine returns a list of URLs. The user then has to click through and read several of the referenced pages, because the right answer might not be behind the first URL, or even on any single page; the information could be scattered across several web pages.
When GPT-3 first came out, I saw an opportunity to use an LLM to retrieve and read the pages behind these search-engine URLs, find the relevant information, and compose answers based on it. This completes the last mile of the user's search journey by providing direct answers to the user's questions.
My experimental project (video below) worked as expected: it effectively used an LLM to interact with users while avoiding GPT-3's knowledge cutoff date and, to some extent, hallucinations.
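For illustration, here is a minimal sketch of that kind of pipeline, not the original project's code. It assumes a hypothetical web_search() helper that returns URLs, uses the requests library to fetch pages, and uses the current OpenAI chat completions API (the original experiment was built on GPT-3's older completion endpoint); the model name is illustrative.

```python
import requests
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def fetch_page_text(url: str, max_chars: int = 4000) -> str:
    """Download a page and return a truncated snippet of its raw text."""
    html = requests.get(url, timeout=10).text
    return html[:max_chars]  # a real implementation would strip HTML tags first

def answer_from_search(question: str, urls: list[str]) -> str:
    """Read the top search results and compose a direct answer from them."""
    context = "\n\n".join(fetch_page_text(u) for u in urls[:3])
    prompt = (
        "Answer the question using only the page excerpts below. "
        "If the excerpts do not contain the answer, say you cannot find it.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name, not the one used in 2021
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# urls = web_search("some question")  # hypothetical search helper returning URLs
# print(answer_from_search("some question", urls))
```

The key design point is that the model is asked to answer only from the freshly fetched excerpts, which is what sidesteps the knowledge cutoff and reduces hallucination.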
Many such products have come out since then, such as Perplexity (an answer engine), and all major search engines now offer AI-generated answers as well.
AI alignment aims to make a model behave in line with human intentions and values, commonly approached through forward alignment and backward alignment.
But there might be another way: the alignment of source data. Do we have a good, representative sample of data? Do we need some alignment or adjustment of the original data? Take the Boston housing data as an example: many factors could determine housing prices. Let's assume the intended goal is that the percentage of Black population in a neighborhood should not affect the predicted housing price.
Here is my measurement: given a model trained on the original data, query it with various combinations of parameters to predict housing prices, then change only the Black population percentage and compare. See the results below (a code sketch of this measurement follows the table). Take the first row as an example: with a 58% Black population, the model predicts a housing price of $13,178; with all other conditions equal and only the Black population changed to 30%, it predicts $14,214, a 7.9% uplift.
Black population % | Predicted price | Predicted price if Black pop. = 30% | Variance |
---|---|---|---|
58.0% | $13,178.60 | $14,214.90 | 7.9% |
49.3% | $14,029.20 | $14,906.70 | 6.3% |
40.4% | $15,960.90 | $16,525.70 | 3.5% |
31.7% | $21,658.40 | $21,765.10 | 0.5% |
15.3% | $23,480.40 | $22,324.00 | -4.9% |
9.2% | $16,107.70 | $14,353.40 | -10.9% |
6.6% | $17,961.20 | $15,924.10 | -11.3% |
4.8% | $10,971.50 | $8,738.19 | -20.4% |
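Here is a minimal sketch of that measurement. It assumes a pandas DataFrame `df` with an illustrative `black_pct` feature and a `price` target (the names are hypothetical; the actual Boston dataset encodes the racial attribute differently, as a transformed column named B), and a random-forest regressor standing in for whatever model was actually used.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def measure_uplift(df: pd.DataFrame, feature: str = "black_pct",
                   counterfactual: float = 30.0, target: str = "price") -> pd.DataFrame:
    """Compare predictions on identical rows that differ only in one feature."""
    X, y = df.drop(columns=[target]), df[target]
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    original = model.predict(X)        # predictions on the data as-is
    X_cf = X.copy()
    X_cf[feature] = counterfactual     # change ONLY the sensitive feature
    altered = model.predict(X_cf)

    return pd.DataFrame({
        feature: X[feature].to_numpy(),
        "predicted_price": original,
        f"predicted_price_if_{int(counterfactual)}pct": altered,
        "variance_pct": (altered - original) / original * 100,
    })

# report = measure_uplift(df)
# print(report.sort_values("black_pct", ascending=False).head())
```

The idea is simply to predict twice on rows that are identical except for the sensitive feature and report the relative difference, which is what the Variance column above shows.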
So the question is: do we want to measure the quality of the data, and adjust/align it before feeding it into a large model for training? (A naive sketch of such an adjustment appears at the end of this section.)
In human society, we debate vigorously what materials should be in textbooks and school libraries, because these materials shape the minds of our next generation. For AI training data, if the practice is "anything goes" combined only with backward alignment (providing feedback and adjustment when an undesirable result is produced), we may not be able to fix all the problems. The equivalent in human society would be never adequately telling kids that certain things are wrong, then applying legal punishment after an offense is committed, hoping this feedback teaches the kid not to do it again....
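To make the "adjust/align the data" idea concrete, here is one deliberately naive sketch, reusing the hypothetical `df`, `black_pct`, and `price` names from the earlier snippet: simply drop the sensitive feature before training. This only illustrates the direction, not a real fix, since correlated proxy features can still leak the same information.

```python
from sklearn.ensemble import RandomForestRegressor

def train_aligned_model(df, sensitive_feature: str = "black_pct", target: str = "price"):
    """Train on data with the sensitive column removed (the simplest 'alignment')."""
    X = df.drop(columns=[sensitive_feature, target])
    y = df[target]
    return RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# aligned_model = train_aligned_model(df)
# This model never sees `black_pct`, so its prediction cannot change when only
# that column changes; any remaining disparity would come from proxy features
# that correlate with it, which is why this is a starting point, not a fix.
```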