ALEXANDRIA, Va., March 3 -- United States Patent No. 12,566,929, issued on March 3, was assigned to International Business Machines Corp. (Armonk, N.Y.).
"Combining data selection and reward functions for tuning large language models using reinforcement learning" was invented by Long Vu (Chappaqua, N.Y.), Nhan Huu Pham (Tarrytown, N.Y.), Dharmashankar Subramanian (Rye Brook, N.Y.) and Todd William Mummert (Danbury, Conn.).
According to the abstract released by the U.S. Patent & Trademark Office: "A computer-implemented method, a computer program product, and a computer system for tuning large language models. A computer receives pairs of textual prompts and ground truth labels. A computer creates a data selection scoring function, by rep...