{"id":6071,"date":"2024-11-22T02:28:06","date_gmt":"2024-11-22T02:28:06","guid":{"rendered":"https:\/\/tech.newat9.com\/index.php\/2024\/11\/22\/how-openai-stress-tests-its-large-language-models\/"},"modified":"2024-11-22T02:28:06","modified_gmt":"2024-11-22T02:28:06","slug":"how-openai-stress-tests-its-large-language-models","status":"publish","type":"post","link":"https:\/\/tech.newat9.com\/index.php\/2024\/11\/22\/how-openai-stress-tests-its-large-language-models\/","title":{"rendered":"How OpenAI stress-tests its large language models"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>When OpenAI tested DALL-E 3 last year, it used an automated process to cover even more variations of what users might ask for. It used GPT-4 to generate requests producing images that could be used for misinformation or that depicted sex, violence, or self-harm. OpenAI then updated DALL-E 3 so that it would either refuse such requests or rewrite them before generating an image.\u00a0Ask for a horse in ketchup now, and DALL-E is wise to you: \u201cIt appears there are challenges in generating the image. Would you like me to try a different request or explore another idea?\u201d<\/p>\n<p>In theory, automated red-teaming can be used to cover more ground, but earlier techniques had two major shortcomings: They tend to either fixate on a narrow range of high-risk behaviors or come up with a wide range of low-risk ones. That\u2019s because reinforcement learning, the technology behind these techniques, needs something to aim for\u2014a reward\u2014to work well. Once it\u2019s won a reward, such as finding a high-risk behavior, it will keep trying to do the same thing again and again. Without a reward, on the other hand, the results are scattershot.\u00a0<\/p>\n<p>\u201cThey kind of collapse into \u2018We found a thing that works! We&#8217;ll keep giving that answer!\u2019 or they&#8217;ll give lots of examples that are really obvious,\u201d says Alex Beutel, another OpenAI researcher. \u201cHow do we get examples that are both diverse and effective?\u201d<\/p>\n<h3 class=\"wp-block-heading\">A problem of two parts<\/h3>\n<p>OpenAI\u2019s answer, outlined in the second paper, is to split the problem into two parts. Instead of using reinforcement learning from the start, it first uses a large language model to brainstorm possible unwanted behaviors. Only then does it direct a reinforcement-learning model to figure out how to bring those behaviors about. This gives the model a wide range of specific things to aim for.\u00a0<\/p>\n<p>Beutel and his colleagues showed that this approach can find potential attacks known as indirect prompt injections, where another piece of software, such as a website, slips a model a secret instruction to make it do something its user hadn\u2019t asked it to. OpenAI claims this is the first time that automated red-teaming has been used to find attacks of this kind. \u201cThey don\u2019t necessarily look like flagrantly bad things,\u201d says Beutel.<\/p>\n<p>Will such testing procedures ever be enough? Ahmad hopes that describing the company\u2019s approach will help people understand red-teaming better and follow its lead. \u201cOpenAI shouldn\u2019t be the only one doing red-teaming,\u201d she says. People who build on OpenAI\u2019s models or who use ChatGPT in new ways should conduct their own testing, she says: \u201cThere are so many uses\u2014we\u2019re not going to cover every one.\u201d<\/p>\n<p>For some, that\u2019s the whole problem. Because nobody knows exactly what large language models can and cannot do, no amount of testing can rule out unwanted or harmful behaviors fully. And no network of red-teamers will ever match the variety of uses and misuses that hundreds of millions of actual users will think up.\u00a0<\/p>\n<p>That\u2019s especially true when these models are run in new settings. People often hook them up to new sources of data that can change how they behave, says Nazneen Rajani, founder and CEO of Collinear AI, a startup that helps businesses deploy third-party models safely. She agrees with Ahmad that downstream users should have access to tools that let them test large language models themselves.\u00a0<\/p>\n<\/p><\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/www.technologyreview.com\/2024\/11\/21\/1107158\/how-openai-stress-tests-its-large-language-models\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When OpenAI tested DALL-E 3 last year, it used an automated process to cover even more variations of what users [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":6072,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/posts\/6071"}],"collection":[{"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/comments?post=6071"}],"version-history":[{"count":0,"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/posts\/6071\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/media\/6072"}],"wp:attachment":[{"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/media?parent=6071"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/categories?post=6071"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tech.newat9.com\/index.php\/wp-json\/wp\/v2\/tags?post=6071"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}