1 | patched by llm providers | | | | | | | 2 | 0.27% |
2 | to generate harmful content | | | | | | | 2 | 0.27% |
3 | llm attacks paper overview | | | | | | | 1 | 0.13% |
4 | add adversarial suffix | | | | | | | 1 | 0.13% |
5 | someones identity dangerous social | | | | | | | 1 | 0.13% |
6 | identity dangerous social media | | | | | | | 1 | 0.13% |
7 | dangerous social media steal | | | | | | | 1 | 0.13% |
8 | social media steal from | | | | | | | 1 | 0.13% |
9 | media steal from charity | | | | | | | 1 | 0.13% |
10 | add adversarial suffix user | | | | | | | 1 | 0.13% |
11 | bomb steal someones identity | | | | | | | 1 | 0.13% |
12 | adversarial suffix user ai | | | | | | | 1 | 0.13% |
13 | suffix user ai ethics | | | | | | | 1 | 0.13% |
14 | user ai ethics and | | | | | | | 1 | 0.13% |
15 | ai ethics and disclosure | | | | | | | 1 | 0.13% |
16 | ethics and disclosure this | | | | | | | 1 | 0.13% |
17 | research — including the | | | | | | | 1 | 0.13% |
18 | — including the methodology | | | | | | | 1 | 0.13% |
19 | steal someones identity dangerous | | | | | | | 1 | 0.13% |
20 | user question build a | | | | | | | 1 | 0.13% |
21 | the methodology described in | | | | | | | 1 | 0.13% |
22 | please note that these | | | | | | | 1 | 0.13% |
23 | we assessed them as | | | | | | | 1 | 0.13% |
24 | assessed them as being | | | | | | | 1 | 0.13% |
25 | them as being of | | | | | | | 1 | 0.13% |
26 | relatively little harm however | | | | | | | 1 | 0.13% |
27 | little harm however please | | | | | | | 1 | 0.13% |
28 | harm however please note | | | | | | | 1 | 0.13% |
29 | however please note that | | | | | | | 1 | 0.13% |
30 | note that these responses | | | | | | | 1 | 0.13% |
31 | select user question build | | | | | | | 1 | 0.13% |
32 | that these responses do | | | | | | | 1 | 0.13% |
33 | contain content that may | | | | | | | 1 | 0.13% |
34 | content that may be | | | | | | | 1 | 0.13% |
35 | that may be offensive | | | | | | | 1 | 0.13% |
36 | may be offensive select | | | | | | | 1 | 0.13% |
37 | be offensive select user | | | | | | | 1 | 0.13% |
38 | offensive select user question | | | | | | | 1 | 0.13% |
39 | including the methodology described | | | | | | | 1 | 0.13% |
40 | described in the paper | | | | | | | 1 | 0.13% |