
Launch: LLMbench - a tool for comparative annotated analysis of LLM output

After the success of the CCS workbench, I have used the lessons learned from that tool to vibe code a new tool I call LLMbench. It lets you send the same prompt to two LLMs simultaneously and then annotate their replies side by side. The annotated comparisons can then be exported to JSON, plain text, or PDF for later use.
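For anyone curious about the mechanics, the core loop is roughly the following. This is a minimal sketch, not LLMbench's actual code: the query_model helper is a placeholder for whatever chat API you point it at, and the record layout is an assumption modelled on the JSON export described above.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def query_model(model_name: str, prompt: str) -> str:
    # Placeholder: swap in a real client call here, e.g. an
    # OpenAI-compatible endpoint or a local Ollama server.
    return f"[{model_name} reply to: {prompt!r}]"

def compare(prompt: str, model_a: str, model_b: str) -> dict:
    # Send the same prompt to both models at the same time. A list
    # (rather than a dict keyed by model name) lets the same model
    # appear twice, which the Pro-vs-Pro comparison below relies on.
    models = (model_a, model_b)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(query_model, m, prompt) for m in models]
        texts = [f.result() for f in futures]
    return {
        "prompt": prompt,
        "responses": [
            # Annotations start empty; in the real tool they are added
            # in the UI and travel with the record on export.
            {"model": m, "text": t, "annotations": []}
            for m, t in zip(models, texts)
        ],
    }

if __name__ == "__main__":
    record = compare("Explain recursion in one paragraph.",
                     "llama3.2", "gemini-2.5-pro")
    print(json.dumps(record, indent=2))  # JSON export, as in the tool
```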

In the examples below I have compared Llama 3.2 and Gemini 2.5 Pro, but you could also compare two models from the same family (e.g. Gemini Pro vs Flash) or even the same model against itself (Pro vs Pro).
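In terms of the sketch above, those three comparison modes are just different model pairs (the identifiers are illustrative, not LLMbench's actual configuration names):

```python
p = "Explain recursion in one paragraph."
compare(p, "llama3.2", "gemini-2.5-pro")          # across families
compare(p, "gemini-2.5-pro", "gemini-2.5-flash")  # within one family
compare(p, "gemini-2.5-pro", "gemini-2.5-pro")    # same model twice
```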

It's still in early development, so I don't have a deployment to share, but you can install it yourself by following the instructions here.

This is what it currently looks like (it already has a dark mode!).

And here it is with annotations added:

Here are some examples using the same Gemini model family. In the first (Pro vs Pro), you can see the effects of probabilistic generation in the slightly different outputs produced when the same prompt is given to two instantiations of the same model simultaneously.

The second example compares Gemini Flash with Gemini Pro.
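That divergence is not specific to Gemini; it is easy to reproduce anywhere sampling temperature is above zero, because the next token is drawn from a probability distribution rather than chosen greedily. A toy illustration with NumPy (the vocabulary and logits are made up for the demonstration):

```python
import numpy as np

rng = np.random.default_rng()
vocab = np.array(["cat", "dog", "bird"])
logits = np.array([2.0, 1.5, 0.5])  # made-up scores for the next token
temperature = 1.0

# Softmax over temperature-scaled logits gives the sampling distribution.
probs = np.exp(logits / temperature)
probs /= probs.sum()

# Two independent draws from the same distribution can differ, which is
# why two instantiations of one model can diverge on the same prompt.
print(rng.choice(vocab, p=probs), rng.choice(vocab, p=probs))
```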
