SWE-fficiency: Benchmarking LLM Code Speedups

In this AI Research Roundup episode, Alex discusses the paper: '

Multi-SWE-bench: Testing LLMs on Real-World Code Issues

In this episode of the AI Research Roundup, host Alex discusses a new

Meet SWE-Perf: Benchmarking LLMs for Real-World Code Performance Optimization @ the Repository Level

SWE

SWE-chat: New Dataset of Real LLM Coding Sessions

In this AI Research Roundup episode, Alex discusses the paper: '

SWE-Bench+: Enhanced Coding Benchmark for LLMs (October 2024)

Title:

What are Large Language Model (LLM) Benchmarks?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKetJ Learn more about the ...

Prompt Optimization Changes Which LLM is Best

I tested Gemma 3 4B vs Ministral 8B on an intent classification task with the same prompt. Gemma 3 4B won. Then I optimized the ...

LLMs Synthesize High-Speed Optimization Code

In this AI Research Roundup episode, Alex discusses the paper: 'Distribution-Aware Algorithm Design with

What do AI Benchmarks Actually Mean?! A Fast Breakdown (MMLU, SWE-bench, & More Explained)

Ever see a headline like 'New AI smashes MMLU

I Tested 6 LLMs with MULTIPLE Runs for Same Prompt

If we launch the same prompt on the same

Faster LLMs: Accelerate Inference with Speculative Decoding

Ready to become a certified watsonx AI Assistant Engineer? Register now and use

What Do LLM Benchmarks Actually Tell Us? (+ How to Run Your Own)

Interpreting and running standardized language model

Souper-Model: Smarter LLM Model Souping

In this AI Research Roundup episode, Alex discusses the paper: 'Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art ...

THIS is the REAL DEAL 🤯 for local LLMs

This is the stack that gets me over 4000 tokens per second locally. Download Docker Desktop here: https://dockr.ly/4mOdGMO to ...

ProgramBench: New Coding Benchmark for LLM Agents

In this AI Research Roundup episode, Alex discusses the paper: 'ProgramBench: Can Language Models Rebuild Programs From ...

LLM Evaluation & Benchmarks

MMLU, HumanEval, and the art of measuring intelligence. How do we actually measure

What is vLLM? Efficient AI Inference for Large Language Models