Warning: mkdir(): No space left on device in /var/www/hottg/post.php on line 59

Warning: file_put_contents(aCache/aDaily/2025-07-15/post/opendatascience/-2330-2331-): Failed to open stream: No such file or directory in /var/www/hottg/post.php on line 72
⚙️ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks. @Data Science by ODS.ai 🦜

TG Telegram Group & Channel

Data Science by ODS.ai 🦜 | United States America (US)

Create: 2025-05-29 Update: 2025-07-15 14:21:25

⚙️ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks.

Researchers built an automated system to collect and validate thousands of real-world tasks from GitHub, designed for training and evaluation of LLMs in software engineering.

Main features of the system:
1️⃣ Automatic data collection: Continuously extracts issue-PR pairs from Python repositories.
2️⃣ LLM-based environment setup: LLM analyzes repositories, creates install instructions, and updates them if errors happen.
3️⃣ Execution-based validation: Each task is tested by automatic setup, test run, and dependency freezing to make it reproducible.
4️⃣ LLM quality annotation: Tasks are labeled for clarity, difficulty, and test correctness to support filtering.

Result:
SWE-rebench dataset: 21,000+ ready-to-use interactive tasks.
Continuous updates: Fresh data is added regularly.
Transparent evaluation: Tasks are used for public SWE-rebench leaderboard.

🚀 SWE-rebench gives researchers and developers real and validated tasks to work with LLMs in SWE field.

Technical report: arXiv
Dataset: SWE-rebench

Data Science by ODS.ai 🦜

⚙️ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks.

Researchers built an automated system to collect and validate thousands of real-world tasks from GitHub, designed for training and evaluation of LLMs in software engineering.

Main features of the system:
1️⃣ Automatic data collection: Continuously extracts issue-PR pairs from Python repositories.
2️⃣ LLM-based environment setup: LLM analyzes repositories, creates install instructions, and updates them if errors happen.
3️⃣ Execution-based validation: Each task is tested by automatic setup, test run, and dependency freezing to make it reproducible.
4️⃣ LLM quality annotation: Tasks are labeled for clarity, difficulty, and test correctness to support filtering.

Result:
SWE-rebench dataset: 21,000+ ready-to-use interactive tasks.
Continuous updates: Fresh data is added regularly.
Transparent evaluation: Tasks are used for public SWE-rebench leaderboard.

🚀 SWE-rebench gives researchers and developers real and validated tasks to work with LLMs in SWE field.

Technical report: arXiv
Dataset: SWE-rebench

🔥2👍1

hottg.com/opendatascience/2330

3.17K viewsMay 29 at 15:03

>>Click here to continue<<

Data Science by ODS.ai 🦜

Share with your best friend

Telegram is riding high, adding tens of million of users this year. Now the bill is coming due.Telegram is one of the few significant social-media challengers to Facebook Inc., FB -1.90% on a trajectory toward one billion users active each month by the end of 2022, up from roughly 550 million today.

⚙️ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks.

Data Science by ODS.ai 🦜 TG
Webview: 2330
Telegram TG Webview: hottg.com/opendatascience/webview
Telegram TG Channel: Data Science by ODS.ai 🦜
Telegram Updated:
Warning: filemtime(): stat failed for aCache/aDaily/2025-07-15/post/opendatascience/-2330-2331- in /var/www/hottg/post.php on line 338
1970-01-01 00:00:00

United States America Popular Telegram Group (US)

Warning: Undefined array key 3 in /var/www/hottg/function.php on line 115

Fatal error: Uncaught mysqli_sql_exception: Can't create/write to file '/tmp/#sql-temptable-a06e-3e072b-64d.MAI' (Errcode: 28 "No space left on device") in /var/www/hottg/function.php:216 Stack trace: #0 /var/www/hottg/function.php(216): mysqli_query() #1 /var/www/hottg/function.php(115): select() #2 /var/www/hottg/post.php(351): daCache() #3 /var/www/hottg/route.php(63): include_once('...') #4 {main} thrown in /var/www/hottg/function.php on line 216