Test-time Compute

#Pocket#NLP#LanguageModel#LLM-as-a-Judge
Issue Date: 2025-03-27 Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators, Seungone Kim+, arXiv25 Comment元ポスト:https://x.com/jinulee_v/status/1905025016401428883?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-QLLM-as-a-JudgeもlongCoT+self-consistencyで性能が改善するらしい。![image](https ... #Pocket#NLP#LanguageModel
Issue Date: 2025-03-18 Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification, Eric Zhao+, arXiv25 Comment元ポスト:https://x.com/ericzhao28/status/1901704339229732874?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Qざっくりしか読めていないが、複数の解答をサンプリングして、self-verificationをさせて最も良かったものを選択するア ... #Pocket#NLP#LanguageModel
Issue Date: 2025-02-12 Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling, Runze Liu+, arXiv25

#Pocket#NLP#LanguageModel
Issue Date: 2025-02-10 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, Jonas Geiping+, arXiv25 #Pocket#NLP#LanguageModel#Finetuning (SFT)
Issue Date: 2025-02-07 s1: Simple test-time scaling, Niklas Muennighoff+, arXiv25 Comment解説:https://x.com/hillbig/status/1887260791981941121?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ... #Pocket#NLP#LanguageModel#Reasoning
Issue Date: 2025-01-28 Evolving Deeper LLM Thinking, Kuang-Huei Lee+, arXiv25 #Efficiency/SpeedUp#Pocket#NLP#LanguageModel
Issue Date: 2024-11-12 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, Charlie Snell+, arXiv24 Comment![image](https://github.com/user-attachments/assets/0562a65e-b2f1-4ff4-b806-107313fc255e)[Perplexity(参考;Hallucinationに注意)](https://www.perplexity.ai/s ... #Article#Tutorial#NLP#LanguageModel#Article#Reasoning
Issue Date: 2025-03-09 The State of LLM Reasoning Models, Sebastian Raschka, 2025.03 #Article#Pocket#LanguageModel#Article
Issue Date: 2024-12-17 Scaling test-time-compute, Huggingface, 2024.12 Commentこれは必読 ... #Article#NLP#LanguageModel#Chain-of-Thought
Issue Date: 2024-09-13 OpenAI o1, 2024.09 CommentJason Wei氏のポスト:https://x.com/_jasonwei/status/1834278706522849788?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q#1072 や #1147 で似たような考えはすでに提案されていたが、どのような点が異なるのだろうか? たと ...