Research

How I built a living literature review from chat

ArXiv Watcher · 4 min read

I was writing a section on tool-use in LLM agents and needed to track about 40 papers. My system was a Google Doc with links I'd paste in, plus notes I'd forget to update. Half the time I'd re-read a paper I'd already summarized because I couldn't find my notes.

I started using ArXiv Watcher as my research memory. Every paper I discussed, it tracked. Every summary, it remembered. When I needed to write, I'd ask for a recap of everything we'd covered on a topic.

The agent

Same ArXiv Watcher. The key feature here is the research log. As you discuss papers over days and weeks, it builds a searchable memory of everything. You can ask 'What papers have we discussed about chain-of-thought?' and get a consolidated answer.
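ArXiv Watcher's internals aren't public, so treat this as a conceptual sketch only: the research log behaves like a topic-indexed store you append to as you chat and query later by keyword. All names below are hypothetical, and the real agent presumably uses semantic search rather than substring matching.

```python
from collections import defaultdict

class ResearchLog:
    """Conceptual sketch of a topic-indexed paper log.

    Hypothetical illustration only -- not ArXiv Watcher's actual internals.
    """

    def __init__(self):
        self._by_topic = defaultdict(list)  # topic -> list of (title, summary)

    def add(self, topic, title, summary):
        self._by_topic[topic].append((title, summary))

    def recap(self, query):
        # Naive keyword match over topic names; a real agent would
        # likely use embeddings or semantic search here.
        return {
            topic: papers
            for topic, papers in self._by_topic.items()
            if query.lower() in topic.lower()
        }

log = ResearchLog()
log.add("chain-of-thought", "CoT Paper A", "Prompting technique study.")
log.add("tool use", "ToolBench", "Benchmark for tool selection.")
print(log.recap("chain"))  # {'chain-of-thought': [('CoT Paper A', 'Prompting technique study.')]}
```

The point of the sketch is the shape of the interaction: adding papers is a side effect of conversation, and a recap is just a query over accumulated state.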

Setting it up

Same 7-minute deploy process:

1. Find the agent

Head to the Hiregents marketplace and find ArXiv Watcher. Read the skill file if you want to know exactly what it can do. Everything is transparent.

2. Sign in

Click Deploy. You'll get a magic link to your email. No passwords, no account setup. Click the link and you're in.

3. Pick a plan

Hobby ($39/mo) or Pro ($49/mo). Both get you a private server. The difference is the AI model: Hobby runs Gemini Flash, Pro runs Claude Sonnet. Annual billing saves 25%.

4. Connect your keys

You need two things: an OpenRouter API key (takes about 30 seconds to create at openrouter.ai) and a Telegram bot token (message @BotFather on Telegram, send /newbot, and follow the prompts). Both are validated in real time before you can continue.
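The deploy flow validates both credentials against the live services, but you can sanity-check their shape yourself before pasting them in. A minimal sketch in Python; the `sk-or-` prefix for OpenRouter keys and the `id:secret` shape for BotFather tokens are assumptions based on the commonly seen formats, not guarantees:

```python
import re

def looks_like_openrouter_key(key: str) -> bool:
    # Assumption: OpenRouter keys typically begin with "sk-or-".
    # This is a cheap local sanity check only; the deploy flow
    # does the real validation against the live API.
    return bool(re.fullmatch(r"sk-or-[A-Za-z0-9\-_]{10,}", key))

def looks_like_telegram_token(token: str) -> bool:
    # BotFather tokens look like "<numeric bot id>:<secret>",
    # e.g. "123456789:AA..." -- again, shape check only.
    return bool(re.fullmatch(r"\d+:[A-Za-z0-9_\-]{30,}", token))

print(looks_like_openrouter_key("sk-or-v1-abc123def456"))                  # True
print(looks_like_telegram_token("123456789:AAE-exampleexampleexampleexample123"))  # True
```

Catching an obviously malformed paste locally saves a round trip through the setup form.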

5. Review and deploy

Check the summary: agent name, plan, email. Hit Deploy. You'll go through Stripe checkout and land on a status page.

6. Wait about 5 minutes

The status page shows a progress circle while your server boots: Docker images are pulled, security is configured, and the agent installs its tools. You don't need to stay on the page; it'll be ready when you come back.
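The status page's behavior suggests a simple poll-until-ready loop. Here's a generic sketch in Python; the status values (`provisioning`, `ready`, `failed`) and the callable are entirely hypothetical, since the actual deploy API isn't documented in this post:

```python
import time

def wait_for_ready(check_status, timeout_s=600, poll_every_s=10):
    """Poll a status callable until it reports 'ready'.

    Generic sketch; the states and timing are assumptions,
    not the platform's real deploy protocol.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = check_status()
        if status == "ready":
            return True
        if status == "failed":
            raise RuntimeError("deploy failed")
        time.sleep(poll_every_s)
    return False  # timed out

# Usage with a stub that becomes ready on the third poll:
states = iter(["provisioning", "installing", "ready"])
print(wait_for_ready(lambda: next(states), poll_every_s=0))  # True
```

A ~10-minute timeout comfortably covers the "about 5 minutes" boot the post describes.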

7. Open Telegram

Find your bot on Telegram (the one you created with BotFather). Send /start. ArXiv Watcher is live.

What the conversation looks like

ArXiv Watcher · online

Today

Me: I need papers on how LLM agents handle tool failures — when an API call errors out or returns garbage

ArXiv Watcher: Found 3 papers directly on this:

1. "ReAct-Retry" (Li et al., 2026) — Reflection-based retry. The agent analyzes why the tool call failed, adjusts parameters, and retries. 23% fewer cascading failures.
2. "Robust Tool Use" (Park & Kim, 2026) — Trains agents on intentionally noisy tool outputs. Agents learn to validate responses before using them.
3. "Fallback Chains" (Zhang, 2025) — Hierarchical tool selection. If the primary tool fails, the agent selects an alternative from a ranked fallback list.

Added all 3 to your research log under "Tool Failure Handling."

Me: What have we covered so far on tool use in general?

ArXiv Watcher: Your research log on tool use (12 papers across 3 sessions):

• Tool Discovery: 3 papers (ToolFormer 2, API-Scout, DocTools)
• Tool Selection: 4 papers (ToolBench, TaskMatrix, API-Bank, ToolLLM)
• Tool Failure: 3 papers (just added above)
• Tool Composition: 2 papers (ChainForge, ToolChain)

Want the full summaries for any of these groups?


What I got

A literature review that built itself as I worked. No separate note-taking, no lost links. When I sat down to write, I asked for a recap by topic and had structured notes for 40+ papers organized into themes. The agent did the librarian work.
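The by-topic recap in the transcript amounts to a grouping over the logged papers. A toy reconstruction in Python, using only the paper names and topics quoted above (the agent's real storage is opaque):

```python
from collections import Counter

# Papers as (topic, title), copied from the transcript's recap.
papers = [
    ("Tool Discovery", "ToolFormer 2"), ("Tool Discovery", "API-Scout"),
    ("Tool Discovery", "DocTools"),
    ("Tool Selection", "ToolBench"), ("Tool Selection", "TaskMatrix"),
    ("Tool Selection", "API-Bank"), ("Tool Selection", "ToolLLM"),
    ("Tool Failure", "ReAct-Retry"), ("Tool Failure", "Robust Tool Use"),
    ("Tool Failure", "Fallback Chains"),
    ("Tool Composition", "ChainForge"), ("Tool Composition", "ToolChain"),
]

# Count papers per theme, exactly as the recap presents them.
counts = Counter(topic for topic, _ in papers)
for topic, n in counts.items():
    print(f"{topic}: {n} papers")
```

Trivial on 12 papers, but it's the same librarian work that becomes worth automating at 40-plus.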