Building an AI yield concierge in 5 days: HexKit meets LI.FI Earn
Most yield dashboards give you a table of vaults sorted by APY. You scroll, you filter, you compare numbers, you open three tabs to check protocol safety, and eventually you pick something. Multiply that by every idle token in your wallet and you've burned half an hour before depositing anything.
I wanted something different. Type "pool my assets into the top 3 safest vaults" and let the system figure out what that means, find the right vaults across every chain, and build the deposit transactions.
That's the pitch: natural language in, cross-chain yield execution out. Five days to build it for LI.FI's DeFi Mullet Hackathon.
What HexKit already was
HexKit is a web3 developer toolkit I'd been building before the hackathon. Transaction simulation, token approvals, contract interactions, wallet management. The kind of toolbelt a DeFi power user reaches for when they need to understand what a transaction actually does before signing it.
The LI.FI integration was new. Everything described in this post was built during the five-day hackathon window (April 8-14, 2026).
The integration surface
LI.FI Earn has two APIs that matter:
Earn Data API (earn.li.fi): vault discovery, chain/protocol metadata, portfolio positions. No auth required, 100 req/min rate limit. Paginated, so fetching the full vault universe means walking a cursor until nextCursor comes back empty.
Composer API (li.quest): the transaction builder. You give it a source token, a destination vault, and an amount. It returns calldata for a swap+bridge+deposit in a single transaction. This is what makes cross-chain yield actually work. A user holding ETH on Arbitrum can deposit into a USDC vault on Optimism without manually bridging or swapping anything.
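The cursor walk on the Data API side is worth sketching. This is a minimal, generic version, assuming pages carry a `data` array alongside the `nextCursor` field mentioned above (the real response shape may differ):

```typescript
interface Page<T> {
  data: T[];
  nextCursor?: string | null;
}

// Keep fetching until nextCursor comes back empty. `fetchPage` is whatever
// wraps the actual /v1/earn/vaults call.
async function fetchAllPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.data);
    cursor = page.nextCursor ?? undefined;
  } while (cursor);
  return all;
}
```

At 100 req/min unauthenticated, a full walk of the vault universe fits comfortably inside the rate limit.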
Three layers. The UI layer is vault browsing, deposits, withdrawals, and position tracking. The AI concierge layer sits between the user and the vault universe: it parses intent, ranks candidates, calls the LLM for recommendations, and queues execution. Below that, the external APIs: LI.FI Earn, LI.FI Composer, Gemini, and HexKit's own EDB simulation engine.
The vault list: where browsing starts
Before the AI stuff, I needed the basics. A paginated vault browser with chain and protocol filters, sortable by APY or TVL, with a minimum TVL floor selector. The Earn Data API handles server-side filtering and sorting, so this was mostly a matter of wiring React Query to paginated fetches.
export async function fetchEarnVaults(params?: {
cursor?: string;
chainId?: number;
sortBy?: string;
minTvlUsd?: number;
}): Promise<EarnVaultsResponse> {
const url = new URL(`${EARN_PROXY}/v1/earn/vaults`, window.location.origin);
if (params?.cursor) url.searchParams.set("cursor", params.cursor);
if (params?.chainId) url.searchParams.set("chainId", String(params.chainId));
// ...
const res = await fetch(url.toString(), {
signal: AbortSignal.timeout(15000),
});
  if (!res.ok) throw new Error(`Earn API error: ${res.status}`);
  return res.json();
}

Each vault card shows the protocol, chain, underlying tokens, APY breakdown (base vs. reward), TVL, and a risk classification. The risk classification is where things got interesting.
Risk filtering: the part nobody wants to build
Sorting by APY and showing the top results is a recipe for recommending garbage. A vault with 4,000% APY and $12K TVL is not an opportunity — it's either a honeypot or a rounding error in reward emissions. So I built two tiers of risk classification that run client-side on every vault.
High-risk: APY above 250% AND TVL below $2M. These are the outliers that look amazing in a sorted list but are almost never worth the risk.
Caution: four independent signals, any one of which flags the vault.
function getCautionReasons(vault: EarnVault): CautionReason[] {
  const reasons: CautionReason[] = [];
  const { apy, tvl } = vault.analytics;
  // Windowed APYs and USD TVL, bound locally so the checks below read cleanly
  // (names mirror the analytics shape used elsewhere in the app)
  const { apy1d, apy7d, apy30d } = apy;
  const { tvlUsd } = tvl;
  // More than 60% of yield comes from reward tokens
  if (apy.reward / apy.total > 0.6) reasons.push("reward-heavy");
  // 1-day APY is 3x the 30-day average — likely a spike
  if (apy1d > apy30d * 3) reasons.push("apy-spike");
  // Declining trend: 30d > 7d > 1d, and 1d is less than half of 30d
  if (apy30d > apy7d && apy7d > apy1d && apy1d < apy30d * 0.5)
    reasons.push("declining-yield");
  // TVL under $250K
  if (tvlUsd < 250_000) reasons.push("micro-tvl");
  return reasons;
}

Reward-heavy vaults are the sneakiest. The headline APY looks great, but 60%+ of it comes from governance token emissions that can vanish when the protocol adjusts incentives. APY spikes are easier to spot: if 1-day yield is 3x the 30-day average, something temporary happened. Declining yield is the opposite problem: the 30-day number in the listing doesn't reflect what you'll actually earn, because the trend line is falling.
Both high-risk and caution vaults are filtered out of primary recommendations by default. Users can opt them back in with toggle pills — but they have to make a conscious choice.
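Putting the two tiers together, the default filter looks roughly like this. A sketch with a minimal vault shape; `isHighRisk` and `filterPrimary` are illustrative names, but the thresholds (APY above 250% AND TVL below $2M) are the ones described above:

```typescript
// Minimal shape for illustration; the real EarnVault carries more fields.
interface VaultLike {
  apyTotalPct: number;      // total APY in percent
  tvlUsd: number;
  cautionReasons: string[]; // output of getCautionReasons
}

// High-risk requires BOTH conditions: either alone can be legitimate.
function isHighRisk(v: VaultLike): boolean {
  return v.apyTotalPct > 250 && v.tvlUsd < 2_000_000;
}

// Default recommendation filter: drop high-risk and caution vaults
// unless the user has opted back in via the toggle pills.
function filterPrimary(
  vaults: VaultLike[],
  opts = { includeHighRisk: false, includeCaution: false }
): VaultLike[] {
  return vaults.filter((v) => {
    if (!opts.includeHighRisk && isHighRisk(v)) return false;
    if (!opts.includeCaution && v.cautionReasons.length > 0) return false;
    return true;
  });
}
```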
Deposits: the happy path and the fallback
The deposit flow uses LI.FI's Composer API. You pick a source token (which can be on any chain — the Composer handles the swap and bridge), enter an amount, and the system builds the transaction.
type FlowState =
| "idle"
| "quoting"
| "simulating"
| "approving"
| "swapping"
| "executing"
| "success"
| "error";The flow walks through these states in order. Before executing, HexKit's EDB simulation engine dry-runs the transaction to verify the spender address. If the Composer returns calldata that approves tokens to an unexpected address, the simulation reverts with an allowance error and we catch it before the user signs anything.
function classifySpenderCheck(result: AssetMovementResult): SpenderCheckResult {
if (result.success) return { status: "already" };
const reason = (result.error ?? "").toLowerCase();
const allowancePatterns = [
"allowance", "erc20", "transferfrom",
"insufficient allowance", "exceeds allowance",
];
if (allowancePatterns.some((p) => reason.includes(p))) {
return { status: "verified", revertReason: result.error };
}
return { status: "suspicious", revertReason: result.error };
}A "verified" revert means the simulation reached the transferFrom call and failed on allowance, which confirms the spender is the right contract. "Suspicious" means it reverted for a different reason. The user can still proceed, but they see a warning.
Sometimes the Composer can't build a direct route, usually because the vault's underlying token doesn't have liquid swap pairs on the source chain. When that happens, the deposit flow falls back to two steps: swap the source token to the vault's underlying token first, then deposit. Two transactions instead of one, but it beats "route not found."
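The fallback decision can be sketched as a small planner. The quote functions here (`quoteDirect`, `quoteSwap`, `quoteDeposit`) are stand-ins for the real Composer and swap API calls, not actual endpoints:

```typescript
type DepositPlan =
  | { kind: "direct"; tx: string }
  | { kind: "two-step"; swapTx: string; depositTx: string };

// Try a single Composer swap+bridge+deposit first; if no route exists,
// fall back to a separate swap followed by a plain deposit.
async function planDeposit(
  quoteDirect: () => Promise<string | null>,
  quoteSwap: () => Promise<string>,
  quoteDeposit: () => Promise<string>
): Promise<DepositPlan> {
  const direct = await quoteDirect();
  if (direct) return { kind: "direct", tx: direct };
  // Two transactions instead of one, but it beats "route not found".
  const swapTx = await quoteSwap();
  const depositTx = await quoteDeposit();
  return { kind: "two-step", swapTx, depositTx };
}
```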
The idle sweep: finding money you forgot about
The idle yield banner was one of the first things I built and it's still one of my favorites. It scans your connected wallet's token balances, checks them against the full vault universe, and shows you which tokens could be earning yield but aren't.
The suggestions show up as a small green badge on the Earn tab. Click it and you get a ranked list: "Your 2.3 ETH on Arbitrum could earn 4.2% in Aave WETH." Each suggestion links directly to the deposit flow with the source token pre-filled.
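The matching itself is deterministic. A sketch with hypothetical shapes (the real code also handles symbol aliases like ETH/WETH, which are omitted here):

```typescript
interface Balance { symbol: string; chainId: number; usdValue: number; }
interface VaultOption { slug: string; symbol: string; apyPct: number; }

// For each idle token above a dust threshold, find the highest-APY vault
// whose underlying symbol matches.
function idleSuggestions(
  balances: Balance[],
  vaults: VaultOption[],
  minUsd = 10
): Array<{ symbol: string; vault: VaultOption }> {
  const out: Array<{ symbol: string; vault: VaultOption }> = [];
  for (const b of balances) {
    if (b.usdValue < minUsd) continue; // skip dust balances
    const matches = vaults
      .filter((v) => v.symbol === b.symbol)
      .sort((a, z) => z.apyPct - a.apyPct);
    if (matches.length > 0) out.push({ symbol: b.symbol, vault: matches[0] });
  }
  return out;
}
```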
This feeds into the concierge's "my assets" mode. When a user says "put my portfolio to work," the system already knows what tokens they hold and where.
The concierge: where the AI lives
The AI concierge has two modes. Rules mode (the idle sweep) matches wallet tokens to vaults using deterministic logic. Intent mode takes natural language and turns it into structured vault queries.
Intent mode has three stages: parse, rank, recommend.
Stage 1: intent parsing
The user types something like "top 3 safest USDC vaults on Arbitrum." Gemini 2.5 Flash Lite parses this into a structured intent object:
interface ParsedIntent {
target_symbol: string | null; // "USDC"
target_chain_id: number | null; // 42161
objective: "safest" | "highest" | "balanced";
min_apy_pct: number | null;
max_apy_pct: number | null;
min_tvl_usd: number | null;
my_assets: boolean; // false
routing_mode: "per-asset" | "consolidate";
result_count: number | null; // 3
include_protocols: string[];
exclude_protocols: string[];
}

The system prompt ships a compact chain/protocol registry so the LLM can resolve "Arbitrum" to chain ID 42161 without hallucinating. Zod validates the response. If the LLM returns something that doesn't parse, the system retries once. If both attempts fail, the user gets an error — no silent fallbacks to broken state.
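The retry-once shape is simple enough to sketch. Here `validate` plays the role of the Zod schema's safeParse, returning null on a bad response; the function name is illustrative:

```typescript
// Call the LLM, validate the response, retry once on a bad parse, and
// surface an error after two failures. No silent fallback.
async function parseIntentWithRetry<T>(
  callLlm: () => Promise<unknown>,
  validate: (raw: unknown) => T | null,
  attempts = 2
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    const parsed = validate(await callLlm());
    if (parsed !== null) return parsed;
  }
  throw new Error("Could not parse intent after retries");
}
```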
Getting the parsing right was the most iterative part of the build. "Pool my assets in the top 3 vaults" needs to set my_assets: true and routing_mode: "consolidate" and result_count: 3. "Best vault for each of my tokens" needs my_assets: true and routing_mode: "per-asset". "My ETH" is tricky — the possessive "my" before a specific token means target_symbol: "ETH" and my_assets: false, not a wallet scan.
I spent at least a full day just refining the disambiguation rules in the system prompt.
Stage 2: vault ranking
Once the intent is parsed, rankVaultsForIntent filters the full vault universe (all transactional vaults across all chains) and sorts by objective.
Symbol matching uses alias groups so "ETH" matches WETH vaults (which is 99% of ETH vaults on L2s):
const SYMBOL_ALIAS_GROUPS: ReadonlyArray<ReadonlySet<string>> = [
new Set(["ETH", "WETH"]),
new Set(["BTC", "WBTC", "CBBTC", "TBTC"]),
new Set(["USDC", "USDC.E", "USDBC"]),
new Set(["MATIC", "WMATIC", "POL"]),
// ...
];

The "balanced" objective blends APY and TVL with a 55/45 weight. "Highest" sorts by pure APY. "Safest" sorts by TVL — a rough proxy for protocol trust, but a better default than nothing.
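A sketch of the balanced objective. The 55/45 weighting is from the real implementation; how APY and TVL are normalized onto the same scale is my assumption here (each scaled against the maximum in the candidate set):

```typescript
interface Ranked { apyPct: number; tvlUsd: number; }

// Blend APY and TVL on a 55/45 weighting. Normalization against the
// candidate-set maximum is illustrative, not necessarily what HexKit does.
function balancedScore(v: Ranked, maxApy: number, maxTvl: number): number {
  const apyNorm = maxApy > 0 ? v.apyPct / maxApy : 0;
  const tvlNorm = maxTvl > 0 ? v.tvlUsd / maxTvl : 0;
  return 0.55 * apyNorm + 0.45 * tvlNorm;
}

function sortBalanced(vaults: Ranked[]): Ranked[] {
  const maxApy = Math.max(...vaults.map((v) => v.apyPct), 0);
  const maxTvl = Math.max(...vaults.map((v) => v.tvlUsd), 0);
  return [...vaults].sort(
    (a, z) => balancedScore(z, maxApy, maxTvl) - balancedScore(a, maxApy, maxTvl)
  );
}
```

A deep vault with slightly lower APY outranks a shallow vault with slightly higher APY, which is the point of the blend.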
For generic discovery queries (no target symbol), there's a protocol diversity cap: no more than 3 vaults from the same protocol. Without this, "top 10 vaults" returns 10 Aave vaults because Aave has the most TVL on every chain.
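The diversity cap is a single pass over the already-ranked list:

```typescript
// Cap results at `maxPerProtocol` vaults per protocol so one dominant
// protocol doesn't fill the whole discovery list.
function applyDiversityCap<T extends { protocol: string }>(
  ranked: T[],
  maxPerProtocol = 3
): T[] {
  const counts = new Map<string, number>();
  const out: T[] = [];
  for (const v of ranked) {
    const n = counts.get(v.protocol) ?? 0;
    if (n >= maxPerProtocol) continue; // protocol already at its cap
    counts.set(v.protocol, n + 1);
    out.push(v);
  }
  return out;
}
```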
Stage 3: LLM recommendation
The top-ranked vaults (8 for targeted queries, 12 for discovery) go to Gemini for a final recommendation pass. The LLM sees vault metadata (APY, TVL, protocol, chain, underlying tokens) and produces a structured response: best pick, safest pick, alternatives, and a natural-language rationale for each.
const request = buildGeminiIntentRequest(
intent, candidates, walletAssets,
sourceTokenSymbol, sourceChainId
);

The recommendation includes vault_slug references back to the candidate list. Zod validates the response schema. If the LLM returns slugs that don't match any candidate, the system falls back to rules-based ranking.
For per-asset mode, the LLM gets source asset context: "The user holds BNB on chain 56. This is what they will deposit FROM, it will be swapped/bridged into the vault's underlying token automatically." That way the LLM can weigh entry cost against yield. A 5% APY vault that requires a $20 swap from BNB is worse than a 4.8% vault that's native to BSC.
Consolidate vs. per-asset: the two multi-asset modes
When a user says "put my portfolio to work," the system needs to decide: should each token go to its own best vault, or should everything funnel into a few high-quality vaults?
Consolidate mode runs a single global search (no symbol filter) and picks the top N vaults. The LLM explains why each vault was chosen, and the user's various tokens get routed there via the Composer's cross-chain swap+deposit. Per-asset mode is the opposite: find the best vault for each token individually, with a separate LLM recommendation pass per asset that includes source token context so it can reason about entry costs.
The default is consolidate. That's what most people mean when they say "pool my assets." Per-asset only triggers when the user explicitly asks for separate vaults per token.
Both modes search the entire vault universe globally. I learned this the hard way. My initial implementation filtered vaults by the source token's symbol, which meant BNB only matched BNB/WBNB vaults. There is exactly one of those in the Earn catalog. It pays 0.02% APY. Since LI.FI Composer handles cross-chain swaps, the best vault for someone holding BNB might be a USDC vault on Arbitrum. The concierge needs to see all options to make that call.
The execution queue
When the concierge recommends vaults, the user can approve them and the system builds an execution queue. Each leg is a Composer deposit — potentially cross-chain — that walks through the same quote → simulate → approve → execute flow as a manual deposit.
type LegStatus = "pending" | "active" | "done" | "failed";
interface Leg {
vault: EarnVault;
sourceToken: EarnToken;
sourceChain: number;
amount: string;
status: LegStatus;
}

The queue is forward-only: you can't go back and retry a failed leg (the token might already be spent). Failed legs show the error and the user can handle them manually. This is intentional — the execution queue is a convenience layer, not a transaction manager.
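The forward-only advance can be sketched with a local, simplified leg type (the real Leg carries vault and token data, as shown above):

```typescript
type Status = "pending" | "active" | "done" | "failed";
interface QueueLeg { id: string; status: Status; }

// Settle the active leg with its outcome, then activate the next pending
// one. Failed legs stay failed: the queue never retries them.
function advanceQueue(legs: QueueLeg[], outcome: "done" | "failed"): QueueLeg[] {
  const next: QueueLeg[] = legs.map((l) =>
    l.status === "active" ? { ...l, status: outcome } : l
  );
  const idx = next.findIndex((l) => l.status === "pending");
  if (idx >= 0) next[idx] = { ...next[idx], status: "active" };
  return next;
}
```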
The Gemini 503 saga
The hardest part of this build wasn't the architecture. It was Gemini returning 503s on a billed plan.
The intent parser and recommendation engine both call Gemini 2.5 Flash Lite. During development I was hitting consistent 503 errors — not rate limits, not quota, just server errors. On a plan I was paying for. No error body, no retry-after header, just a flat 503.
The fix was retry logic with backoff (up to 3 attempts for recommendations, 2 for parsing) and aggressive React Query caching (staleTime: 5 * 60 * 1000) so identical queries don't re-trigger LLM calls. Between the retries and the caching, users rarely see the 503s. But debugging at 2 AM when you can't tell if your prompt is broken or the API is just down? That's a special kind of miserable.
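The backoff wrapper looks roughly like this. The attempt counts match the ones above (3 for recommendations, 2 for parsing); the delay values and the status-code check are illustrative:

```typescript
// Retry a call on server errors with exponential backoff. Errors carrying
// a 4xx status are not retried: those are our fault, not the API's.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      const status = (err as { status?: number }).status;
      if (status !== undefined && status < 500) throw err; // client error
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastErr;
}
```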
The happy path takes an afternoon. Reliability takes the rest of the hackathon.
The proxy layer
Both Gemini and the LI.FI Composer need API keys that can't live in client-side code. The Vite dev server proxies these endpoints, injecting keys from environment variables:
/api/lifi-earn/* proxies to earn.li.fi (no auth needed, but centralizes the base URL)
/api/lifi-composer/* proxies to li.quest with the API key injected
/api/gemini-proxy/* proxies to generativelanguage.googleapis.com with the Gemini key injected
Secrets stay out of the bundle, and client code treats everything as a same-origin fetch. Simple in theory. Getting CORS headers right for the Composer's preflight requests took more fiddling than I'd like to admit.
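A Vite config sketch for the three proxy paths. The header names (`x-lifi-api-key`, `x-goog-api-key`) and env var names are my assumptions, not confirmed from the project:

```typescript
// vite.config.ts (sketch). Vite's server.proxy passes options through to
// http-proxy, which supports `headers` for injecting keys server-side.
import { defineConfig, loadEnv } from "vite";

export default defineConfig(({ mode }) => {
  const env = loadEnv(mode, process.cwd(), "");
  return {
    server: {
      proxy: {
        "/api/lifi-earn": {
          target: "https://earn.li.fi",
          changeOrigin: true,
          rewrite: (path) => path.replace(/^\/api\/lifi-earn/, ""),
        },
        "/api/lifi-composer": {
          target: "https://li.quest",
          changeOrigin: true,
          rewrite: (path) => path.replace(/^\/api\/lifi-composer/, ""),
          headers: { "x-lifi-api-key": env.LIFI_API_KEY ?? "" },
        },
        "/api/gemini-proxy": {
          target: "https://generativelanguage.googleapis.com",
          changeOrigin: true,
          rewrite: (path) => path.replace(/^\/api\/gemini-proxy/, ""),
          headers: { "x-goog-api-key": env.GEMINI_API_KEY ?? "" },
        },
      },
    },
  };
});
```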
What I'd do differently
Better error boundaries around the LLM. Right now a Gemini failure shows a generic error alert. I'd want per-stage error messages: "Couldn't understand your request" vs. "Found vaults but couldn't rank them" vs. "Ranking succeeded but the recommendation call failed."
Streaming for the thinking indicator. The concierge shows a cycling "Pondering... Analyzing... Searching..." animation while waiting for Gemini. It's cute, but real streaming tokens would feel more responsive and give the user confidence that something is actually happening.
A proper vault safety score. Using TVL as a proxy for "safest" is a starting point, not an answer. Protocol audit status, time since deployment, historical exploit data, insurance coverage. The Earn API doesn't expose this data today, but the ranking architecture can absorb it when it does.
The hackathon framing
This was built for LI.FI's DeFi Mullet Hackathon — $5K USDC prize pool, five tracks: Yield Builder, AI x Earn, DeFi UX Challenge, Developer Tooling, and Open Track. I submitted to most of them because the integration genuinely touches all of those categories.
The judging criteria weight API integration at 35%, which makes sense since the whole point is building on LI.FI's infrastructure. I use the Earn Data API, Composer API, chain/protocol metadata endpoints, and portfolio positions endpoint. The AI concierge covers the innovation angle. Deposit/withdrawal flows with simulation cover product completeness.
Five days is tight. If I'm being honest, days 1-2 were the vault browser, deposit/withdrawal flows, and positions view. Days 3-4 were the entire AI concierge. Day 5 was bug fixes, the risk filtering system, and this article.
What this is really about
The interesting question in DeFi yield isn't "which vault has the highest APY." It's "given what I hold, what I'm willing to risk, and what I actually said I want, what should I do?" That's a reasoning problem, not a sorting problem. And reasoning is where an LLM layer earns its keep.
Type what you want. Get vaults that match. Deposit across chains. That's the whole interaction.
The code is rough in places. Five-day hackathon code always is. But the pipeline works: parse intent, filter universe, rank candidates, recommend with rationale, execute with simulation. The hard part was getting the pipeline right. The rest is cleanup.