LLM Profiler

Reset Application

Are you sure you want to reset the application? This will delete all test runs and results. This action cannot be undone.

Active Runs

No active test runs
[[ run.model_display_name ]] [[ run.filename ]] Temperature: [[ typeof run.temperature === 'number' ? run.temperature.toFixed(2) : '...' ]] ([[ run.iterations ]] iterations)
Run ID: [[ run.id ]] [[ run.status || 'initializing' ]] [[ formatDate(run.timestamp) ]] [[ run.completed_tests || 0 ]] / [[ run.total_tests ]] tests completed ([[ run.successful_tests || 0 ]] successful) Runtime: [[ formatCompactRuntime(run.runtime_seconds) ]] Error: [[ run.error ]]

Success Rate

[[Math.round(selectedRun.summary.success_rate * 100)]]%

Total Tests

[[selectedRun.summary.total_tests]]

Successful Tests

[[selectedRun.summary.successful_tests]]

Total Runtime

[[formatCompactRuntime(selectedRun.summary.runtime_seconds)]]

Score Average

[[typeof selectedRun.summary.score_average === 'number' ? selectedRun.summary.score_average.toFixed(2) : 'N/A']]
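
The summary cards above round `success_rate` to a whole percentage and pass `runtime_seconds` through `formatCompactRuntime`, whose definition does not appear on this page. The following is a minimal sketch of how such helpers might look, assuming the runtime arrives as a number of seconds and that a compact form such as `1h 2m 3s` is wanted. The output format and the `formatSuccessRate` name are assumptions, not the page's actual implementation.

```javascript
// Hypothetical implementation: the page's real formatCompactRuntime may differ.
function formatCompactRuntime(seconds) {
  // Fall back to a placeholder while the run has no usable runtime yet.
  if (typeof seconds !== 'number' || !Number.isFinite(seconds) || seconds < 0) {
    return '...';
  }
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  if (h > 0) return `${h}h ${m}m ${s}s`;
  if (m > 0) return `${m}m ${s}s`;
  return `${s}s`;
}

// Mirrors the Success Rate template expression (illustrative helper name).
function formatSuccessRate(rate) {
  return `${Math.round(rate * 100)}%`;
}
```

Returning a placeholder for missing values keeps the card readable while a run is still initializing.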

Test Results Summary

Test Name Category Status Expected Actual
[[result.test_name || 'Unnamed Test']] [[result.category || 'No Category']] [[ result.success ? 'Pass' : 'Fail' ]] [[result.validation.expected]] [[result.validation.expected.join(', ')]] [[JSON.stringify(result.validation.expected)]] No validation data [[result.output || 'No output']]

Score Distribution

Test Cases

[[result.test_name]]

[[result.category]]

[[ result.success ? 'Pass' : 'Fail' ]] [[ parseAgentResponse(result.output).score ]]/[[ result.validation.expected.min_score ]]
Click to view details

Test Details

Input Ticket

[[selectedRun.results[selectedTestIndex].input]]

Response

Score: [[ parseAgentResponse(selectedRun.results[selectedTestIndex].output).score ]] / [[ selectedRun.results[selectedTestIndex].validation.expected.min_score ]]
Steps Taken:
  • [[stepIndex + 1]]. [[step]]
Simple score response - steps evaluation skipped
[[selectedRun.results[selectedTestIndex].output]]
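
The Response panel reads `.score` from `parseAgentResponse(...)`, lists the steps taken, and falls back to "Simple score response" when no steps are available. The function itself is not shown on this page. The sketch below is one way it could behave, assuming the model output is either JSON with `score` and `steps` fields or plain text containing a number. The `steps` and `simple` field names are assumptions made for illustration.

```javascript
// Hypothetical sketch: the page's real parseAgentResponse is not shown.
function parseAgentResponse(output) {
  const text = String(output ?? '').trim();
  try {
    const parsed = JSON.parse(text);
    // A bare JSON number is treated as a simple score with no steps.
    if (typeof parsed === 'number') {
      return { score: parsed, steps: [], simple: true };
    }
    if (parsed && typeof parsed === 'object') {
      const steps = Array.isArray(parsed.steps) ? parsed.steps : [];
      return {
        score: typeof parsed.score === 'number' ? parsed.score : null,
        steps,
        simple: steps.length === 0,
      };
    }
  } catch (_) {
    // Not valid JSON: fall through to the plain-text number search below.
  }
  // Take the first number found in free text, if any.
  const match = text.match(/-?\d+(\.\d+)?/);
  return { score: match ? Number(match[0]) : null, steps: [], simple: true };
}
```

With this shape, the template could show the step list when `simple` is false and the "steps evaluation skipped" notice otherwise.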

Required Steps Coverage

  • [[step]]

📊 Data Analytics Report

Dataset Overview

Total Tickets
[[selectedRun.data_analytics_report.dataset_overview.total_tickets]]
Unique Customers
[[selectedRun.data_analytics_report.dataset_overview.unique_customers]]
Unique Users
[[selectedRun.data_analytics_report.dataset_overview.unique_users]]
Categories
[[selectedRun.data_analytics_report.dataset_overview.unique_categories]]

⏱️ Processing Time Analysis

Average
[[formatTime(selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.statistics?.mean)]]
Median
[[formatTime(selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.statistics?.p50)]]
Fastest
[[formatTime(selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.fastest_resolution)]]
Longest
[[formatTime(selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.longest_resolution)]]
Quick Resolution Performance
Under 1 hour: [[selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.tickets_resolved_under_1h]] tickets
Under 4 hours: [[selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.tickets_resolved_under_4h]] tickets
Under 1 day: [[selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.tickets_resolved_under_1d]] tickets
Over 1 week: [[selectedRun.data_analytics_report.processing_time_analysis.resolution_time_minutes.tickets_over_1_week]] tickets

🏷️ Category Analysis

Top Categories
[[category]]
[[count]] tickets ([[selectedRun.data_analytics_report.category_analysis.category_weights[category]]]%)

📈 Distribution Analysis

Resolution Time Distribution
25th Percentile
[[formatTime(selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.percentiles?.p25)]]
50th Percentile
[[formatTime(selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.percentiles?.p50)]]
75th Percentile
[[formatTime(selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.percentiles?.p75)]]
90th Percentile
[[formatTime(selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.percentiles?.p90)]]
Ticket Distribution by Quartiles
Q1: [[selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.quartile_analysis.q1_count]]
Q2: [[selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.quartile_analysis.q2_count]]
Q3: [[selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.quartile_analysis.q3_count]]
Q4: [[selectedRun.data_analytics_report.bell_curve_analysis.resolution_time.quartile_analysis.q4_count]]
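
The percentile and quartile figures above come from the run's analytics report, and this page does not show how they are computed. As an illustration only, the sketch below derives the same kinds of values from a list of resolution times in minutes, assuming linear interpolation between ranks and assuming each ticket is counted in the first quartile whose upper bound it does not exceed. Both choices are assumptions, as are the function names.

```javascript
// Illustrative only: the report's actual percentile method is not stated.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  // Fractional rank into the sorted array, interpolated linearly.
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Count tickets per quartile using the p25/p50/p75 cut points.
function quartileCounts(values) {
  const p25 = percentile(values, 25);
  const p50 = percentile(values, 50);
  const p75 = percentile(values, 75);
  const counts = { q1_count: 0, q2_count: 0, q3_count: 0, q4_count: 0 };
  for (const v of values) {
    if (v <= p25) counts.q1_count += 1;
    else if (v <= p50) counts.q2_count += 1;
    else if (v <= p75) counts.q3_count += 1;
    else counts.q4_count += 1;
  }
  return counts;
}
```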

🔍 Topic Clustering

[[topicName]]
[[topic.percentage]]%
[[topic.count]] tickets
Key terms: [[topic.top_terms?.join(', ')]]

💡 Key Insights

  • [[insight]]

Select Models

[[ model.model_info.display_name ]]

Size: [[ model.model_info.model_metadata.size ]]

Quantization: [[ model.model_info.model_metadata.quantization ]]

Test Configuration

Standard mode for testing exact matches or content validation

Agent mode for evaluating support ticket resolution workflows and scoring

Adjust response randomness (0 = deterministic, 1 = creative)

Choose JSON test suite files... [[ testSuiteFiles.length ]] file(s) selected

Upload one or more JSON test suite files

[[ file.name ]]

[[ file.testCount ]] test(s)

System prompt: [[ file.systemPrompt.length > 100 ? file.systemPrompt.substring(0, 100) + '...' : file.systemPrompt ]]
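
Each uploaded file is listed with its name, a test count, and a system prompt preview cut off at 100 characters. The sketch below shows how those three values might be derived from a file's contents. The suite layout (`system_prompt` and `tests` keys) and the function names are assumptions, since the JSON format is not described on this page. Only the truncation rule is taken from the template expression above.

```javascript
// Assumed suite shape: { "system_prompt": "...", "tests": [ ... ] } (hypothetical).
function summarizeSuite(name, jsonText) {
  const suite = JSON.parse(jsonText);
  const tests = Array.isArray(suite.tests) ? suite.tests : [];
  const systemPrompt =
    typeof suite.system_prompt === 'string' ? suite.system_prompt : '';
  return { name, testCount: tests.length, systemPrompt };
}

// Same truncation rule as the template: first 100 characters plus an ellipsis.
function previewPrompt(systemPrompt, limit = 100) {
  return systemPrompt.length > limit
    ? systemPrompt.substring(0, limit) + '...'
    : systemPrompt;
}
```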

Each test will be repeated this many times

No test cases yet. Click "Add Test Case" to create one.

Test Case #[[ index + 1 ]]
