
Methodology & Metrics

Understanding our comprehensive testing framework and performance evaluation metrics

Testing Methodologies

Forward Testing

Forward testing evaluates the algorithm's ability to predict future exam patterns by training on historical data (e.g., 2020-2023 papers) and testing on newer, unseen papers (e.g., 2024).

Use Case:

Simulates the real-world scenario in which the model predicts upcoming exam patterns from past trends.
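To make the protocol concrete, the sketch below shows how a forward chronological split could be implemented. It is a minimal illustration, not Maple's actual pipeline: the papers_by_year mapping (exam year to a list of question or topic records) and the cutoff year are assumptions introduced here for the example.

# Minimal sketch of a forward-testing split (hypothetical data layout).
# papers_by_year maps an exam year to its list of question/topic records.

def forward_split(papers_by_year, cutoff_year=2024):
    """Train on every paper before cutoff_year, test on cutoff_year onward."""
    train = [p for year, papers in papers_by_year.items()
             if year < cutoff_year for p in papers]
    test = [p for year, papers in papers_by_year.items()
            if year >= cutoff_year for p in papers]
    return train, test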

Backward Testing

Backward testing validates consistency by training on recent data (e.g., 2024) and testing on historical papers (e.g., 2020-2023). This ensures the algorithm captures fundamental patterns that persist over time.

Use Case:

Verifies that the core patterns identified are stable and not just temporary anomalies.
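Backward testing is simply the mirror image of the split above; a sketch, reusing the same hypothetical papers_by_year mapping, follows.

# Reversed (backward-testing) split: train on the newest year, test on history.

def backward_split(papers_by_year, cutoff_year=2024):
    """Train on cutoff_year onward, test on every earlier paper."""
    train = [p for year, papers in papers_by_year.items()
             if year >= cutoff_year for p in papers]
    test = [p for year, papers in papers_by_year.items()
            if year < cutoff_year for p in papers]
    return train, test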

Comparative Testing

Comparative testing benchmarks Maple against leading AI models (GPT, Claude, Gemini, Grok) using identical datasets and evaluation criteria, enabling a like-for-like performance comparison.

Use Case:

Demonstrates Maple's competitive advantage in pattern recognition and prediction accuracy.
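The comparative protocol amounts to running every model through the same loop with the same inputs and the same scorer. The sketch below illustrates the idea; run_comparison, predict_fn, prompt_template, and score_fn are hypothetical names for this example, not the benchmark's actual code.

# Illustrative comparative-testing loop: every model receives the identical
# dataset, prompt template, and scoring function.

def run_comparison(models, dataset, prompt_template, score_fn):
    """Return {model_name: metrics} computed under identical conditions."""
    results = {}
    for name, predict_fn in models.items():
        predictions = [predict_fn(prompt_template.format(paper=p)) for p in dataset]
        results[name] = score_fn(predictions, dataset)
    return results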

Performance Metrics Explained

F1 Score

The F1 Score is the harmonic mean of precision and recall, providing a balanced measure of the algorithm's overall accuracy. It ranges from 0 to 1 (or 0% to 100%).

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Why it matters: A high F1 score indicates the model is both precise and comprehensive in its predictions.

Precision

Precision measures the accuracy of positive predictions. It answers: "Of all the topics/questions the algorithm predicted would appear, how many actually appeared?"

Precision = True Positives / (True Positives + False Positives)

Why it matters: High precision means students can trust the recommendations without wasting time on irrelevant topics.

Recall

Recall measures the algorithm's ability to identify all relevant items. It answers: "Of all topics/questions that actually appeared, how many did the algorithm correctly predict?"

Recall = True Positives / (True Positives + False Negatives)

Why it matters: High recall ensures comprehensive coverage—students won't miss critical topics.
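The three formulas above translate directly into a few lines of Python. The function below is a generic sketch that treats predictions and outcomes as sets of topic identifiers; it is not Maple's internal scorer.

# Precision, recall, and F1 computed directly from the definitions above.

def precision_recall_f1(predicted_topics, actual_topics):
    predicted, actual = set(predicted_topics), set(actual_topics)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1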

Real-World Example

Understanding metrics through a practical scenario

Imagine an exam whose 100 questions are drawn from a pool of topics, and Maple predicts that 30 of those topics are likely to appear:

Scenario A: Perfect Precision

All 30 predicted topics appear, but 20 other topics also appear (not predicted).

Precision: 100%

Recall: 60%

F1 Score: 75%

Scenario B: Perfect Recall

Of 30 predictions, only 20 appear, but these 20 are ALL the topics that appeared.

Precision: 67%

Recall: 100%

F1 Score: 80%
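For readers who want to check the arithmetic, the scenario figures fall straight out of the formulas above:

# Scenario A: 30 predicted, all 30 appear, 20 unpredicted topics also appear.
p_a = 30 / 30                        # precision = 100%
r_a = 30 / (30 + 20)                 # recall   = 60%
f1_a = 2 * p_a * r_a / (p_a + r_a)   # F1 = 0.75 -> 75%

# Scenario B: 30 predicted, 20 appear, and those 20 are every topic that appeared.
p_b = 20 / 30                        # precision ~ 67%
r_b = 20 / 20                        # recall   = 100%
f1_b = 2 * p_b * r_b / (p_b + r_b)   # F1 = 0.80 -> 80%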

Maple's Performance

Achieves balanced high scores across all metrics through advanced pattern recognition.

Precision: ~94%

Recall: ~89%

F1 Score: ~91%

Competing AI Models

Maple was benchmarked against the latest and most advanced AI models from leading providers to ensure a fair and comprehensive comparison:

GPT-5.1 (OpenAI)

Claude Haiku-4.5 (Anthropic)

Claude Sonnet-4.5 (Anthropic)

Gemini 2.0-Flash (Google)

Gemini 2.5-Pro (Google)

Gemini 3-Pro (Google)

Flash Latest (Google)

Flash-Lite Latest (Google)

Grok-4.1 (xAI)

Note: All models were tested under identical conditions using the same datasets, prompts, and evaluation criteria to ensure a fair comparison. Maple consistently outperformed all competing models across all subjects and testing methodologies.

The Maple Advantage

AI-native algorithm built for exam pattern recognition

Cognitive Science Foundation

Maple leverages principles from cognitive psychology and learning science to understand how exam patterns reflect educational priorities and student knowledge assessment.

Mathematical Precision

Advanced mathematical models including graph theory, probabilistic inference, and multi-dimensional feature extraction enable deeper pattern recognition than general-purpose LLMs.

Psychometric Intelligence

Understanding student psychometric profiles allows Maple to personalize predictions based on individual strengths, weaknesses, and learning patterns.

Domain Specialization

Unlike general AI models, Maple is purpose-built for educational pattern recognition, resulting in significantly higher accuracy for exam prediction tasks.