Methodology & Metrics
Understanding our comprehensive testing framework and performance evaluation metrics
Testing Methodologies
Forward Testing
Forward testing evaluates the algorithm's ability to predict future exam patterns by training on historical data (e.g., 2020-2023 papers) and testing on newer, unseen papers (e.g., 2024).
Use Case:
Simulates the real-world scenario in which the model predicts upcoming exam patterns from past trends.
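As a rough illustration, a forward split might be built like the sketch below; the papers_by_year mapping and the hard-coded year ranges are placeholders, not Maple's actual data layout.

    # Hypothetical forward-testing split: train on older papers (2020-2023)
    # and hold out the newest year (2024) as unseen test data.
    def forward_split(papers_by_year):
        train = [p for y in range(2020, 2024) for p in papers_by_year.get(y, [])]
        test = list(papers_by_year.get(2024, []))
        return train, test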
Backward Testing
Backward testing validates consistency by training on recent data (e.g., 2024) and testing on historical papers (e.g., 2020-2023). This ensures the algorithm captures fundamental patterns that persist over time.
Use Case:
Verifies that the core patterns identified are stable and not just temporary anomalies.
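Backward testing simply swaps the roles of the two year ranges; a sketch under the same assumed papers_by_year layout:

    # Hypothetical backward-testing split: train on the newest year (2024)
    # and test on historical papers (2020-2023) to check that patterns persist.
    def backward_split(papers_by_year):
        train = list(papers_by_year.get(2024, []))
        test = [p for y in range(2020, 2024) for p in papers_by_year.get(y, [])]
        return train, test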
Comparative Testing
Comparative testing benchmarks Maple against leading AI models (GPT, Claude, Gemini, Grok) on identical datasets and evaluation criteria to quantify relative performance.
Use Case:
Demonstrates Maple's competitive advantage in pattern recognition and prediction accuracy.
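One way such a comparison can be run is with a fixed harness that feeds every model the identical dataset and scores each with the same function. The models mapping and score_fn callback below are illustrative assumptions, not a published Maple interface.

    # Hypothetical benchmarking harness: each model receives the same dataset
    # and is scored by the same metric, so results are directly comparable.
    def benchmark(models, dataset, actual_topics, score_fn):
        # models maps a display name to that model's prediction function,
        # e.g. {"GPT-5.1": predict_gpt, "Claude Sonnet-4.5": predict_claude}.
        return {name: score_fn(predict(dataset), actual_topics)
                for name, predict in models.items()}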
Performance Metrics Explained
F1 Score
The F1 Score is the harmonic mean of precision and recall, providing a balanced measure of the algorithm's overall accuracy. It ranges from 0 to 1 (or 0% to 100%).
Why it matters: A high F1 score indicates the model is both precise and comprehensive in its predictions.
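In formula form, F1 = 2 × (Precision × Recall) / (Precision + Recall); as a harmonic mean, it stays high only when precision and recall are both high.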
Precision
Precision measures the accuracy of positive predictions. It answers: "Of all the topics/questions the algorithm predicted would appear, how many actually appeared?"
Why it matters: High precision means students can trust the recommendations without wasting time on irrelevant topics.
Recall
Recall measures the algorithm's ability to identify all relevant items. It answers: "Of all topics/questions that actually appeared, how many did the algorithm correctly predict?"
Why it matters: High recall ensures comprehensive coverage—students won't miss critical topics.
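All three metrics reduce to a simple set comparison between predicted and actual topics. A minimal sketch, assuming topic-level predictions represented as sets (the function and argument names are illustrative):

    # Set-based precision, recall, and F1 over predicted vs. actual topics.
    def topic_metrics(predicted, actual):
        predicted, actual = set(predicted), set(actual)
        tp = len(predicted & actual)              # correctly predicted topics
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(actual) if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1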
Real-World Example
Understanding metrics through a practical scenario
Imagine an exam whose 100 questions are drawn from a pool of topics, and suppose Maple predicts that 30 topics are likely to appear:
Scenario A: Perfect Precision
All 30 predicted topics appear, but 20 unpredicted topics also appear, so 50 topics appeared in total.
Precision: 100%
Recall: 60%
F1 Score: 75%
Scenario B: Perfect Recall
Of 30 predictions, only 20 appear, but these 20 are ALL the topics that appeared.
Precision: 67%
Recall: 100%
F1 Score: 80%
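Running both scenarios through a compact version of the set-based metrics reproduces these numbers; the topic labels are arbitrary stand-ins, since only the counts matter.

    # Compact set-based metrics used to check the scenario arithmetic.
    def topic_metrics(predicted, actual):
        tp = len(predicted & actual)
        p, r = tp / len(predicted), tp / len(actual)
        return p, r, 2 * p * r / (p + r)

    predicted = set(range(30))                        # 30 predicted topics
    # Scenario A: all 30 appear, plus 20 unpredicted topics (50 in total).
    print(topic_metrics(predicted, set(range(50))))   # ≈ (1.0, 0.6, 0.75)
    # Scenario B: only 20 predictions appear, and they are all that appeared.
    print(topic_metrics(predicted, set(range(20))))   # ≈ (0.67, 1.0, 0.8)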
Maple's Performance
Achieves balanced high scores across all metrics through advanced pattern recognition.
Precision: ~94%
Recall: ~89%
F1 Score: ~91%
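As a quick check, 2 × 0.94 × 0.89 / (0.94 + 0.89) ≈ 0.914, which matches the reported ~91% F1 score.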
Competing AI Models
Maple was benchmarked against the latest and most advanced AI models from leading providers to ensure a fair and comprehensive comparison:
GPT-5.1 (OpenAI)
Claude Haiku-4.5 (Anthropic)
Claude Sonnet-4.5 (Anthropic)
Gemini 2.0-Flash (Google)
Gemini 2.5-Pro (Google)
Gemini 3-Pro (Google)
Gemini Flash Latest (Google)
Gemini Flash-Lite Latest (Google)
Grok-4.1 (xAI)
Note: All models were tested under identical conditions using the same datasets, prompts, and evaluation criteria to ensure fair comparison. Maple consistently outperformed all competing models across all subjects and testing methodologies.
The Maple Advantage
AI-native algorithm built for exam pattern recognition
Cognitive Science Foundation
Maple leverages principles from cognitive psychology and learning science to understand how exam patterns reflect educational priorities and student knowledge assessment.
Mathematical Precision
Advanced mathematical techniques, including graph theory, probabilistic inference, and multi-dimensional feature extraction, enable deeper pattern recognition than general-purpose LLMs.
Psychometric Intelligence
Understanding student psychometric profiles allows Maple to personalize predictions based on individual strengths, weaknesses, and learning patterns.
Domain Specialization
Unlike general AI models, Maple is purpose-built for educational pattern recognition, resulting in significantly higher accuracy for exam prediction tasks.