Experiment Results
Experiment ID: 20231026-143000
Charts: Performance Analysis, Training Progress, Model Comparison
Verdict
Hypothesis Supported
Confidence: 89%
The Transformer architecture achieves a higher BLEU score (29.2) than the baseline RNN model (26.8), an improvement of 2.4 points that is statistically significant (p < 0.001).
Statistics
Statistical Analysis
| Metric | Transformer | RNN Baseline | Difference | p-value | Effect Size (Cohen's d) | 95% CI (Transformer) | Significance |
|---|---|---|---|---|---|---|---|
| BLEU Score | 29.2 | 26.8 | +2.4 | < 0.001 | 0.73 | [28.8, 29.6] | High |
| Training Time | 12.3h | 18.7h | -6.4h | < 0.01 | 1.12 | [11.8, 12.8] | High |
| Model Parameters | 65M | 45M | +20M | N/A | N/A | N/A | N/A |
| Inference Speed | 127 tok/s | 89 tok/s | +38 tok/s | < 0.05 | 0.45 | [119, 135] | Medium |
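The differences in the table can be re-derived from the reported value and baseline of each metric, and expressing them relative to the baseline makes the metrics easier to compare. A minimal sketch using only the numbers reported above; the helper name `relative_change` is illustrative and not part of the experiment code:

```python
# Numbers copied from the table above: (Transformer value, RNN baseline).
metrics = {
    "BLEU Score": (29.2, 26.8),
    "Training Time (h)": (12.3, 18.7),
    "Inference Speed (tok/s)": (127.0, 89.0),
}


def relative_change(value: float, baseline: float) -> float:
    """Return the change from the baseline as a percentage of the baseline."""
    return (value - baseline) / baseline * 100.0


for name, (value, baseline) in metrics.items():
    difference = value - baseline
    percent = relative_change(value, baseline)
    print(f"{name}: {difference:+.1f} absolute, {percent:+.1f}% relative")
```

Under these numbers the BLEU gain is about 9% of the baseline, training time drops by about 34%, and inference speed rises by about 43%.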
Statistical significance was determined using a two-tailed t-test with α = 0.05. Effect sizes were calculated using Cohen's d.
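For readers who want to see how these two quantities relate, the sketch below computes Cohen's d and a two-sample t statistic from per-run scores. It assumes independent samples with a pooled standard deviation; the report does not say whether the test was paired, pooled, or Welch's, so this is one plausible reading rather than the analysis actually used. The per-run scores are invented for illustration and are not experiment data.

```python
import math
import statistics


def pooled_variance(a, b):
    """Pooled sample variance of two independent samples."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    return ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_variance(a, b))


def t_statistic(a, b):
    """Student's t statistic for two independent samples, equal variances assumed."""
    standard_error = math.sqrt(pooled_variance(a, b) * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical per-run BLEU scores, made up for this example only.
transformer_runs = [29.0, 29.4, 29.1, 29.3]
rnn_runs = [26.6, 27.0, 26.7, 26.9]

print("Cohen's d:", cohens_d(transformer_runs, rnn_runs))
print("t statistic:", t_statistic(transformer_runs, rnn_runs))
```

A two-tailed p-value would then come from the t distribution with `len(a) + len(b) - 2` degrees of freedom, for example via `scipy.stats.ttest_ind`; SciPy is left out here to keep the sketch dependency-free.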
Metadata
Experiment Metadata
Dataset
Hash: sha256:a1b2c3d4e5f6...
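A recorded dataset hash is only useful if it is checked before a rerun. The sketch below shows one way to do that with Python's standard `hashlib`; the function names and the `sha256:<hex digest>` parsing are assumptions made for illustration, and the full digest must be supplied by the reader because the value above is truncated.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded):
    """Check a file against a recorded value of the form 'sha256:<hex digest>'."""
    algorithm, _, expected_hex = recorded.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return sha256_of_file(path) == expected_hex.lower()
```

Calling `matches_recorded_hash(dataset_path, recorded_value)` before training, where both arguments are hypothetical names for the local file and the full recorded hash, would flag a dataset that differs from the one used in this run.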
Reproducibility
Random Seed: 42
Experiment ID: exp_20240115_143000
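To reuse the recorded seed, every random number generator involved in a run has to be seeded with it. The sketch below is one common pattern, not the experiment's actual setup code; `seed_everything` is a hypothetical helper name, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the random number generators available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, so there is nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass  # PyTorch is not installed, so there is nothing to seed


seed_everything(42)
```

Seeding alone does not guarantee bit-identical results on GPUs, since some operations are nondeterministic by default.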
Compute
GPU Hours: 98.4
Environment: Python 3.9, PyTorch 1.12
Timeline
Started: 1/15/2024, 2:30:00 PM
Completed: 1/16/2024, 2:45:00 AM
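The timeline and the GPU-hours figure can be cross-checked with a little arithmetic. The sketch below parses the two timestamps, assuming both are in the same time zone, and divides the reported GPU hours by the elapsed wall-clock time. The resulting GPU count is an inference that assumes every GPU was busy for the whole run; the report does not state how many GPUs were used.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%m/%d/%Y, %I:%M:%S %p"

# Timestamps and GPU hours copied from the metadata above.
started = datetime.strptime("1/15/2024, 2:30:00 PM", TIMESTAMP_FORMAT)
completed = datetime.strptime("1/16/2024, 2:45:00 AM", TIMESTAMP_FORMAT)
gpu_hours = 98.4

wall_clock_hours = (completed - started).total_seconds() / 3600
implied_gpus = gpu_hours / wall_clock_hours

print(f"Wall-clock time: {wall_clock_hours:.2f} h")
print(f"Implied GPU count: {implied_gpus:.2f}")
```

The elapsed time of 12.25 hours agrees with the reported training time of 12.3h, and the ratio suggests the run used roughly 8 GPUs.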