About Middleware Pattern¶
Why Middleware Pattern is NOT Used¶
This document explains the architectural decision to not implement the Middleware pattern in EncodingRepair, despite its popularity in modern PHP frameworks.
What is Middleware Pattern?¶
The Middleware pattern allows chaining transformations where each middleware can:
interface TransformationMiddleware
{
public function process(string $data, callable $next): string;
}
// Example usage
$result = $middleware1->process($data, function($data) use ($middleware2) {
return $middleware2->process($data, function($data) {
return $data; // Final result
});
});
Why It's Not Pertinent Here¶
1. Chain of Responsibility Already Implemented¶
EncodingRepair uses Chain of Responsibility pattern for Transcoders and Detectors, which is conceptually similar but better suited for fallback strategies:
// Current architecture (optimal)
foreach ($transcoders as $transcoder) {
$result = $transcoder->transcode($data, $to, $from, $options);
if (null !== $result) {
return $result; // First success wins
}
}
// Middleware would be confusing
$result = $middleware->process($data, $next); // Who does what?
Key difference:
- Chain of Responsibility: Try until one succeeds (fallback)
- Middleware: All execute in sequence (transformation pipeline)
2. Single Responsibility Principle¶
Current architecture maintains clear separation:
- Transcoder: Encoding conversion only
- Detector: Encoding detection only
- CharsetProcessor: Orchestration only
Adding middleware would blur these boundaries:
// Bad: Mixed responsibilities
class NormalizationMiddleware implements TransformationMiddleware
{
public function process(string $data, callable $next): string
{
$normalized = Normalizer::normalize($data); // Transformation
return $next($normalized); // Delegation
}
}
3. Options Array Already Handles Transformations¶
The $options parameter already provides flexible configuration without architectural complexity:
$result = CharsetHelper::toUtf8($data, 'ISO-8859-1', [
'normalize' => true, // Unicode normalization
'translit' => true, // Transliteration
'ignore' => true, // Ignore invalid sequences
]);
4. Performance Impact¶
Middleware adds overhead through nested closures:
// Current: Direct call (45ms/10k ops)
$result = $transcoder->transcode($data, $to, $from, $options);
// Middleware: Closure overhead (~15-20% slower)
$result = $middleware->process($data, fn($d) => $next->process($d, ...));
5. Complexity Without Benefit¶
Middleware would require:
- New interface (
TransformationMiddleware) - Middleware stack management
- Closure-based delegation
- Additional testing complexity
For what gain? Transformations are already configurable via options.
The Better Alternative: Extend Options¶
If you need additional transformations, extend the $options array:
Current Options¶
// In CharsetProcessor::configureOptions()
private function configureOptions(array $options, array ...$replacements): array
{
return array_replace(
[
'normalize' => true, // Unicode NFC normalization
'translit' => true, // Transliterate unmappable chars
'ignore' => true, // Skip invalid sequences
'encodings' => [...], // Detection candidate list
],
...$replacements,
$options
);
}
Adding New Options¶
To add new transformations (e.g., sanitization, validation):
Step 1: Add Option to Defaults¶
private function configureOptions(array $options, array ...$replacements): array
{
return array_replace(
[
'normalize' => true,
'translit' => true,
'ignore' => true,
'sanitize' => false, // NEW: Strip control characters
'validate' => false, // NEW: Throw on invalid encoding
'encodings' => self::DEFAULT_ENCODINGS,
],
...$replacements,
$options
);
}
Step 2: Implement in Conversion Logic¶
private function convertString(string $data, string $to, string $from, array $options): string
{
$result = $this->transcodeString($data, $to, $from, $options) ?? $data;
// Apply sanitization if enabled
if (true === ($options['sanitize'] ?? false)) {
$result = $this->sanitize($result);
}
// Validate if enabled
if (true === ($options['validate'] ?? false)) {
$this->validate($result, $to);
}
return $result;
}
private function sanitize(string $data): string
{
// Remove control characters except \n, \r, \t
return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $data);
}
private function validate(string $data, string $encoding): void
{
if (!mb_check_encoding($data, $encoding)) {
throw new InvalidArgumentException(
sprintf('Invalid %s encoding after conversion', $encoding)
);
}
}
Step 3: Use the New Option¶
// Sanitize control characters
$clean = CharsetHelper::toUtf8($data, 'ISO-8859-1', [
'sanitize' => true,
]);
// Strict validation
$validated = CharsetHelper::toUtf8($data, 'ISO-8859-1', [
'validate' => true,
]);
Comparison Table¶
| Aspect | Middleware Pattern | Options Array (Current) |
|---|---|---|
| Complexity | High (interfaces, stack, closures) | Low (array merge) |
| Performance | Slower (closure overhead) | Fast (direct calls) |
| Extensibility | New middleware classes | New array keys |
| Testability | Complex (mock stack) | Simple (pass options) |
| Readability | Obscure (nested closures) | Clear (explicit options) |
| SOLID | Violates SRP | Maintains SRP |
| Fit | Pipeline transformations | Fallback strategies ✅ |
Conclusion¶
The Middleware pattern is excellent for HTTP request/response pipelines (PSR-15) but not suitable for encoding conversion where:
- Fallback strategies are needed (Chain of Responsibility)
- Single responsibility must be maintained
- Performance is critical (10k+ ops/sec)
- Simplicity is valued (zero dependencies)
The options array approach provides all the flexibility of middleware without the architectural complexity.
Further Reading¶
- Chain of Responsibility Pattern
- PSR-15: HTTP Server Request Handlers (where middleware shines)
- SOLID Principles
- CharsetProcessor Source Code