@page "/doc/afh"
@inject IJSRuntime JsRuntime
@inject NavigationManager NavigationManager

@*
 * dbMango
 *
 * Copyright 2025 Deutsche Bank AG
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *@

Aggregation for Humans (AFH) documentation

The MongoDB Aggregation Framework is a powerful tool for processing and transforming data within MongoDB. It allows you to perform complex operations on collections of documents, similar to SQL queries but with a more flexible and expressive syntax. The framework operates on the concept of a "pipeline," where data flows through a series of stages, each stage performing a specific transformation.

Here's a breakdown of key concepts and common stages:

Key Concepts:

Pipeline: A sequence of data processing stages. Each stage takes the output of the previous stage as its input and applies a transformation.
Stages: Individual operations within the pipeline, such as filtering, grouping, projecting, sorting, and more.
Documents: The basic unit of data in MongoDB, represented as JSON-like objects.
Fields: Key-value pairs within a document.
Expressions: Used within stages to compute values, access fields, and perform operations on data.
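To make the pipeline idea concrete, here is a small in-memory sketch in plain Python, with no MongoDB server involved. The helpers match, add_field, sort_by and run_pipeline are hypothetical; they only imitate what $match, $addFields and $sort do, so you can see each stage consuming the previous stage's output.

```python
from typing import Callable, Iterable

Document = dict
Stage = Callable[[list[Document]], list[Document]]


def match(predicate: Callable[[Document], bool]) -> Stage:
    """Keep only the documents for which predicate is true (like $match)."""
    return lambda docs: [d for d in docs if predicate(d)]


def add_field(name: str, compute: Callable[[Document], object]) -> Stage:
    """Return copies of the documents with one computed field (like $addFields)."""
    return lambda docs: [{**d, name: compute(d)} for d in docs]


def sort_by(key: str) -> Stage:
    """Sort ascending by one field (like $sort with direction 1)."""
    return lambda docs: sorted(docs, key=lambda d: d[key])


def run_pipeline(docs: Iterable[Document], stages: list[Stage]) -> list[Document]:
    """Feed the output of each stage into the next one."""
    current = list(docs)
    for stage in stages:
        current = stage(current)
    return current


trades = [
    {"Book": "Book1", "pv": 10, "premiumPV": 5},
    {"Book": "Book3", "pv": 7, "premiumPV": 1},
    {"Book": "Book4", "pv": 2, "premiumPV": 4},
]

result = run_pipeline(trades, [
    match(lambda d: d["Book"] not in ("Book1", "Book2")),
    add_field("TotalPV", lambda d: d["pv"] + d["premiumPV"]),
    sort_by("TotalPV"),
])
```

Each stage returns new documents instead of modifying its input, much as an aggregation leaves the stored collection untouched.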

More information can be found in the MongoDB documentation.

Reasons why MongoDB Aggregation JSON is not ideal for human writing

Verbosity and Nesting: Aggregation pipelines often involve deeply nested JSON structures, making them lengthy and difficult to parse.
Operator-Centric Approach: The syntax relies heavily on operators (e.g., $match, $group, $avg), which can be less intuitive than declarative SQL.
Error-Prone: The strict JSON syntax means minor errors (missing commas, incorrect brackets) can lead to invalid queries that are hard to debug.
Lack of Readability: JSON isn't naturally conducive to human understanding of complex logic; the data flow can be obscured.
Difficult to Visualize: It's hard to mentally visualize data transformations at each stage from raw JSON.
Repetitive Patterns: Similar structures might be repeated, making queries tedious to write and maintain.
Limited Code Reuse: No built-in mechanism for easily reusing pipeline parts or defining functions within the query.

While powerful, MongoDB aggregation's JSON syntax can be challenging for humans because of its verbosity and operator-centric nature. The MongoAggregationForHumans grammar aims to address these issues with a more human-friendly syntax.

Here is an example of a fairly complex AFH pipeline and its equivalent in MongoDB aggregation JSON. We hope it shows why we created AFH :)

MongoAggregationForHumans Language Syntax

The MongoAggregationForHumans grammar defines a language for expressing MongoDB aggregation pipelines in a more human-readable format. It aims to simplify the creation and understanding of these pipelines compared to the standard JSON syntax.

Overall Structure

A program in this language consists of a single statement that defines an aggregation pipeline:

            
                file
                : 'FROM' STRING pipeline_def
                ;

This indicates that a pipeline operates on a collection specified by STRING (the collection name) and is defined by a pipeline_def.
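Assuming AFH ultimately issues a standard MongoDB aggregation, FROM selects the collection and the PIPELINE block becomes the ordered list of stage documents. The sketch below builds the raw aggregate database command for that case; the collection name "trades" and the helper aggregate_command are hypothetical. With the pymongo driver, the same request would typically be written as db["trades"].aggregate(pipeline).

```python
def aggregate_command(collection: str, pipeline: list[dict]) -> dict:
    """Raw MongoDB 'aggregate' command: FROM names the collection,
    PIPELINE { ... } supplies the ordered stage documents."""
    return {
        "aggregate": collection,
        "pipeline": pipeline,
        "cursor": {},  # the command requires a cursor (or explain) option
    }


# Hypothetical program: FROM "trades" PIPELINE { WHERE Department == "Department Name" SORT BY cob }
cmd = aggregate_command("trades", [
    {"$match": {"Department": "Department Name"}},
    {"$sort": {"cob": 1}},
])
```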

Pipeline Definition

A pipeline is a sequence of stages enclosed in curly braces:

            
                pipeline_def
                : 'PIPELINE' '{'  stages_list '}'
                ;

Stages

stages_list consists of one or more stage_def entries, each representing an individual operation in the pipeline:

            
                stages_list
                : stage_def
                | stages_list stage_def
                ;

The grammar supports the following stage types:

            
                stage_def
                : match_def       // WHERE clause for filtering
                | addfields_def   // ADD new fields
                | project_def     // PROJECT fields (include or exclude)
                | group_by_def    // GROUP BY a key and calculate aggregates
                | sort_def        // SORT BY specified fields
                | join_def        // JOIN with another collection
                | unwind_def      // UNWIND an array field
                | replace_def     // REPLACE the root document
                | do_def          // DO a raw MongoDB stage written as JSON
                ;

Stage Details

Each stage has its own syntax. Here are some examples:

match_def (WHERE):

                    
                        match_def: 'WHERE' expression ('OPTIONS' json)?
                        ;

Filters documents based on an expression. An optional OPTIONS clause allows for specifying JSON options.
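As an illustration, here is a hand-written $match stage that we assume is equivalent to the WHERE example on this page, plus a tiny matcher (a hypothetical helper) that evaluates only $and, equality and $nin, so the semantics can be checked without a server. The real AFH translator may emit a different but equivalent filter, for example one using $expr.

```python
def matches(doc: dict, query: dict) -> bool:
    """Evaluate a tiny subset of MongoDB query syntax: $and, equality, $nin."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and "$nin" in cond:
            if doc.get(key) in cond["$nin"]:
                return False
        elif doc.get(key) != cond:
            return False
    return True


# Hand-written equivalent of:
#   WHERE ( cob == "2025-04-22" && Department == "Department Name" )
#         && Book NOT IN ("Book1", "Book2")
where_stage = {
    "$match": {
        "$and": [
            {"cob": "2025-04-22"},
            {"Department": "Department Name"},
            {"Book": {"$nin": ["Book1", "Book2"]}},
        ]
    }
}

row = {"cob": "2025-04-22", "Department": "Department Name", "Book": "Book3"}
```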

addfields_def (ADD):

                    
                        addfields_def
                        : 'ADD' let_list ('OPTIONS' json)?
                        ;

Adds new fields defined in a let_list. Optional OPTIONS.
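ADD presumably maps to MongoDB's $addFields. The sketch below rebuilds three expressions from the ADD example on this page by hand. field and call are hypothetical helpers that illustrate two likely rules: a bare field name becomes a "$"-prefixed field path, and named arguments such as format: and date: become the fields of the operator's argument document. The infix + is written here as the $add operator.

```python
def field(name: str) -> str:
    """A bare AFH field name becomes a MongoDB field path ("$name")."""
    return name if name.startswith("$") else "$" + name


def call(op: str, *args, **named) -> dict:
    """name( a, b ) -> {"$name": [a, b]}; name( x ) -> {"$name": x};
    name( k: v, ... ) -> {"$name": {"k": v, ...}} (named arguments)."""
    if args and named:
        raise ValueError("mixing positional and named arguments is not handled here")
    if named:
        return {"$" + op: named}
    return {"$" + op: args[0] if len(args) == 1 else list(args)}


# ADD pv + premiumPV AS TotalPV,
#     abs( pv + pvMove ) AS MyPnl,
#     dateToString( format: "%Y-%m-%d", date: field1 ) AS "TodayStr"
add_stage = {
    "$addFields": {
        "TotalPV": call("add", field("pv"), field("premiumPV")),
        "MyPnl": call("abs", call("add", field("pv"), field("pvMove"))),
        "TodayStr": call("dateToString", format="%Y-%m-%d", date=field("field1")),
    }
}
```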

project_def (PROJECT):

                    
                        project_def
                        : 'PROJECT' ('ID' '{' id_list=let_list '}')? data_list=let_list ('OPTIONS' json)?     # ProjectInclude
                        | 'PROJECT' 'EXCLUDE' var_list ('OPTIONS' json)?                    # ProjectExclude
                        ;

Includes or excludes fields. ProjectInclude accepts an optional ID { ... } block (id_list) followed by data_list, the fields to include or compute. ProjectExclude takes a var_list of fields to exclude. Optional OPTIONS.
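A hand-written sketch of how the two PROJECT forms could map to MongoDB's $project, using PROJECT examples from this page. The helper names are hypothetical. MongoDB does not let a single $project mix inclusion and exclusion of fields other than _id, which fits AFH having a separate EXCLUDE form.

```python
def project_include(fields: list[str], computed: dict[str, object]) -> dict:
    """PROJECT a, b, expr AS c -> {"$project": {"a": 1, "b": 1, "c": <expr>}}"""
    spec: dict[str, object] = {name: 1 for name in fields}
    spec.update(computed)  # computed fields are allowed alongside inclusions
    return {"$project": spec}


def project_exclude(fields: list[str]) -> dict:
    """PROJECT EXCLUDE a, b -> {"$project": {"a": 0, "b": 0}}"""
    return {"$project": {name: 0 for name in fields}}


# PROJECT HedgeLVSVInstruments, CurvePrefix, objectToArray( VegaDetails ) AS VegaDetails
include = project_include(
    ["HedgeLVSVInstruments", "CurvePrefix"],
    {"VegaDetails": {"$objectToArray": "$VegaDetails"}},
)
# PROJECT EXCLUDE Order
exclude = project_exclude(["Order"])
```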

group_by_def (GROUP BY):

                    
                        group_by_def: 'GROUP' 'BY' id_list=let_list ('LET' data_list=let_list)? ('OPTIONS' json)?
                        ;

Groups documents by fields specified in id_list and calculates aggregates defined in the optional data_list (using LET). Optional OPTIONS.
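Assuming GROUP BY maps to $group, a plausible translation puts the grouping fields into an _id document, so later stages can refer to them as $_id.field (the JOIN example on this page uses $_id.CurveKey this way). The group_by helper below is hypothetical and, for brevity, treats every accumulator argument as a field name.

```python
def group_by(keys: list[str], accumulators: dict[str, tuple[str, str]]) -> dict:
    """GROUP BY k1, k2 LET op( f ) AS name, ... -> $group stage."""
    # The group key is a document, so later stages can use $_id.k1
    stage: dict[str, object] = {"_id": {k: "$" + k for k in keys}}
    for name, (op, source) in accumulators.items():
        stage[name] = {"$" + op: "$" + source}  # e.g. max( Order ) -> {"$max": "$Order"}
    return {"$group": stage}


# GROUP BY CurveKey LET max( Order ) AS Order, sum( Value ) AS Value
g = group_by(["CurveKey"], {"Order": ("max", "Order"), "Value": ("sum", "Value")})
```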

sort_def (SORT BY):

                    
                        sort_def: 'SORT' 'BY' sort_var_list ('OPTIONS' json)?
                        ;

Sorts documents based on fields in sort_var_list, which can include ASC or DESC order. Optional OPTIONS.
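A sketch of SORT BY as MongoDB's $sort, under two assumptions of ours: a key without ASC or DESC sorts ascending, and a leading "$" on a path such as $_id.CcyPair is dropped, because $sort expects plain field names. The helper sort_stage is hypothetical.

```python
def sort_stage(*keys: str) -> dict:
    """SORT BY a, b DESC, ... -> {"$sort": {"a": 1, "b": -1}}"""
    spec: dict[str, int] = {}
    for key in keys:
        parts = key.split()
        name = parts[0].lstrip("$")  # $sort takes field names, not "$" paths
        direction = -1 if len(parts) > 1 and parts[1].upper() == "DESC" else 1
        spec[name] = direction
    return {"$sort": spec}


s = sort_stage("$_id.CcyPair", "Order")   # SORT BY $_id.CcyPair, Order
d = sort_stage("Value DESC", "Tenor ASC")
```

Key order is significant for $sort; Python dictionaries preserve insertion order, so the order of the SORT BY list carries over.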

join_def (JOIN):
```
                    
                        join_def: 'JOIN' STRING 'AS' (VARIABLE | STRING) 'ON' equivalence_list ('LET' let_list )? ('PIPELINE' '{'  stages_list '}')? ('OPTIONS' json)?
                        ;
                    
                
```
Performs a join with another collection (specified by STRING). The matching documents are added to each input document as an array field named by AS (either a VARIABLE or a STRING). The join condition is defined by equivalence_list. An optional LET clause defines variables that the sub-pipeline can reference, and an optional PIPELINE block applies a sub-pipeline to the joined collection. Optional OPTIONS.
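JOIN closely resembles MongoDB's $lookup stage. Below is a hand-written equivalent of the JOIN example on this page, assuming the left side of ON is a field of the input documents and the right side a field of the joined collection. The join helper is hypothetical, and the LET and PIPELINE clauses (which would correspond to $lookup's let and pipeline options) are not covered.

```python
def join(collection: str, alias: str, local_path: str, foreign_field: str) -> dict:
    """JOIN "<collection>" AS alias ON $local == foreign -> $lookup."""
    return {
        "$lookup": {
            "from": collection,
            "localField": local_path.lstrip("$"),  # field of the input documents
            "foreignField": foreign_field,         # field of the joined collection
            "as": alias,                           # array of matching documents
        }
    }


# JOIN "PnL-Market" AS Market ON $_id.CurveKey == _id
j = join("PnL-Market", "Market", "$_id.CurveKey", "_id")
```

Because the result is an array, the complex example follows this JOIN with UNWIND Market.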

unwind_def (UNWIND):

                    
                        unwind_def: 'UNWIND' VARIABLE ('INDEX' VARIABLE)? ('OPTIONS' json)?
                        ;

Unwinds an array field (specified by VARIABLE), producing one output document per array element. An optional INDEX clause names a field that receives the element's array index. Optional OPTIONS.
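UNWIND with INDEX appears to correspond to $unwind with its includeArrayIndex option. The sketch builds that stage and, to show the effect, also imitates it in memory on Python dictionaries. Both helpers are hypothetical. Like $unwind's default behavior, the imitation emits nothing for documents whose array is missing, null or empty; unlike $unwind, it assumes the field otherwise holds a list.

```python
from typing import Optional


def unwind_stage(path: str, index_field: Optional[str] = None) -> dict:
    """UNWIND path [INDEX name] -> $unwind (short string form when there is no INDEX)."""
    if index_field is None:
        return {"$unwind": "$" + path}
    return {"$unwind": {"path": "$" + path, "includeArrayIndex": index_field}}


def unwind_local(docs: list[dict], path: str, index_field: Optional[str] = None) -> list[dict]:
    """In-memory imitation: one output document per element of a top-level list field."""
    out = []
    for doc in docs:
        # missing, None or empty lists produce no output documents
        for i, item in enumerate(doc.get(path) or []):
            row = {**doc, path: item}
            if index_field is not None:
                row[index_field] = i
            out.append(row)
    return out


# UNWIND Data INDEX Order
rows = unwind_local([{"k": "a", "Data": ["x", "y"]}, {"k": "b", "Data": []}], "Data", "Order")
```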

replace_def (REPLACE):

                    
                        replace_def
                        : 'REPLACE' 'ID' '{' id_list=let_list '}' data_list=let_list ('OPTIONS' json)?
                        ;

Replaces the root document. It uses id_list and data_list to define the replacement document. Optional OPTIONS.
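One plausible reading of REPLACE, sketched as MongoDB's $replaceRoot: the ID block becomes the new _id document and every name in data_list is copied from the current document under the same name. This is our assumption, based on the REPLACE example on this page; replace_root is a hypothetical helper.

```python
def replace_root(id_fields: dict[str, str], data_fields: list[str]) -> dict:
    """REPLACE ID { src AS name, ... } f1, f2, ... -> $replaceRoot (assumed mapping)."""
    new_root: dict = {
        # ID block: each "src AS name" becomes a field of the new _id document
        "_id": {name: "$" + src.lstrip("$") for name, src in id_fields.items()}
    }
    for name in data_fields:
        new_root[name] = "$" + name  # copy the field under the same name
    return {"$replaceRoot": {"newRoot": new_root}}


# REPLACE ID { $_id.CcyPair AS CcyPair, $_id.Tenor AS Tenor } Order, 'DN OpeningCurve', '10RR OpeningCurve'
stage = replace_root(
    {"CcyPair": "$_id.CcyPair", "Tenor": "$_id.Tenor"},
    ["Order", "DN OpeningCurve", "10RR OpeningCurve"],
)
```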

do_def (DO):

                    
                        do_def: 'DO' json
                        ;

Embeds a MongoDB aggregation stage, written as raw JSON (json), directly in the pipeline. This covers stages that have no dedicated AFH keyword; the examples on this page use it for $replaceRoot and $sortByCount.
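Since DO takes a JSON stage as-is, its translation can be as simple as parsing the JSON and appending it to the pipeline. The sketch below does that with Python's standard json module; do_stage is a hypothetical name. It also checks the rule MongoDB enforces for every stage document: it must contain exactly one field.

```python
import json


def do_stage(raw_json: str) -> dict:
    """DO { ... }: parse the JSON body and use it unchanged as one pipeline stage."""
    stage = json.loads(raw_json)
    if not isinstance(stage, dict) or len(stage) != 1:
        raise ValueError("a pipeline stage must be a JSON object with exactly one field")
    return stage


pipeline = [
    {"$unwind": "$Market"},
    do_stage('{ "$replaceRoot": { "newRoot": "$_id" } }'),
]
```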

Expressions

The language uses a hierarchy of expressions to define conditions, calculations, and field manipulations. The grammar includes rules for:

expression: Combines comparison_expression with logical operators (AND, OR).
comparison_expression: Combines additive_expression with comparison operators (==, !=, >, >=, <, <=).
additive_expression: Combines multiplicative_expression with addition and subtraction (+, -).
multiplicative_expression: Combines unary_expression with multiplication and division (*, /).
unary_expression: Handles unary operators (+, -, NOT) and various primary expressions.
brackets_expression: Includes atoms, function calls, "IN" expressions, "IS" (projection) expressions, "EXISTS" expressions, and bracketed expressions.
atom: Represents basic values like strings, numbers, booleans, null, and variables.

The grammar also defines rules for let_list (for defining fields and expressions), var_list (for lists of variables), sort_var_list (for sorting specifications), and equivalence_list (for join conditions).
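The precedence hierarchy above can be illustrated with a small recursive-descent parser in Python: one method per rule, from the loosest level (expression) to the tightest (primary, standing in for brackets_expression and atom), turning an AFH-style expression into a MongoDB aggregation expression. This is a hypothetical sketch, not the ANTLR-generated parser dbMango uses. It covers only numbers, double-quoted strings, field names, function calls with positional arguments, and the operators listed in this document. AND and OR share one precedence level, following the single expression rule described above, and the mapping to $and, $gt, $add and so on is our assumption about the translation.

```python
import re

# A token is a number, a "string", an operator (two-character forms first) or a name.
TOKEN = re.compile(r'\s*(\d+(?:\.\d+)?|"[^"]*"|&&|\|\||==|!=|<>|>=|<=|[-+*/()<>!,]|\$?[A-Za-z_][\w.]*)')
LOGIC = {"AND": "$and", "&&": "$and", "OR": "$or", "||": "$or"}
COMPARE = {"==": "$eq", "!=": "$ne", "<>": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}
ADDITIVE = {"+": "$add", "-": "$subtract"}
MULTIPLICATIVE = {"*": "$multiply", "/": "$divide"}


def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class ExpressionParser:
    def __init__(self, text: str):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok: str):
        if self.take() != tok:
            raise ValueError(f"expected {tok!r}")

    def parse(self):
        result = self.expression()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return result

    def binary(self, operand, operators: dict):
        left = operand()  # left-associative: a - b - c is (a - b) - c
        while self.peek() in operators:
            op = operators[self.take()]
            left = {op: [left, operand()]}
        return left

    def expression(self):  # comparison ((AND | OR) comparison)*
        return self.binary(self.comparison, LOGIC)

    def comparison(self):  # additive (compare additive)?
        left = self.additive()
        if self.peek() in COMPARE:
            op = COMPARE[self.take()]
            return {op: [left, self.additive()]}
        return left

    def additive(self):
        return self.binary(self.multiplicative, ADDITIVE)

    def multiplicative(self):
        return self.binary(self.unary, MULTIPLICATIVE)

    def unary(self):
        tok = self.peek()
        if tok in ("NOT", "!", "-", "+"):
            self.take()
            operand = self.unary()
            if tok == "+":
                return operand
            return {"$multiply": [-1, operand]} if tok == "-" else {"$not": [operand]}
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok is None:
            raise ValueError("unexpected end of input")
        if tok == "(":
            inner = self.expression()
            self.expect(")")
            return inner
        if tok[0].isdigit():
            return float(tok) if "." in tok else int(tok)
        if tok[0] == '"':
            return tok[1:-1]
        if tok in ("true", "false", "null", "NULL"):
            return {"true": True, "false": False}.get(tok)  # null -> None
        if not (tok[0] in "$_" or tok[0].isalpha()):
            raise ValueError(f"unexpected token {tok!r}")
        if self.peek() == "(":  # function call, positional arguments only
            self.take()
            args = [] if self.peek() == ")" else [self.expression()]
            while self.peek() == ",":
                self.take()
                args.append(self.expression())
            self.expect(")")
            return {"$" + tok: args[0] if len(args) == 1 else args}
        return tok if tok.startswith("$") else "$" + tok  # bare name -> field path
```

For example, ExpressionParser("abs( pv + pvMove )").parse() returns {"$abs": {"$add": ["$pv", "$pvMove"]}}, and in "pv + premiumPV * 2 > 10" the multiplication is nested inside the addition because multiplicative_expression sits below additive_expression in the hierarchy.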

JSON Integration

The language uses JSON for OPTIONS clauses and for the body of the DO stage. The JSON syntax itself is defined in a separate grammar file, JsonGrammar.g4.

Key Features and Improvements

Compared to standard MongoDB JSON, this language offers:

More Readable Keywords: Uses keywords like FROM, PIPELINE, WHERE, ADD, PROJECT, GROUP BY, SORT BY, JOIN, UNWIND, REPLACE, making the structure clearer.
Simplified Syntax: Aims to reduce nesting and verbosity, especially for common operations.
More Natural Expression Syntax: Uses familiar operators (==, >, <, AND, OR) and function call syntax.

Overall, MongoAggregationForHumans provides a more user-friendly way to express MongoDB aggregation pipelines, potentially reducing errors and improving developer productivity.

expression

comparizon_expression

additive_expression

multiplicative_expression

unary_expression

brackets_expression

atom

named_args_list

unnamed_args_list

expression_array

expression_array_item

Variables

Strings

Operators


                AND: 'AND' | '&&';
                OR: 'OR' | '||';
                NOT: 'NOT' | '!';
                EQ: '==';
                NEQ: '<>' | '!=';
                GT: '>';
                GTE: '>=';
                LT: '<';
                LTE: '<=';
                ASC: 'ASC';
                DESC: 'DESC';
                MUL: '*';
                DIV: '/';
                PLUS: '+';
                MINUS: '-';
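For readers who know MongoDB, the tokens above can be lined up with aggregation expression operators as in the table below. The mapping is our assumption rather than something the grammar states; it mainly shows that alternative spellings such as AND and &&, or <> and !=, are interchangeable, and that ASC and DESC are sort directions rather than expression operators. mongo_operator is a hypothetical helper.

```python
# Assumed mapping from AFH operator tokens to MongoDB aggregation operators.
AFH_TO_MONGO = {
    "AND": "$and", "&&": "$and",
    "OR": "$or", "||": "$or",
    "NOT": "$not", "!": "$not",
    "==": "$eq", "!=": "$ne", "<>": "$ne",
    ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte",
    "*": "$multiply", "/": "$divide",
    "+": "$add", "-": "$subtract",  # binary forms; unary -x needs e.g. {"$multiply": [-1, x]}
}

# ASC / DESC only appear in SORT BY and select the $sort direction.
SORT_DIRECTION = {"ASC": 1, "DESC": -1}


def mongo_operator(token: str) -> str:
    """Return the MongoDB operator for an AFH operator token (either spelling)."""
    try:
        return AFH_TO_MONGO[token]
    except KeyError:
        raise ValueError(f"not an AFH expression operator: {token!r}") from None
```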

pipeline_def

@code {
    private string StageWhere = @"
WHERE ( cob == ""2025-04-22"" && Department == ""Department Name"" )
    && Book NOT IN (""Book1"", ""Book2"")
";

    private string StageAdd = @"
ADD // special syntax
    pv + premiumPV AS TotalPV,
    abs( pv + pvMove ) AS MyPnl, // function call
    dateToString( format: ""%Y-%m-%d"", date: field1 ) AS ""TodayStr"", // named args
    { ""data"" AS Key, ""value"" AS Value } AS Nested, // nested object
    [{ ""data1"" AS Key, ""v1"" AS Value }, { ""data2"" AS Key, ""v2"" AS Value }] AS NestedArray // nested array
";

    private string StageGroupBy = @"
GROUP BY CurveKey
LET max( Order ) AS Order,
    sum( Value ) AS Value
";

    private string StageBucket = @"
BUCKET Field1 / 100.1
BOUNDARIES 1, 10, 100, 1000
DEFAULT ""Ignored""
LET Field1 / 100.1 AS Gain
";

    private string StageJoin = @"
JOIN ""PnL-Market"" AS Market ON $_id.CurveKey == _id
";

    private string StageProject = @"
PROJECT HedgeLVSVInstruments, CurvePrefix, objectToArray( VegaDetails ) AS VegaDetails
";

    private string StageReplace = @"
REPLACE ID { $_id.CcyPair AS CcyPair, $_id.Tenor AS Tenor }
    Order, 'DN OpeningCurve', '10RR OpeningCurve'
";

    private string StageSort = @"
SORT BY $_id.CcyPair, Order
";

    private string StageUnwind = @"
UNWIND Data INDEX Order
";

    private string StageDo = @"
DO { ""$replaceRoot"": { ""newRoot"": ""$_id"" } }
";

    private string StageFacet = @"
FACET
    categorizedByTags PIPELINE {
        UNWIND tags
        DO { ""$sortByCount"": ""$tags"" }
    },
    categorizedByPrice PIPELINE {
        WHERE price == exists( 1 )
        BUCKET price
        BOUNDARIES 0, 150, 200, 300, 400
        DEFAULT ""Other""
        LET sum( 1 ) AS count,
            push( title ) AS titles
    },
    'categorizedByYears(Auto)' PIPELINE {
        BUCKET AUTO year BUCKETS 4
    }
";

    private string StageComplexExample = @"
WHERE COB == date( ""2025-05-15T00:00:00Z"" ) AND (Department == ""Department Name"" )
WHERE $VegaDetails.VegaImpact == exists( true )
PROJECT HedgeLVSVInstruments,
    $VegaDetails.OpeningVega,
    $VegaDetails.ClosingVega,
    $VegaDetails.VegaImpact,
    $VegaDetails.Vega2ndOrderImpact,
    concat( dateToString( format: ""%Y%m%d"", date: COB ), ""-"", toString( OpeningVolRateSetId ), ""-"", toString( ClosingVolRateSetId ), ""-PnL-Vol-"" ) AS CurvePrefix
PROJECT HedgeLVSVInstruments, CurvePrefix, objectToArray( VegaDetails ) AS VegaDetails
UNWIND VegaDetails
PROJECT HedgeLVSVInstruments, CurvePrefix, $VegaDetails.k AS Type, objectToArray( $VegaDetails.v ) AS Data
UNWIND Data
WHERE (Type == ""OpeningVega"" OR Type == ""VegaImpact"" OR Type == ""Vega2ndOrderImpact"")
PROJECT concat( CurvePrefix, $Data.v.CcyPair, ""-"", HedgeLVSVInstruments, ""-"" ) AS CurvePrefix,
    Type,
    $Data.v.CcyPair AS CcyPair,
    objectToArray( $Data.v.Data ) AS Data
UNWIND Data INDEX Order
PROJECT CurvePrefix, CcyPair, $Data.k AS Tenor, Order, Type, objectToArray( $Data.v ) AS Data
UNWIND Data
PROJECT CurvePrefix,
    concat( CurvePrefix, Tenor, ""-"", $Data.k ) AS CurveKey,
    CcyPair, Type, Tenor, Order,
    $Data.k AS Delta,
    $Data.v AS Value
WHERE (Delta == ""10RR"" OR Delta == ""10FLY"" OR Delta == ""25RR"" OR Delta == ""25FLY"" OR Delta == ""10C"" OR Delta == ""10P"" OR Delta == ""25C"" OR Delta == ""25P"" OR Delta == ""DN"")
GROUP BY CurvePrefix, CcyPair, Type, Tenor, Delta, CurveKey
LET max( Order ) AS Order,
    sum( Value ) AS Value
JOIN ""PnL-Market"" AS Market ON $_id.CurveKey == _id
UNWIND Market
ADD Order AS ""_id.Order"",
    [
        { ""OpeningCurve"" AS ""k"", $Market.Opening AS ""v"" },
        { ""CurveMove"" AS ""k"", $Market.Move AS ""v"" },
        { ""Value"" AS ""k"", Value AS ""v"" }
    ] AS ""_id.Data""
DO { ""$replaceRoot"": { ""newRoot"": ""$_id"" } }
UNWIND Data
PROJECT CurvePrefix, CcyPair, Tenor, Order, Delta,
    cond( if: $Data.k == ""OpeningCurve"" OR $Data.k == ""CurveMove"", then: $Data.k, else: Type ) AS Type,
    $Data.v AS Value
GROUP BY CurvePrefix, CcyPair, Tenor
LET max( Order ) AS Order,
    addToSet( Name: concat( Delta, "" "", Type ), Value: Value ) AS Items
PROJECT _id, Order, arrayToObject( zip( inputs: [ $Items.Name, $Items.Value ] ) ) AS tmp
ADD _id AS ""tmp._id"", Order AS ""tmp.Order""
DO { ""$replaceRoot"": { ""newRoot"": ""$tmp"" } }
WHERE 'DN OpeningVega' != NULL AND 'DN OpeningVega' != NULL
REPLACE ID { $_id.CcyPair AS CcyPair, $_id.Tenor AS Tenor }
    Order,
    'DN OpeningCurve', '10RR OpeningCurve', '25RR OpeningCurve', '10FLY OpeningCurve', '25FLY OpeningCurve', '25P OpeningCurve', '10P OpeningCurve', '10C OpeningCurve', '25C OpeningCurve',
    'DN CurveMove', '10RR CurveMove', '25RR CurveMove', '10FLY CurveMove', '25FLY CurveMove', '25P CurveMove', '10P CurveMove', '10C CurveMove', '25C CurveMove',
    'DN OpeningVega', '10RR OpeningVega', '25RR OpeningVega', '10FLY OpeningVega', '25FLY OpeningVega', '25P OpeningVega', '10P OpeningVega', '10C OpeningVega', '25C OpeningVega',
    'DN VegaImpact', '10RR VegaImpact', '25RR VegaImpact', '10FLY VegaImpact', '25FLY VegaImpact', '25P VegaImpact', '10P VegaImpact', '10C VegaImpact', '25C VegaImpact',
    'DN Vega2ndOrderImpact', '10RR Vega2ndOrderImpact', '25RR Vega2ndOrderImpact', '10FLY Vega2ndOrderImpact', '25FLY Vega2ndOrderImpact', '25P Vega2ndOrderImpact', '10P Vega2ndOrderImpact', '10C Vega2ndOrderImpact', '25C Vega2ndOrderImpact'
SORT BY $_id.CcyPair, Order
PROJECT EXCLUDE Order
";

    protected override Task OnAfterRenderAsync(bool firstRender)
    {
        if (firstRender)
        {
            // Sync the URL with the current tab page
            NavigationManager.GetQueryParameters().TryGetValue("tab", out var tabPage);
            ActivePage = tabPage ?? "Overview";
            SyncUrl();
            StateHasChanged();
        }
        return Task.CompletedTask;
    }

    private void SyncUrl()
    {
        var url = NavigationManager.BaseUri + "doc/afh";
        if (!string.IsNullOrWhiteSpace(ActivePage))
            url += $"?tab={Uri.EscapeDataString(ActivePage)}";
        // Fire-and-forget: the JS helper's return value is not needed
        _ = JsRuntime.InvokeVoidAsync("DashboardUtils.ChangeUrl", url);
    }

    private string ActivePage
    {
        get;
        set
        {
            if (field == value)
                return;
            field = value;
            SyncUrl();
        }
    } = "Overview";
}