@page "/doc/afh"
@inject IJSRuntime JsRuntime
@inject NavigationManager NavigationManager

@*
 * dbMango
 *
 * Copyright 2025 Deutsche Bank AG
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *@
The MongoDB Aggregation Framework is a powerful tool for processing and transforming data within MongoDB. It allows you to perform complex operations on collections of documents, similar to SQL queries but with a more flexible and expressive syntax. The framework operates on the concept of a "pipeline," where data flows through a series of stages, each stage performing a specific transformation.
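To make the pipeline idea concrete, here is a minimal in-memory sketch in Python. It is not how MongoDB or dbMango executes anything; the helper names (match, add_fields, run_pipeline) and the sample documents are hypothetical and exist only to show data flowing through stages in order.

```python
def match(predicate):
    """Stage factory: keep only the documents for which predicate is true."""
    return lambda docs: [d for d in docs if predicate(d)]


def add_fields(**computed):
    """Stage factory: add computed fields to every document."""
    return lambda docs: [
        {**d, **{name: fn(d) for name, fn in computed.items()}} for d in docs
    ]


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        docs = stage(docs)
    return docs


# Hypothetical sample data.
orders = [
    {"item": "apple", "qty": 5, "price": 2.0},
    {"item": "pear", "qty": 1, "price": 3.0},
    {"item": "plum", "qty": 10, "price": 0.5},
]

result = run_pipeline(
    orders,
    [
        match(lambda d: d["qty"] >= 5),
        add_fields(total=lambda d: d["qty"] * d["price"]),
    ],
)
print(result)
```

Each stage sees only what the previous stage produced, which is why stage order matters.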
Here's a breakdown of key concepts and common stages:
Key Concepts:
More information can be found on the MongoDB site.
Reasons why MongoDB Aggregation JSON is not ideal for human writing
MongoDB aggregation JSON relies heavily on operators (such as $match, $group, $avg), which can be less intuitive than declarative SQL.
While powerful, MongoDB aggregation's JSON syntax can be challenging for humans because it is verbose and operator-centric. The MongoAggregationForHumans grammar aims to address these issues with a more human-friendly syntax.
Here is an example of a fairly complex AFH pipeline and its equivalent in MongoDB aggregation JSON. We hope it shows why we created AFH :)
The MongoAggregationForHumans grammar defines a language for expressing MongoDB aggregation pipelines in a more human-readable format. It aims to simplify the creation and understanding of these pipelines compared to the standard JSON syntax.
A program in this language consists of a single statement that defines an aggregation pipeline:
file
: 'FROM' STRING pipeline_def
;
This indicates that a pipeline operates on a collection specified by STRING (the collection name) and is defined by a pipeline_def.
A pipeline is a sequence of stages enclosed in curly braces:
pipeline_def
: 'PIPELINE' '{' stages_list '}'
;
The stages_list allows for one or more stage_def, which represent the individual operations in the pipeline:
stages_list
: stage_def
| stages_list stage_def
;
The grammar supports the following stage types:
stage_def
: match_def // WHERE clause for filtering
| addfields_def // ADD new fields
| project_def // PROJECT fields (include or exclude)
| group_by_def // GROUP BY a key and calculate aggregates
| sort_def // SORT BY specified fields
| join_def // JOIN with another collection
| unwind_def // UNWIND an array field
| replace_def // REPLACE the root document
| do_def // DO (likely for custom operations or embedding JSON)
;
Each stage has its own syntax. Here are some examples:
match_def (WHERE):
match_def: 'WHERE' expression ('OPTIONS' json)?
;
Filters documents based on an expression. An optional OPTIONS clause allows for specifying JSON options.
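For orientation, a WHERE stage presumably corresponds to MongoDB's standard $match stage. The Python sketch below shows the kind of JSON such a filter expands to. The field names (status, qty) are hypothetical, and the exact JSON that AFH emits for a given expression is an assumption, not something this page specifies.

```python
import json

# Hypothetical condition: status == "A" AND qty > 10.
# Field names are illustrative; the JSON AFH actually generates may differ.
match_stage = {
    "$match": {
        "$and": [
            {"status": {"$eq": "A"}},
            {"qty": {"$gt": 10}},
        ]
    }
}

print(json.dumps(match_stage, indent=2))
```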
addfields_def (ADD):
addfields_def
: 'ADD' let_list ('OPTIONS' json)?
;
Adds new fields defined in a let_list. Optional OPTIONS.
project_def (PROJECT):
project_def
: 'PROJECT' ('ID' '{' id_list=let_list '}')? data_list=let_list ('OPTIONS' json)? # ProjectInclude
| 'PROJECT' 'EXCLUDE' var_list ('OPTIONS' json)? # ProjectExclude
;
Includes or excludes fields. ProjectInclude allows specifying an ID and a data_list of fields to include. ProjectExclude uses a var_list to specify fields to exclude. Optional OPTIONS.
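The two alternatives resemble the two modes of MongoDB's $project stage. The Python sketch below shows one plausible JSON shape for each. The field names are hypothetical, and how the ID block maps onto the generated JSON is not described here, so it is left out.

```python
# Inclusion form: keep "name" and add a computed "total".
# Field names are hypothetical; AFH's actual output is an assumption.
project_include = {
    "$project": {
        "name": 1,
        "total": {"$multiply": ["$qty", "$price"]},
    }
}

# Exclusion form: drop the listed fields and keep everything else.
project_exclude = {
    "$project": {
        "internalNotes": 0,
        "auditTrail": 0,
    }
}

print(project_include)
print(project_exclude)
```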
group_by_def (GROUP BY):
group_by_def: 'GROUP' 'BY' id_list=let_list ('LET' data_list=let_list)? ('OPTIONS' json)?
;
Groups documents by fields specified in id_list and calculates aggregates defined in the optional data_list (using LET). Optional OPTIONS.
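As an illustration, grouping by a key and averaging a value could look like the $group stage below, followed by an in-memory Python equivalent that shows what the stage computes. The field names (category, price, avgPrice, count) are hypothetical, and the mapping from id_list and data_list to this JSON is an assumption.

```python
from collections import defaultdict

# Plausible MongoDB stage for "group by category, average the price".
group_stage = {
    "$group": {
        "_id": {"category": "$category"},
        "avgPrice": {"$avg": "$price"},
        "count": {"$sum": 1},
    }
}


def group_in_memory(docs):
    """Mimic the stage above on a plain list of dictionaries."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["category"]].append(doc["price"])
    return [
        {
            "_id": {"category": key},
            "avgPrice": sum(prices) / len(prices),
            "count": len(prices),
        }
        for key, prices in buckets.items()
    ]


# Hypothetical sample data.
docs = [
    {"category": "fruit", "price": 2.0},
    {"category": "fruit", "price": 4.0},
    {"category": "veg", "price": 1.0},
]
grouped = group_in_memory(docs)
print(grouped)
```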
sort_def (SORT BY):
sort_def: 'SORT' 'BY' sort_var_list ('OPTIONS' json)?
;
Sorts documents based on fields in sort_var_list, which can include ASC or DESC order. Optional OPTIONS.
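In MongoDB's $sort stage, ascending order is written as 1 and descending as -1, so ASC and DESC presumably map to those values. The sketch below pairs such a stage with an in-memory Python equivalent. The field names are hypothetical.

```python
# Plausible stage for "category ASC, price DESC".
sort_stage = {"$sort": {"category": 1, "price": -1}}


def sort_in_memory(docs):
    """Order by category ascending, then by price descending."""
    return sorted(docs, key=lambda d: (d["category"], -d["price"]))


# Hypothetical sample data.
docs = [
    {"category": "veg", "price": 1.0},
    {"category": "fruit", "price": 2.0},
    {"category": "fruit", "price": 4.0},
]
ordered = sort_in_memory(docs)
print(ordered)
```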
join_def (JOIN):
join_def: 'JOIN' STRING 'AS' (VARIABLE | STRING) 'ON' equivalence_list ('LET' let_list )? ('PIPELINE' '{' stages_list '}')? ('OPTIONS' json)?
;
Performs a join with another collection (specified by STRING). The joined collection is aliased using AS (either a VARIABLE or STRING). The join condition is defined by equivalence_list. An optional LET clause allows defining new fields based on the joined data. A sub-pipeline can be applied to the joined collection. Optional OPTIONS.
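MongoDB expresses joins with the $lookup stage, which has a simple equality form and a form that takes variables and a sub-pipeline. The Python sketch below shows both shapes, since JOIN appears to cover both. The collection and field names (customers, customerId, customer, cid) are hypothetical, and which form AFH generates for a given JOIN is an assumption.

```python
# Equality join: orders.customerId == customers._id.
lookup_simple = {
    "$lookup": {
        "from": "customers",
        "localField": "customerId",
        "foreignField": "_id",
        "as": "customer",
    }
}

# Same join with a variable and a sub-pipeline on the joined collection.
lookup_with_pipeline = {
    "$lookup": {
        "from": "customers",
        "let": {"cid": "$customerId"},
        "pipeline": [
            # "$$cid" refers to the variable declared in "let" above.
            {"$match": {"$expr": {"$eq": ["$_id", "$$cid"]}}},
            {"$project": {"name": 1}},
        ],
        "as": "customer",
    }
}

print(lookup_simple)
print(lookup_with_pipeline)
```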
unwind_def (UNWIND):
unwind_def: 'UNWIND' VARIABLE ('INDEX' VARIABLE)? ('OPTIONS' json)?
;
Unwinds an array field (specified by VARIABLE). An optional INDEX clause specifies a variable to store the index of the array element. Optional OPTIONS.
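MongoDB's $unwind stage has an includeArrayIndex option, which the INDEX clause presumably maps to. The Python sketch below shows such a stage and an in-memory equivalent. The field names (items, itemIndex) are hypothetical, and the helper assumes the field holds a list.

```python
# Plausible stage for unwinding "items" and recording each element's position.
unwind_stage = {"$unwind": {"path": "$items", "includeArrayIndex": "itemIndex"}}


def unwind(docs, field, index_field=None):
    """Emit one output document per array element.

    Documents whose array is missing or empty produce no output,
    which matches the default behaviour of $unwind.
    """
    out = []
    for doc in docs:
        for position, element in enumerate(doc.get(field, [])):
            row = {**doc, field: element}
            if index_field is not None:
                row[index_field] = position
            out.append(row)
    return out


# Hypothetical sample data.
docs = [
    {"_id": 1, "items": ["a", "b"]},
    {"_id": 2, "items": []},
]
rows = unwind(docs, "items", index_field="itemIndex")
print(rows)
```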
replace_def (REPLACE):
replace_def
: 'REPLACE' 'ID' '{' id_list=let_list '}' data_list=let_list ('OPTIONS' json)?
;
Replaces the root document. It uses id_list and data_list to define the replacement document. Optional OPTIONS.
do_def (DO):
do_def: 'DO' json
;
This stage seems to allow embedding arbitrary JSON (json) within the pipeline. Its exact behavior would depend on the implementation.
The language uses a hierarchy of expressions to define conditions, calculations, and field manipulations. The grammar includes rules for:
expression: Combines comparison_expression with logical operators (AND, OR).
comparison_expression: Combines additive_expression with comparison operators (==, !=, >, >=, <, <=).
additive_expression: Combines multiplicative_expression with addition and subtraction (+, -).
multiplicative_expression: Combines unary_expression with multiplication and division (*, /).
unary_expression: Handles unary operators (+, -, NOT) and various primary expressions.
brackets_expression: Includes atoms, function calls, "IN" expressions, "IS" (projection) expressions, "EXISTS" expressions, and bracketed expressions.
atom: Represents basic values like strings, numbers, booleans, null, and variables.
The grammar also defines rules for let_list (for defining fields and expressions), var_list (for lists of variables), sort_var_list (for sorting specifications), and equivalence_list (for join conditions).
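The layering of these rules is what gives operators their precedence: rules lower in the hierarchy bind tighter. The sketch below is a small recursive-descent parser in Python that mirrors the hierarchy for a simplified subset (numbers, bare identifiers, and brackets). It is not the ANTLR-generated parser. Treating AND and OR as one left-associative level, and the bare identifier form for variables, are assumptions made for illustration.

```python
import re

# Numbers, identifiers/keywords, two-character operators, then single characters.
TOKEN = re.compile(
    r"\s*(\d+(?:\.\d+)?|[A-Za-z_][A-Za-z0-9_.]*|==|!=|<>|>=|<=|&&|\|\||[-+*/()<>!])"
)


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        found = TOKEN.match(text, pos)
        if not found:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(found.group(1))
        pos = found.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def binary(self, operand, operators):
        """Parse `operand (operator operand)*` as a left-associative tree."""
        node = operand()
        while self.peek() in operators:
            operator = self.take()
            node = (operator, node, operand())
        return node

    def expression(self):
        return self.binary(self.comparison, {"AND", "OR", "&&", "||"})

    def comparison(self):
        return self.binary(self.additive, {"==", "!=", "<>", ">", ">=", "<", "<="})

    def additive(self):
        return self.binary(self.multiplicative, {"+", "-"})

    def multiplicative(self):
        return self.binary(self.unary, {"*", "/"})

    def unary(self):
        if self.peek() in {"+", "-", "NOT", "!"}:
            return (self.take(), self.unary())
        return self.primary()

    def primary(self):
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if not (token[0].isalnum() or token[0] == "_"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("a + 2 * 3 > 6 AND NOT b"))
```

The multiplication ends up nested inside the addition, which in turn sits inside the comparison, so `2 * 3` is grouped first without any brackets.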
The language integrates with JSON for specifying options and potentially within the do_def stage. The JsonGrammar.g4 file (also provided) defines the JSON syntax used.
Compared to standard MongoDB JSON, this language offers:
Keywords such as FROM, PIPELINE, WHERE, ADD, PROJECT, GROUP BY, SORT BY, JOIN, UNWIND, REPLACE, making the structure clearer.
Familiar operators (==, >, <, AND, OR) and function call syntax.
Overall, MongoAggregationForHumans provides a more user-friendly way to express MongoDB aggregation pipelines, potentially reducing errors and improving developer productivity.
AND: 'AND' | '&&';
OR: 'OR' | '||';
NOT: 'NOT' | '!';
EQ: '==';
NEQ: '<>' | '!=';
GT: '>';
GTE: '>=';
LT: '<';
LTE: '<=';
ASC: 'ASC';
DESC: 'DESC';
MUL: '*';
DIV: '/';
PLUS: '+';
MINUS: '-';
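Several tokens accept more than one spelling, for example AND or &&. The Python sketch below recognises the operator lexemes in a string and reports their token names. It is an illustrative stand-in for the ANTLR lexer, not the real one, and it takes LTE to be '<=', in line with the comparison operators listed for comparison_expression.

```python
import re

# Lexeme -> token name, following the token list above (LTE assumed to be "<=").
TOKEN_TYPES = {
    "AND": "AND", "&&": "AND",
    "OR": "OR", "||": "OR",
    "NOT": "NOT", "!": "NOT",
    "==": "EQ",
    "<>": "NEQ", "!=": "NEQ",
    ">": "GT", ">=": "GTE",
    "<": "LT", "<=": "LTE",
    "ASC": "ASC", "DESC": "DESC",
    "*": "MUL", "/": "DIV", "+": "PLUS", "-": "MINUS",
}


def _alternative(lexeme):
    # Keyword lexemes need word boundaries so "OR" does not match inside "ORDER".
    if lexeme.isalpha():
        return r"\b" + lexeme + r"\b"
    return re.escape(lexeme)


# Longest lexemes first so ">=" is not read as ">" followed by "=".
_PATTERN = re.compile(
    "|".join(_alternative(lexeme) for lexeme in sorted(TOKEN_TYPES, key=len, reverse=True))
)


def operator_tokens(text):
    """Return the token names of all operator lexemes found in text."""
    return [TOKEN_TYPES[found.group(0)] for found in _PATTERN.finditer(text)]


print(operator_tokens("a >= 1 && b <> 2 OR NOT c"))
```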