
Shoal

Group an entity selection or a collection of objects by column name


Split your data into separate groups to perform computations such as sum, min, max or count, for better analysis.

Usage

Create the data frame object from a collection of objects or an entity selection:

$collection:=New collection(\
New object("letter"; "A"; "value"; 1); New object("letter"; "B"; "value"; 2); New object("letter"; "C"; "value"; 3); \
New object("letter"; "A"; "value"; 4); New object("letter"; "B"; "value"; 5); New object("letter"; "C"; "value"; 6))

$dataFrame:=shoal.frame($collection)

Then group by a column and apply aggregate functions to columns/fields. The result is an object keyed by group value:

$result:=$dataFrame.groupBy("letter").agg(ƒ.sum("value"); ƒ.max("value").as("maxValue"))
{
 "A": {"value":5, "maxValue":4},
 "B": {"value":7, "maxValue":5},
 "C": {"value":9, "maxValue":6}
}
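To make the semantics concrete, here is a minimal Python sketch (not Shoal's 4D implementation) of what groupBy("letter").agg(...) computes on the sample data. The helpers group_by and agg, and the (result_name, column, function) spec tuples, are hypothetical stand-ins for groupBy, .agg and the ƒ builder.

```python
from collections import defaultdict

# Python stand-in for the 4D collection above: one dict per object.
rows = [
    {"letter": "A", "value": 1}, {"letter": "B", "value": 2}, {"letter": "C", "value": 3},
    {"letter": "A", "value": 4}, {"letter": "B", "value": 5}, {"letter": "C", "value": 6},
]


def group_by(rows, key):
    """Bucket rows by the value of `key`, keeping first-seen group order (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def agg(groups, *specs):
    """Apply each (result_name, column, function) spec to every group (hypothetical helper)."""
    return {
        group_key: {name: func([m[column] for m in members]) for name, column, func in specs}
        for group_key, members in groups.items()
    }


# Mirrors agg(ƒ.sum("value"); ƒ.max("value").as("maxValue")).
result = agg(group_by(rows, "letter"), ("value", "value", sum), ("maxValue", "value", max))
print(result)
# {'A': {'value': 5, 'maxValue': 4}, 'B': {'value': 7, 'maxValue': 5}, 'C': {'value': 9, 'maxValue': 6}}
```

Each group value becomes a key of the result, and each aggregate is stored under its column name or its alias, matching the output shown above.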

where ƒ is the functions builder, which you only need to instantiate once:

ƒ:=shoal.functions()

Flatten the result, i.e. return a new data frame

You can get the result as a new data frame by calling flatten before agg:

$result:=$dataFrame.groupBy("letter").flatten().agg(ƒ.sum("value"); ƒ.max("value").as("maxValue"))

This preserves the name and value type of the column used by the groupBy operation, and lets you apply new operations to the result:

{
    "data": [
        { "letter": "A", "value":5, "maxValue":4},
        { "letter": "B", "value":7, "maxValue":5},
        { "letter": "C", "value":9, "maxValue":6}
    ]
}
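A Python sketch of the reshaping that flatten produces, assuming it simply moves each group value back into a "letter" column of its own row. The flatten helper is hypothetical and works on an already aggregated result, whereas in Shoal flatten() is called before agg. Storing the group value as a row value, rather than as an object property name (which in 4D and JSON is always text), is what allows its original type to be kept.

```python
def flatten(grouped, key):
    """Turn {group_value: {aggregates}} into rows carrying the group value in `key` (hypothetical helper)."""
    return {"data": [{key: group_value, **aggregates} for group_value, aggregates in grouped.items()]}


# The grouped result from the previous example.
grouped = {
    "A": {"value": 5, "maxValue": 4},
    "B": {"value": 7, "maxValue": 5},
    "C": {"value": 9, "maxValue": 6},
}
frame = flatten(grouped, "letter")
print(frame["data"][0])  # {'letter': 'A', 'value': 5, 'maxValue': 4}
```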

List of functions

Name               Description
ƒ.sum              the sum
ƒ.sumDistinct      the sum of distinct elements
ƒ.min              the minimum
ƒ.max              the maximum
ƒ.avg (or ƒ.mean)  the average

ƒ.first            the first element
ƒ.last             the last element

ƒ.count            the number of non-NULL elements
ƒ.countDistinct    the number of distinct non-NULL elements
ƒ.col              all values
ƒ.set              the distinct values
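The Python sketch below gives one plausible reading of each function applied to a single group's values, with None standing in for NULL. The table only specifies NULL handling for count and countDistinct; dropping None before sum, sumDistinct, min, max and avg, and keeping first-seen order for set, are assumptions of this sketch, not documented Shoal behavior.

```python
from statistics import mean


def non_null(values):
    """Drop None, this sketch's stand-in for NULL."""
    return [v for v in values if v is not None]


# Hypothetical equivalents of the ƒ functions, each applied to one group's column values.
# Assumption: the numeric aggregates ignore NULL, like count does.
FUNCTIONS = {
    "sum": lambda vs: sum(non_null(vs)),
    "sumDistinct": lambda vs: sum(set(non_null(vs))),
    "min": lambda vs: min(non_null(vs)),
    "max": lambda vs: max(non_null(vs)),
    "avg": lambda vs: mean(non_null(vs)),
    "first": lambda vs: vs[0],
    "last": lambda vs: vs[-1],
    "count": lambda vs: len(non_null(vs)),
    "countDistinct": lambda vs: len(set(non_null(vs))),
    "col": lambda vs: list(vs),
    "set": lambda vs: list(dict.fromkeys(vs)),  # distinct values in first-seen order (assumption)
}
FUNCTIONS["mean"] = FUNCTIONS["avg"]  # ƒ.mean is listed as an alias of ƒ.avg

values = [1, 4, 4, None]
for name, func in FUNCTIONS.items():
    print(name, func(values))
```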

Column name alias

Use as to rename a column in the final result:

 .agg(ƒ.sum("value").as("SumOfValue"))

⚠️ This is mandatory if you compute several aggregates on the same column, because the column name is used as the default result name.
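Why the alias matters, as a Python sketch: when result keys default to the column name, two aggregates on "value" target the same key. The agg_group helper is hypothetical; in it the later result silently overwrites the earlier one, and what Shoal itself does in that case is not documented here.

```python
def agg_group(values_by_column, specs):
    """specs are (column, function, alias or None); the result key defaults to the column name (hypothetical helper)."""
    out = {}
    for column, func, alias in specs:
        out[alias or column] = func(values_by_column[column])
    return out


group_a = {"value": [1, 4]}  # the "A" group from the sample data

without_alias = agg_group(group_a, [("value", sum, None), ("value", max, None)])
with_alias = agg_group(group_a, [("value", sum, None), ("value", max, "maxValue")])
print(without_alias)  # {'value': 4}: the max overwrote the sum
print(with_alias)     # {'value': 5, 'maxValue': 4}
```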

Compute on only one column

There are shortcuts to apply a function to a single column without using .agg:

$result:=$dataFrame.groupBy("letter").sums("value")

Available shortcuts include sums, counts, minimums, maximums, etc.
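A Python sketch of a shortcut such as sums, assuming it is equivalent to .agg(ƒ.sum("value")) and therefore returns the same nested shape keyed by group value. The sums helper and that output shape are assumptions, not Shoal's documented behavior.

```python
rows = [
    {"letter": "A", "value": 1}, {"letter": "B", "value": 2}, {"letter": "C", "value": 3},
    {"letter": "A", "value": 4}, {"letter": "B", "value": 5}, {"letter": "C", "value": 6},
]


def sums(rows, key, column):
    """Hypothetical shortcut: group by `key` and sum `column`, shaped like agg(ƒ.sum(column))."""
    result = {}
    for row in rows:
        group = result.setdefault(row[key], {column: 0})  # start each new group at 0
        group[column] += row[column]
    return result


print(sums(rows, "letter", "value"))  # {'A': {'value': 5}, 'B': {'value': 7}, 'C': {'value': 9}}
```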

Get info on the data frame

Is it empty?

$dataFrame.isEmpty

Data length, i.e. the row count:

$length:=$dataFrame.length

Get the number of columns and rows as an object:

$shape:=$dataFrame.shape
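A Python sketch of these three properties on a data frame modeled as a list of dicts. The info helper is hypothetical, and the "columns"/"rows" key names of the shape object are assumptions, since the README only says it holds the number of columns and rows.

```python
rows = [
    {"letter": "A", "value": 1}, {"letter": "B", "value": 2}, {"letter": "C", "value": 3},
    {"letter": "A", "value": 4}, {"letter": "B", "value": 5}, {"letter": "C", "value": 6},
]


def info(rows):
    """Hypothetical equivalent of isEmpty, length and shape."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:  # collect every column name once
                columns.append(name)
    return {
        "isEmpty": len(rows) == 0,
        "length": len(rows),
        "shape": {"columns": len(columns), "rows": len(rows)},  # key names are assumptions
    }


print(info(rows))  # {'isEmpty': False, 'length': 6, 'shape': {'columns': 2, 'rows': 6}}
```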

Get a new data frame with some statistics on each column:

$summary:=$dataFrame.summary()
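The README does not list which statistics summary computes. As a hypothetical illustration only, this Python sketch reports a non-NULL count for every column plus min, max and mean for numeric columns, returned in the same {"data": [...]} form as a flattened frame.

```python
from statistics import mean

rows = [
    {"letter": "A", "value": 1}, {"letter": "B", "value": 2}, {"letter": "C", "value": 3},
    {"letter": "A", "value": 4}, {"letter": "B", "value": 5}, {"letter": "C", "value": 6},
]


def summary(rows):
    """Hypothetical per-column statistics; the chosen stats are an assumption."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    stats = []
    for name, values in columns.items():
        # Only real numbers get min/max/mean (bool is excluded even though it subclasses int).
        numbers = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
        entry = {"column": name, "count": sum(v is not None for v in values)}
        if numbers:
            entry.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
        stats.append(entry)
    return {"data": stats}


print(summary(rows))
```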

Install

Manually

Download the database and put it in your Components folder, or copy all the code into your database.

With kaluza-cli on macOS

In a terminal, from your database root path:

# kaluza init # (if never done before)
kaluza install mesopelagique/Shoal

TODO

logo Shoal of fish

Other components

mesopelagique