First of all: this is awesome! I'm willing to test the hell out of this library.
This is related to: #13
The split-apply-combine pattern is extremely useful for data science: group the data, compute an aggregate function on each group (for example the sum, count, or max of each group), then apply transformations to the original dataframe based on those per-group aggregates.
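As a rough illustration of the three steps, here is a minimal plain-Python sketch over a list of dicts. The function name `split_apply_combine` and the `<value>_<agg>` output column naming are made up for this example; in pandas the same result is usually written as `df.groupby(key)[value].transform("sum")`.

```python
from collections import defaultdict


def split_apply_combine(rows, key, value, agg):
    """Hypothetical helper: aggregate `value` per `key` group with `agg`,
    then attach each group's aggregate to every original row."""
    # Split: collect the values belonging to each group
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Apply: compute one aggregate per group
    aggregates = {k: agg(vals) for k, vals in groups.items()}
    # Combine: broadcast the aggregate back onto the original rows (new dicts)
    column = f"{value}_{agg.__name__}"
    return [{**row, column: aggregates[row[key]]} for row in rows]


rows = [{"group": "a", "x": 1}, {"group": "b", "x": 5}, {"group": "a", "x": 3}]
print(split_apply_combine(rows, "group", "x", sum))
# [{'group': 'a', 'x': 1, 'x_sum': 4}, {'group': 'b', 'x': 5, 'x_sum': 5}, {'group': 'a', 'x': 3, 'x_sum': 4}]
```

Passing `len` instead of `sum` gives the per-group count, and `max` the per-group maximum, without changing the helper.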
Regarding syntax: Python's pandas and R's dplyr use a custom syntax, while Julia's DataFramesMeta uses a LINQ-style one.
I believe the LINQ approach is much more powerful in terms of extensibility and could map to a very nice pipe syntax in Nim. It could also extend well to multiple backends (SQL, Hadoop, Feather, ...).
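The extensibility and multi-backend argument can be sketched as follows, under clearly hypothetical names (`Query`, `where`, `count_by`, `run_in_memory`, `to_sql` are not an existing library's API). Query steps are recorded as plain data through method chaining, Python's closest analogue to a pipe, so the same chain can either be evaluated in memory or rendered to SQL. The SQL rendering is deliberately naive: it does no escaping and assumes every `where` comes before `count_by`.

```python
import operator
from collections import Counter

# Comparison operators a step may use (a deliberately tiny, hypothetical subset)
OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}


class Query:
    """Records query steps as plain data; each call returns a new Query."""

    def __init__(self, table, steps=()):
        self.table = table
        self.steps = tuple(steps)

    def where(self, column, op, value):
        return Query(self.table, self.steps + (("where", column, op, value),))

    def count_by(self, column):
        return Query(self.table, self.steps + (("count_by", column),))


def run_in_memory(query, rows):
    """Backend 1: evaluate the recorded steps on a list of dicts."""
    for step in query.steps:
        if step[0] == "where":
            _, column, op, value = step
            rows = [r for r in rows if OPS[op](r[column], value)]
        elif step[0] == "count_by":
            counts = Counter(r[step[1]] for r in rows)
            rows = [{step[1]: key, "count": n} for key, n in counts.items()]
    return rows


def to_sql(query):
    """Backend 2: render the same steps as a SQL string.

    Assumes every `where` precedes `count_by`; performs no escaping.
    """
    conditions, group_col = [], None
    for step in query.steps:
        if step[0] == "where":
            _, column, op, value = step
            sql_op = "=" if op == "==" else op
            conditions.append(f"{column} {sql_op} {value!r}")
        elif step[0] == "count_by":
            group_col = step[1]
    select = f"{group_col}, COUNT(*) AS count" if group_col else "*"
    sql = f"SELECT {select} FROM {query.table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    if group_col:
        sql += f" GROUP BY {group_col}"
    return sql


q = Query("passengers").where("Age", ">", 30).count_by("Ticket")
print(to_sql(q))
# SELECT Ticket, COUNT(*) AS count FROM passengers WHERE Age > 30 GROUP BY Ticket
rows = [{"Ticket": "T1", "Age": 40}, {"Ticket": "T1", "Age": 35}, {"Ticket": "T2", "Age": 50}]
print(run_in_memory(q, rows))
# [{'Ticket': 'T1', 'count': 2}, {'Ticket': 'T2', 'count': 1}]
```

Because the query is data rather than opaque function calls, adding another backend means writing one more interpreter over `query.steps` while user code stays unchanged.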
Here are two sample scikit-learn transformers that compute the ticket frequency on the Kaggle Titanic dataset, to showcase LINQ-style split-apply-combine vs. pandas.
Python
Julia (piping is done with `|>`)
Full code in Python and Julia
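The ticket-frequency feature described above can be sketched in plain Python as a scikit-learn-style transformer. This is a hypothetical minimal version, not the transformers from the issue: it only duck-types `fit`/`transform` (no sklearn import), works on lists of dicts, and the names `TicketFrequency` and `TicketFreq` are made up. With a pandas DataFrame `X`, the same feature is commonly computed as `X["Ticket"].map(X["Ticket"].value_counts())`.

```python
from collections import Counter


class TicketFrequency:
    """Hypothetical scikit-learn-style transformer (duck-typed fit/transform)
    that adds how many rows share each row's ticket."""

    def __init__(self, column="Ticket", output="TicketFreq"):
        self.column = column
        self.output = output

    def fit(self, rows, y=None):
        # Split + apply: count rows per ticket value seen during fit
        self.counts_ = Counter(row[self.column] for row in rows)
        return self

    def transform(self, rows):
        # Combine: attach each row's group count; tickets unseen in fit get 0
        return [{**row, self.output: self.counts_.get(row[self.column], 0)} for row in rows]


train = [{"Ticket": "T1"}, {"Ticket": "T2"}, {"Ticket": "T1"}]
freq = TicketFrequency().fit(train)
print(freq.transform(train + [{"Ticket": "T3"}]))
# [{'Ticket': 'T1', 'TicketFreq': 2}, {'Ticket': 'T2', 'TicketFreq': 1}, {'Ticket': 'T1', 'TicketFreq': 2}, {'Ticket': 'T3', 'TicketFreq': 0}]
```

Learning the counts in `fit` and applying them in `transform` follows the usual scikit-learn convention of not leaking test-set statistics; mapping unseen tickets to 0 is an arbitrary choice for this sketch.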
Here is also a link to a LINQ idea on Nim's forum.