News

Hello, I have a very large dataset (tens of millions of rows) stored on disk as a partitioned Parquet dataset. I load this dataset into memory as a pyarrow.Table, and drop all columns except one, ...
I was looking into how to convert DataFrames to NumPy arrays so that both the column dtypes and the column names are retained, preferably in an efficient way so that memory is not duplicated in the process.
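A small sketch of two possible approaches, assuming the DataFrame in question is a pandas DataFrame (the post does not say which library). The column names `id` and `score` are invented for the example. `DataFrame.to_records` gives a single structured array that keeps names and dtypes, but it generally copies the data; a dict of per-column arrays keeps the same information and can often avoid a copy, at the cost of not being one array.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two differently typed columns.
df = pd.DataFrame(
    {
        "id": np.array([1, 2, 3], dtype=np.int32),
        "score": np.array([0.5, 1.5, 2.5], dtype=np.float64),
    }
)

# Approach 1: one structured (record) array. Column names become
# field names and each field keeps its column's dtype. The data is
# copied into a new row-oriented buffer.
records = df.to_records(index=False)
print(records.dtype.names)

# Approach 2: one plain array per column, keyed by column name.
# For simple numeric columns this can often reuse pandas' memory
# instead of copying, but that is not guaranteed.
columns = {name: df[name].to_numpy() for name in df.columns}
print({name: arr.dtype for name, arr in columns.items()})
```

The trade-off is layout: pandas stores columns separately, while a structured array interleaves fields row by row, so a single array with mixed dtypes typically cannot share memory with the original frame.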