More Spark I/O. Parquet is a column-oriented file format designed to store tabular data, much like a Spark DataFrame. Because the data is laid out by column, Spark can read just the columns a query needs without scanning the rest of the file, which makes Parquet a sensible choice as an efficient intermediate format.
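A minimal PySpark sketch of that column-pruning behaviour; the path /tmp/example.parquet and the column names are illustrative, not from the original text:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Build a small DataFrame with several columns and persist it as Parquet.
df = spark.range(1000).select(
    F.col("id"),
    (F.col("id") * 2).alias("double_id"),
    F.rand().alias("noise"),
)
df.write.mode("overwrite").parquet("/tmp/example.parquet")

# Reading back: selecting a subset of columns lets Spark skip the
# unselected ones on disk entirely (column pruning).
subset = spark.read.parquet("/tmp/example.parquet").select("id", "double_id")
subset.explain()  # the scan's ReadSchema lists only the two requested columns
```

Inspecting the physical plan with explain() is a quick way to confirm that only the selected columns are actually read.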
While working with a DATE column I get the error: "Singleton array array(1444424400.0) cannot be considered a valid collection." Is there any way to avoid numpy entirely? For example, I want to leverage the pandas_ml library.
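That "Singleton array" message is what scikit-learn raises when it receives a zero-dimensional array (a single value) where it expects a collection with one entry per sample. A minimal sketch, assuming the error comes from a scikit-learn call (pandas_ml wraps scikit-learn underneath); the DataFrame contents are invented for illustration. Keeping the data as pandas objects end to end also sidesteps any explicit numpy use:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "DATE": pd.to_datetime(["2015-10-09", "2015-10-10",
                            "2015-10-11", "2015-10-12"]),
    "value": [1.0, 2.0, 3.0, 4.0],
})

# A single DATE converted to a Unix timestamp and wrapped as a 0-d array
# is a "singleton": it has no length, so scikit-learn rejects it.
y_bad = np.asarray(df["DATE"].astype("int64").iloc[0] / 10**9)
# train_test_split(df[["value"]], y_bad) would raise:
#   TypeError: Singleton array array(1.44...e+09) cannot be considered
#   a valid collection.

# Passing whole pandas objects (no explicit numpy) avoids the problem:
# a Series is a 1-d collection with one timestamp per row.
y_good = df["DATE"].astype("int64") // 10**9
X_train, X_test, y_train, y_test = train_test_split(
    df[["value"]], y_good, test_size=0.5, random_state=0
)
print(y_train)
```

Since pandas_ml forwards fitting to scikit-learn, the same shape rule applies there: the target must be a one-value-per-row collection, not a lone scalar.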