arche.tools.pandas module

arche.tools.pandas.expand_column(df: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame
arche.tools.pandas.flatten_df(df: pandas.core.frame.DataFrame, i: int = 0, columns_map: Optional[Dict[str, str]] = None, p_bar: Optional[tqdm.tqdm_notebook] = None) → Tuple[pandas.core.frame.DataFrame, Dict[str, str]]

Expand lists and dicts into new columns, each named after the list element index or dict key and holding the corresponding cell values. If a new name conflicts with an existing column, a short hash is used. Almost as fast as json_normalize, but also supports lists.

Parameters
  • df – a dataframe to expand

  • i – start index of the columns slice, since there’s no need to iterate over a completely expanded column twice

  • columns_map – a dict with old name references {new_name: old}

  • p_bar – a progress bar

Returns

A tuple of a flat dataframe, with new columns from the expanded lists and dicts, and a columns map dict with old name references {new_name: old}
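
The expansion described above can be illustrated with a minimal sketch. This is not arche's implementation: flatten_sketch is a hypothetical name, it skips the short-hash handling of name conflicts, takes no i or p_bar arguments, and assumes every cell in a nested column is a list, a dict, or missing.

```python
import pandas as pd


def flatten_sketch(df):
    """Hypothetical illustration only; not the arche implementation."""
    columns_map = {}
    i = 0
    while i < len(df.columns):
        name = df.columns[i]
        series = df.iloc[:, i]
        nested = series.map(lambda v: isinstance(v, (list, dict)))
        if not nested.any():
            i += 1  # column is already flat, so it is never revisited
            continue

        def to_items(value):
            # dict keys and list positions become the new column suffixes;
            # anything else (assumed to be a missing value) yields no cells
            if isinstance(value, dict):
                return value
            if isinstance(value, list):
                return dict(enumerate(value))
            return {}

        expanded = pd.DataFrame([to_items(v) for v in series], index=df.index)
        expanded.columns = [f"{name}_{c}" for c in expanded.columns]

        # Map every new column back to the original top-level column,
        # dropping the entry for the intermediate column being replaced.
        origin = columns_map.pop(name, name)
        for new_name in expanded.columns:
            columns_map[new_name] = origin

        # Replace column i with its expansion and re-check from the same
        # position, since the new columns may still hold lists or dicts.
        df = pd.concat([df.iloc[:, :i], expanded, df.iloc[:, i + 1:]], axis=1)
    return df, columns_map
```

Because the index only advances past columns that are already flat, no fully expanded column is scanned again, which is the same idea the i parameter expresses.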

Examples:

>>> df = pd.DataFrame({"links": [[{"im": "http://www.im.com/illinoi"},
...                               {"ITW website": "http://www.itw.com"}]]})
>>> flat_df, cols_map = flatten_df(df)
>>> flat_df
                  links_0_im links_1_ITW website
0  http://www.im.com/illinoi  http://www.itw.com
>>> cols_map
{'links_0_im': 'links', 'links_1_ITW website': 'links'}
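
Since the columns map points from each new name to its old one, inverting it shows which flat columns came from each original column. The snippet below is a small sketch that uses only the dict from the example above; by_origin is an illustrative name, not part of arche.

```python
from collections import defaultdict

# The columns map returned in the example above: {new_name: old}
cols_map = {"links_0_im": "links", "links_1_ITW website": "links"}

# Group the new flat column names by the original column they came from.
by_origin = defaultdict(list)
for new_name, old_name in cols_map.items():
    by_origin[old_name].append(new_name)

print(dict(by_origin))
# {'links': ['links_0_im', 'links_1_ITW website']}
```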