arche.tools.pandas module
-
arche.tools.pandas.expand_column(df: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame
-
arche.tools.pandas.flatten_df(df: pandas.core.frame.DataFrame, i: int = 0, columns_map: Optional[Dict[str, str]] = None, p_bar: Optional[tqdm.tqdm_notebook] = None) → Tuple[pandas.core.frame.DataFrame, Dict[str, str]]
Expand lists and dicts into new columns. Each new column is named after the list element number or dict key and contains the respective cell values. If a new name conflicts with an existing column, a short hash is used. Almost as fast as json_normalize, but supports lists.
- Parameters
df – a dataframe to expand
i – start index of the columns slice, since there’s no need to iterate over a completely expanded column twice
columns_map – a dict with old name references {new_name: old}
p_bar – a progress bar
- Returns
A flat dataframe with new columns from expanded lists and dicts and a columns map dict with old name references {new_name: old}
Examples:
>>> df = pd.DataFrame({"links": [[{"im": "http://www.im.com/illinoi"},
...                               {"ITW website": "http://www.itw.com"}]]})
>>> flat_df, cols_map = flatten_df(df)
>>> flat_df
                  links_0_im  links_1_ITW website
0  http://www.im.com/illinoi   http://www.itw.com
>>> cols_map
{'links_0_im': 'links', 'links_1_ITW website': 'links'}
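The naming scheme described above can be illustrated with a small standalone sketch. This is not the arche implementation: flatten_sketch, _short_hash and _is_missing are hypothetical names, the hash function and suffix length are assumptions, and the sketch only expands columns whose cells are all lists, dicts or missing values. It omits the p_bar argument and takes no starting index, although the loop position plays a similar role to i.

```python
import hashlib
from typing import Dict, Tuple

import pandas as pd


def _short_hash(name: str) -> str:
    # Assumption: any short, deterministic suffix will do for this sketch.
    return hashlib.sha256(name.encode()).hexdigest()[:6]


def _is_missing(value) -> bool:
    return value is None or (isinstance(value, float) and value != value)


def flatten_sketch(df: pd.DataFrame) -> Tuple[pd.DataFrame, Dict[str, str]]:
    """Hypothetical stand-in for flatten_df, for illustration only."""
    columns_map: Dict[str, str] = {}
    i = 0
    while i < len(df.columns):
        name = df.columns[i]
        cells = df.iloc[:, i]
        nested = cells.map(lambda v: isinstance(v, (list, dict)))
        missing = cells.map(_is_missing)
        # Skip plain columns, and (as a simplification) mixed ones too.
        if not nested.any() or not (nested | missing).all():
            i += 1
            continue

        # Collect values per new column, keyed by list index or dict key.
        expanded: Dict[str, list] = {}
        for pos, value in enumerate(cells):
            if isinstance(value, list):
                items = enumerate(value)
            elif isinstance(value, dict):
                items = value.items()
            else:
                continue
            for key, item in items:
                column = expanded.setdefault(f"{name}_{key}", [None] * len(df))
                column[pos] = item

        # Rename on conflict and remember the original top-level column.
        others = set(df.columns[:i]) | set(df.columns[i + 1:])
        origin = columns_map.pop(name, name)
        block = {}
        for new_name, values in expanded.items():
            if new_name in others:
                new_name = f"{new_name}_{_short_hash(new_name)}"
            block[new_name] = values
            columns_map[new_name] = origin

        new_block = pd.DataFrame(block, index=df.index)
        # Put the new columns where the old one was; i is not advanced,
        # so the new columns are examined for deeper nesting next.
        df = pd.concat([df.iloc[:, :i], new_block, df.iloc[:, i + 1:]], axis=1)
    return df, columns_map


df = pd.DataFrame({"links": [[{"im": "http://www.im.com/illinoi"},
                              {"ITW website": "http://www.itw.com"}]]})
flat_df, cols_map = flatten_sketch(df)
print(list(flat_df.columns))
print(cols_map)
```

Because the loop does not advance past freshly created columns, nested values are expanded level by level, while columns already found to be flat are never visited again.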