Pandas DataFrame reindex 重置行索引

网友投稿 1295 2022-05-30

所属的课程名称及链接

[AI基础课程--常用框架工具]

环境信息

ModelArts

notebook - Multi-Engine 2.0 (python3)

JupyterLab - Notebook - Conda-python3

Pandas DataFrame reindex 重置行索引

* pandas 0.22.0

Pandas DataFrame reindex 重置行索引

import pandas as pd import numpy as np my_df = pd.DataFrame(data=np.arange(20).reshape(4,5), # 4*5的矩阵 index=list("acef"), # 行索引 缺少bd,一会用reindex补上 columns=list("ABCDE")) # 列索引 print("my_df\n",my_df) ''' reindex( labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None) ''' # 重置行索引 # 对于新行,填充的是NaN # 注意阅读帮助文档 print(my_df.reindex(list("abcdefg")))

my_df A B C D E a 0 1 2 3 4 c 5 6 7 8 9 e 10 11 12 13 14 f 15 16 17 18 19 A B C D E a 0.0 1.0 2.0 3.0 4.0 b NaN NaN NaN NaN NaN c 5.0 6.0 7.0 8.0 9.0 d NaN NaN NaN NaN NaN e 10.0 11.0 12.0 13.0 14.0 f 15.0 16.0 17.0 18.0 19.0 g NaN NaN NaN NaN NaN

help

help(my_df.reindex) Help on method reindex in module pandas.core.frame: reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None) method of pandas.core.frame.DataFrame instance Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False Parameters ---------- labels : array-like, optional New labels / index to conform the axis specified by 'axis' to. index, columns : array-like, optional (should be specified using keywords) New labels / index to conform to. Preferably an Index object to avoid duplicating data axis : int or str, optional Axis to target. Can be either the axis name ('index', 'columns') or number (0, 1). method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index. * default: don't fill gaps * pad / ffill: propagate last valid observation forward to next valid * backfill / bfill: use next valid observation to fill gap * nearest: use nearest valid observations to fill gap copy : boolean, default True Return a new object, even if the passed indexes are the same level : int or name Broadcast across a level, matching Index values on the passed MultiIndex level fill_value : scalar, default np.NaN Value to use for missing values. Defaults to NaN, but can be any "compatible" value limit : int, default None Maximum number of consecutive elements to forward or backward fill tolerance : optional Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations most satisfy the equation ``abs(index[indexer] - target) <= tolerance``. Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index's type. .. versionadded:: 0.17.0 .. versionadded:: 0.21.0 (list-like tolerance) Examples -------- ``DataFrame.reindex`` supports two calling conventions * ``(index=index_labels, columns=column_labels, ...)`` * ``(labels, axis={'index', 'columns'}, ...)`` We *highly* recommend using keyword arguments to clarify your intent. Create a dataframe with some fictional data. >>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror'] >>> df = pd.DataFrame({ ... 'http_status': [200,200,404,404,301], ... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]}, ... index=index) >>> df http_status response_time Firefox 200 0.04 Chrome 200 0.02 Safari 404 0.07 IE10 404 0.08 Konqueror 301 1.00 Create a new index and reindex the dataframe. By default values in the new index that do not have corresponding records in the dataframe are assigned ``NaN``. >>> new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', ... 'Chrome'] >>> df.reindex(new_index) http_status response_time Safari 404.0 0.07 Iceweasel NaN NaN Comodo Dragon NaN NaN IE10 404.0 0.08 Chrome 200.0 0.02 We can fill in the missing values by passing a value to the keyword ``fill_value``. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword ``method`` to fill the ``NaN`` values. >>> df.reindex(new_index, fill_value=0) http_status response_time Safari 404 0.07 Iceweasel 0 0.00 Comodo Dragon 0 0.00 IE10 404 0.08 Chrome 200 0.02 >>> df.reindex(new_index, fill_value='missing') http_status response_time Safari 404 0.07 Iceweasel missing missing Comodo Dragon missing missing IE10 404 0.08 Chrome 200 0.02 We can also reindex the columns. >>> df.reindex(columns=['http_status', 'user_agent']) http_status user_agent Firefox 200 NaN Chrome 200 NaN Safari 404 NaN IE10 404 NaN Konqueror 301 NaN Or we can use "axis-style" keyword arguments >>> df.reindex(['http_status', 'user_agent'], axis="columns") http_status user_agent Firefox 200 NaN Chrome 200 NaN Safari 404 NaN IE10 404 NaN Konqueror 301 NaN To further illustrate the filling functionality in ``reindex``, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates). >>> date_index = pd.date_range('1/1/2010', periods=6, freq='D') >>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, ... index=date_index) >>> df2 prices 2010-01-01 100 2010-01-02 101 2010-01-03 NaN 2010-01-04 100 2010-01-05 89 2010-01-06 88 Suppose we decide to expand the dataframe to cover a wider date range. >>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D') >>> df2.reindex(date_index2) prices 2009-12-29 NaN 2009-12-30 NaN 2009-12-31 NaN 2010-01-01 100 2010-01-02 101 2010-01-03 NaN 2010-01-04 100 2010-01-05 89 2010-01-06 88 2010-01-07 NaN The index entries that did not have a value in the original data frame (for example, '2009-12-29') are by default filled with ``NaN``. If desired, we can fill in the missing values using one of several options. For example, to backpropagate the last valid value to fill the ``NaN`` values, pass ``bfill`` as an argument to the ``method`` keyword. >>> df2.reindex(date_index2, method='bfill') prices 2009-12-29 100 2009-12-30 100 2009-12-31 100 2010-01-01 100 2010-01-02 101 2010-01-03 NaN 2010-01-04 100 2010-01-05 89 2010-01-06 88 2010-01-07 NaN Please note that the ``NaN`` value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the ``NaN`` values present in the original dataframe, use the ``fillna()`` method. See the :ref:`user guide ` for more. Returns ------- reindexed : DataFrame

备注

1. 感谢老师的教学与课件

2. 欢迎各位同学一起来交流学习心得^_^

3. 沙箱实验、认证、论坛和直播,其中包含了许多优质的内容,推荐了解与学习。

Python 云学院

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:华为云DevCloud助力泰禾网络科技信息化转型
下一篇:5 种方法构建安全的 Django Admin
相关文章