Pandas frame creation
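Create a pandas frame using a dictionary
Using a normal dictionary
Note that in older versions of pandas (before 0.23) or Python (before 3.6), a pandas frame built from a normal dictionary does not keep the order in which the keys were added; the columns come out sorted alphabetically instead. Newer versions preserve the insertion order of a plain dict.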
Example below:
```python
import numpy as np
import pandas as pd
from tabulate import tabulate


def markdown_pd(df):
    # Print a data frame as a Markdown (pipe) table, including the index
    print(tabulate(df, tablefmt="pipe", headers="keys", showindex=True))
```
```python
new_dict = {}
new_dict['foo'] = [1, 2, 3]
new_dict['bar'] = [4, 5, 6]

pdframe_normal_dict = pd.DataFrame(new_dict)
markdown_pd(pdframe_normal_dict)
```
|    |   bar |   foo |
|---:|------:|------:|
|  0 |     4 |     1 |
|  1 |     5 |     2 |
|  2 |     6 |     3 |
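As you can see, the columns are not in the order in which we added them.
To put them in the order we want, select the columns with a list of names. Assigning to `pdframe_normal_dict.columns` instead would only rename the columns and would leave `[4, 5, 6]` under `foo`.
```python
pdframe_normal_dict = pdframe_normal_dict[['foo', 'bar']]
```
Because a list has an order, we now get:
```python
markdown_pd(pdframe_normal_dict)
```
|    |   foo |   bar |
|---:|------:|------:|
|  0 |     1 |     4 |
|  1 |     2 |     5 |
|  2 |     3 |     6 |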
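It is easy to confuse relabelling with reordering. The minimal sketch below uses a small hypothetical frame `df` whose columns come out as `bar`, `foo`. Assigning to `.columns` only changes the labels. Indexing with a list, or calling `reindex(columns=...)`, moves each column's data together with its label.

```python
import pandas as pd

# Hypothetical frame whose columns come out in the order bar, foo
df = pd.DataFrame({'bar': [4, 5, 6], 'foo': [1, 2, 3]})

# Relabelling: the data stays in place, only the names change
relabelled = df.copy()
relabelled.columns = ['foo', 'bar']
print(relabelled['foo'].tolist())  # [4, 5, 6], i.e. the old 'bar' data

# Reordering: each column moves together with its label
reordered = df[['foo', 'bar']]
print(reordered['foo'].tolist())  # [1, 2, 3]

# reindex(columns=...) reorders in the same way
same = df.reindex(columns=['foo', 'bar'])
print(same.equals(reordered))  # True
```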
Using the OrderedDict data structure
```python
from collections import OrderedDict
```
An `OrderedDict` remembers the order in which its keys were inserted, so the column order of the pandas frame is guaranteed.
```python
order_dict = OrderedDict()
order_dict['foo'] = [1, 2, 3]
order_dict['bar'] = [4, 5, 6]
```
`order_dict` now holds the keys in insertion order: `OrderedDict([('foo', [1, 2, 3]), ('bar', [4, 5, 6])])`.
```python
pdframe_order_dict = pd.DataFrame(order_dict)
markdown_pd(pdframe_order_dict)
```
|    |   foo |   bar |
|---:|------:|------:|
|  0 |     1 |     4 |
|  1 |     2 |     5 |
|  2 |     3 |     6 |
As you can see, the columns are in the order we want!
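`OrderedDict` is one way to pin the column order. Another is to pass the desired order through the `columns` parameter of `pd.DataFrame`, sketched minimally below. When `data` is a dict, `columns` selects which keys to use and in what order, so this also works with a plain dict whose key order differs from the one you want.

```python
import pandas as pd

# A plain dict whose key order differs from the order we want
plain = {'bar': [4, 5, 6], 'foo': [1, 2, 3]}

# With dict data, 'columns' selects the keys and fixes their order
df = pd.DataFrame(plain, columns=['foo', 'bar'])
print(list(df.columns))    # ['foo', 'bar']
print(df['foo'].tolist())  # [1, 2, 3]
```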
Create a pandas frame from lists
Create a column list: `['foo', 'bar']`
Create a numpy array with one row per record: `np.array([[1, 4], [2, 5], [3, 6]])`
```python
columns = ['foo', 'bar']
data = np.array([[1, 4], [2, 5], [3, 6]])
```
`data.shape` is `(3, 2)`: three rows and two columns.
```python
new_pdframe = pd.DataFrame(data=data, columns=columns)
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|---:|------:|------:|
|  0 |     1 |     4 |
|  1 |     2 |     5 |
|  2 |     3 |     6 |
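The array above is row-oriented, with each inner list being one record. Your data may instead be column-oriented, like `np.array([[1, 2, 3], [4, 5, 6]])` with one inner list per column. In that case, a minimal sketch is to transpose the array with `.T` first. You can also skip numpy and pass plain lists of rows.

```python
import numpy as np
import pandas as pd

columns = ['foo', 'bar']

# Column-oriented data: one inner list per column, shape (2, 3)
by_column = np.array([[1, 2, 3], [4, 5, 6]])

# Transposing gives shape (3, 2), i.e. one row per record
df_t = pd.DataFrame(data=by_column.T, columns=columns)
print(df_t['foo'].tolist())  # [1, 2, 3]

# Plain Python lists of rows work too, without numpy
df_rows = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=columns)
print(df_rows['bar'].tolist())  # [4, 5, 6]
```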
Insert a column
Implicit insert without specifying the column position
```python
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|---:|------:|------:|
|  0 |     1 |     4 |
|  1 |     2 |     5 |
|  2 |     3 |     6 |
```python
new_pdframe.loc[:, 'foz'] = [7, 8, 9]
markdown_pd(new_pdframe)
```
|    |   foo |   bar |   foz |
|---:|------:|------:|------:|
|  0 |     1 |     4 |     7 |
|  1 |     2 |     5 |     8 |
|  2 |     3 |     6 |     9 |
In this case, the new column is added as the right-most column of the data frame.
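Two other common ways to add a column on the right are sketched below on a hypothetical frame `df`. Plain bracket assignment modifies the frame in place. `DataFrame.assign` returns a new frame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6]})

# Bracket assignment adds 'foz' as the right-most column, in place
df['foz'] = [7, 8, 9]
print(list(df.columns))   # ['foo', 'bar', 'foz']

# assign returns a new frame; df itself is not modified
df2 = df.assign(baz=[10, 11, 12])
print(list(df2.columns))  # ['foo', 'bar', 'foz', 'baz']
print(list(df.columns))   # ['foo', 'bar', 'foz']
```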
Explicit insert at a given column position
Let's try inserting the new column in the middle.
```python
new_pdframe = pd.DataFrame(data=data, columns=columns)
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|---:|------:|------:|
|  0 |     1 |     4 |
|  1 |     2 |     5 |
|  2 |     3 |     6 |
```python
new_pdframe.insert(1, 'foz', [7, 8, 9])
markdown_pd(new_pdframe)
```
|    |   foo |   foz |   bar |
|---:|------:|------:|------:|
|  0 |     1 |     7 |     4 |
|  1 |     2 |     8 |     5 |
|  2 |     3 |     9 |     6 |
Note that you can still insert the new column at the end by passing the corresponding position, which here is 2.
```python
new_pdframe = pd.DataFrame(data=data, columns=columns)
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|---:|------:|------:|
|  0 |     1 |     4 |
|  1 |     2 |     5 |
|  2 |     3 |     6 |
```python
new_pdframe.insert(2, 'foz', [7, 8, 9])
markdown_pd(new_pdframe)
```
|    |   foo |   bar |   foz |
|---:|------:|------:|------:|
|  0 |     1 |     4 |     7 |
|  1 |     2 |     5 |     8 |
|  2 |     3 |     6 |     9 |
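The minimal sketch below shows two details of `DataFrame.insert`. The position for the right-most slot is simply the current number of columns, `len(df.columns)`. Inserting a name that already exists raises a `ValueError` unless you pass `allow_duplicates=True`. Also note that `insert` changes the frame in place and returns `None`.

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6]})

# len(df.columns) is the position just after the last column;
# insert() changes df in place and returns None
df.insert(len(df.columns), 'foz', [7, 8, 9])
print(list(df.columns))  # ['foo', 'bar', 'foz']

# Inserting a name that already exists is rejected by default
try:
    df.insert(0, 'foz', [0, 0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```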
Insert a row
Inserting a row is not as easy as inserting a column, because pandas has no row counterpart of `DataFrame.insert`.
To make this clearer, let's set the index of the pandas frame to labels other than 0, 1, 2.
```python
new_pdframe = pd.DataFrame(data=data, columns=columns, index=['a', 'b', 'c'])
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|:---|------:|------:|
| a  |     1 |     4 |
| b  |     2 |     5 |
| c  |     3 |     6 |
As you can see, the index labels are now strings instead of integers.
Insert with `loc[new_index_value]`
If you assign to a new index label, the new row is appended at the end of the data frame.
Note: the label must not already exist! If it does, `.loc` overwrites that row instead of adding a new one.
The current index is `Index(['a', 'b', 'c'], dtype='object')`, so `'d'` is a new label.
```python
new_pdframe.loc['d'] = [4, 7]
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|:---|------:|------:|
| a  |     1 |     4 |
| b  |     2 |     5 |
| c  |     3 |     6 |
| d  |     4 |     7 |
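The minimal sketch below shows why the note about existing labels matters. It uses the same kind of labelled frame. Assigning to a label that is already in the index replaces that row, and the frame does not grow.

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6]}, index=['a', 'b', 'c'])

# 'd' is not in the index yet: a new row is appended
df.loc['d'] = [4, 7]
print(list(df.index))        # ['a', 'b', 'c', 'd']

# 'b' already exists: its values are overwritten, no row is added
df.loc['b'] = [0, 0]
print(len(df))               # 4
print(df.loc['b'].tolist())  # [0, 0]
```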
Insert rows at a specific position
In this case, you want to insert the rows of another pandas frame at a specific position inside this frame, not just at the end.
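```python
def insert_row(idx, df, df_insert):
    # Split df at integer position idx and put df_insert in between.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    dfA = df.iloc[:idx]
    dfB = df.iloc[idx:]
    return pd.concat([dfA, df_insert, dfB])
```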
```python
new_pdframe = pd.DataFrame(data=data, columns=columns, index=['a', 'b', 'c'])
markdown_pd(new_pdframe)
```
|    |   foo |   bar |
|:---|------:|------:|
| a  |     1 |     4 |
| b  |     2 |     5 |
| c  |     3 |     6 |
```python
appended_pdframe = pd.DataFrame(data=np.array([[5, 5], [6, 6]]), columns=columns, index=['d', 'e'])
markdown_pd(appended_pdframe)
```
Now let’s try adding it between b and c.
```python
updated_frame = insert_row(2, new_pdframe, appended_pdframe)
markdown_pd(updated_frame)
```
|    |   foo |   bar |
|:---|------:|------:|
| a  |     1 |     4 |
| b  |     2 |     5 |
| d  |     5 |     5 |
| e  |     6 |     6 |
| c  |     3 |     6 |
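You may know the row label rather than its position. In that case, a minimal sketch is to look up the position with `Index.get_loc` and reuse the same slicing idea. It assumes the index labels are unique. `insert_before_label` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def insert_before_label(label, df, df_insert):
    # Hypothetical helper: put df_insert just before the row labelled `label`.
    # Assumes a unique index, so get_loc returns a single integer position.
    pos = df.index.get_loc(label)
    return pd.concat([df.iloc[:pos], df_insert, df.iloc[pos:]])


frame = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6]}, index=['a', 'b', 'c'])
extra = pd.DataFrame({'foo': [5, 6], 'bar': [5, 6]}, index=['d', 'e'])

result = insert_before_label('c', frame, extra)
print(list(result.index))  # ['a', 'b', 'd', 'e', 'c']
```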
Insert with `pd.concat`
In this case, the new rows are appended at the end of the frame.
```python
updated_frame = pd.concat([new_pdframe, appended_pdframe])
markdown_pd(updated_frame)
```
|    |   foo |   bar |
|:---|------:|------:|
| a  |     1 |     4 |
| b  |     2 |     5 |
| c  |     3 |     6 |
| d  |     5 |     5 |
| e  |     6 |     6 |
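By default, `pd.concat` does not check whether the row labels of the pieces overlap. The minimal sketch below shows two options for that. `verify_integrity=True` raises a `ValueError` on duplicate labels. `ignore_index=True` discards the labels and renumbers the rows 0, 1, 2, and so on.

```python
import pandas as pd

top = pd.DataFrame({'foo': [1, 2], 'bar': [4, 5]}, index=['a', 'b'])
bottom = pd.DataFrame({'foo': [9], 'bar': [9]}, index=['b'])  # label 'b' clashes

# By default the duplicate label 'b' is kept silently
print(list(pd.concat([top, bottom]).index))  # ['a', 'b', 'b']

# verify_integrity=True turns the clash into an error
try:
    pd.concat([top, bottom], verify_integrity=True)
    clash_detected = False
except ValueError:
    clash_detected = True
print(clash_detected)  # True

# ignore_index=True drops the labels and renumbers the rows
renumbered = pd.concat([top, bottom], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2]
```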