Data Structure - DataFrame | Pandas Tutorial
Data Structure – DataFrames
Before diving into this article, I would suggest reading my previous articles, part-1 and part-2. In this article,
you will learn, in brief, the following things about the pandas DataFrame data structure.
Topics that we will cover here are:
• What is a data frame?
• What is the data frame constructor?
• What are the data inputs we can use to create a data frame?
• Different ways of creating a data frame.
What is a data frame?
• A DataFrame is a container for a number of pandas Series (one Series per column).
• A DataFrame has its data aligned in a tabular way (rows and columns).
• It is a 2-dimensional data structure of pandas.
• It is mutable in size.
• Column data can be of different data types.
• Arithmetic operations can be performed on rows and columns as well.
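The properties listed above can be sketched in a few lines. This is only a minimal illustration; the column names and values are made up for the example.

```python
import pandas as pd

# Columns of different data types in one frame (illustrative data)
df = pd.DataFrame({'Name': ['Sid', 'Mona'], 'Age': [20, 30]})

# Each column of a DataFrame is a pandas Series
name_column = df['Name']

# Mutable in size: a new column can be added after creation
df['Score'] = [85.5, 90.0]

# Arithmetic on a single column and across two columns
age_plus_one = df['Age'] + 1
total = df['Age'] + df['Score']

print(df.shape)
print(df.dtypes)
```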
What is the data frame constructor?
Syntax:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
Parameters
data: The input data. It can be an ndarray, Series, list, dict, constant, or another DataFrame.
index: The row labels of the resulting frame. Optional; if no index is passed, it defaults to np.arange(n).
columns: The column labels of the resulting frame. Optional; if no column labels are passed, they default to np.arange(n).
dtype: The data type to force on the frame. Only a single dtype is allowed; if it is not passed, the type is inferred from the data.
copy: Whether to copy the input data. In recent pandas versions the default is None, which behaves like False for DataFrame or 2D ndarray input and like True for dict input.
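To make the parameters concrete, here is a small sketch that passes all five of them explicitly. The array values and the labels 'r1', 'r2', 'c1', 'c2' are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 2x2 input array
values = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(
    data=values,            # the input data
    index=['r1', 'r2'],     # row labels
    columns=['c1', 'c2'],   # column labels
    dtype='float64',        # force a single data type on the frame
    copy=True,              # copy the input instead of sharing memory
)
print(df)

# Without index and columns, both default to 0..n-1
default_df = pd.DataFrame(values)
print(default_df)
```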
What are the data inputs we can use to create a dataframe?
A pandas dataframe can be created from different data inputs. Those inputs are listed below:
• Lists
• dict
• Series
• Numpy ndarrays
• Another DataFrame
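As a quick sketch of the inputs that get less attention elsewhere in this article, a NumPy ndarray, a single Series, and another DataFrame, something like the following should work. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a 2D NumPy ndarray: each inner row becomes a DataFrame row
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, columns=['a', 'b', 'c'])

# From a single named Series: the Series name becomes the column label
s = pd.Series([10, 20, 30], name='x')
df_from_series = pd.DataFrame(s)

# From another DataFrame: copy=True gives an independent copy
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc[0, 'a'] = 100  # does not change df_from_array

print(df_from_array)
print(df_from_series)
print(df_copy)
```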
Different ways of creating a dataframe.
A). Creating an Empty DataFrame
Code:
#import the pandas library and alias it as pd
import pandas as pd
df = pd.DataFrame()
print(df)
output:
Empty DataFrame
Columns: []
Index: []
B). Creating a DataFrame from Lists
#using a single list
Code:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
output:
   0
0  1
1  2
2  3
3  4
4  5
#using a list of lists (one inner list per row)
Code:
#import the pandas library and alias it as pd
import pandas as pd
data = [['SID', 20], ['MONA', 30], ['BOB', 40], ['SOHAN', 50]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
output:
    Name  Age
0    SID   20
1   MONA   30
2    BOB   40
3  SOHAN   50
C). Create a DataFrame from Dict of Lists
#create a dataframe using a dict of lists with a custom index.
Code:
import pandas as pd
data = {'Name':['Rohan', 'Sohan', 'Sid', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data,index=['rank1','rank2','rank3','rank4'])
print(df)
#output:
        Name  Age
rank1  Rohan   28
rank2  Sohan   34
rank3    Sid   29
rank4  Ricky   42
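For comparison, here is a small sketch of the same dict without the index argument. In that case pandas assigns the default integer index, 0 to n-1. The name df_default is only illustrative.

```python
import pandas as pd

data = {'Name': ['Rohan', 'Sohan', 'Sid', 'Ricky'], 'Age': [28, 34, 29, 42]}

# No index passed, so the rows are labelled 0, 1, 2, 3
df_default = pd.DataFrame(data)
print(df_default)
```

Note that all the lists in the dict must have the same length, otherwise pandas raises a ValueError.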
D). Create a DataFrame from List of Dictionaries.
By default, dictionary keys are taken as column names.
# create a DataFrame by passing a list of dictionaries.
Code:
import pandas as pd
data = [{'apple': 1, 'banana': 2},{'pear': 5, 'guava': 10, 'grapes': 20}]
df = pd.DataFrame(data)
print(df)
#output:
   apple  banana  pear  guava  grapes
0    1.0     2.0   NaN    NaN     NaN
1    NaN     NaN   5.0   10.0    20.0
Note: NaN (Not a Number) is filled in wherever a value is missing.
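If you want to check where those NaN values are, or replace them, a minimal sketch could look like this. Using isna() and fillna(0) here is just one possible choice and not the only way to handle missing values.

```python
import pandas as pd

data = [{'apple': 1, 'banana': 2}, {'pear': 5, 'guava': 10, 'grapes': 20}]
df = pd.DataFrame(data)

# Boolean frame: True wherever a value is missing
missing = df.isna()

# Total number of missing cells in the frame
missing_count = int(missing.sum().sum())

# New frame with every NaN replaced by 0 (df itself is unchanged)
filled = df.fillna(0)

print(missing)
print(missing_count)
print(filled)
```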
#create a DataFrame by passing a list of dictionaries and the row indices.
Code:
import pandas as pd
data = [{'apple': 1, 'banana': 2},{'pear': 5, 'guava': 10, 'grapes': 20}]
df = pd.DataFrame(data,index=['first', 'second'])
print(df)
#output:
        apple  banana  pear  guava  grapes
first     1.0     2.0   NaN    NaN     NaN
second    NaN     NaN   5.0   10.0    20.0
#create a DataFrame with a list of dictionaries, row indices, and column indices.
code:
import pandas as pd
data = [{'a': 100, 'b': 200},{'a': 5, 'b': 10, 'c': 20}]
#with two column labels that match the dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
#with two column labels, one of which (b1) is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
#output:
          a    b
first   100  200
second    5   10
          a  b1
first   100 NaN
second    5 NaN
E). Create a DataFrame from Dict of Series
Code:
import pandas as pd
d = {'x1': pd.Series([100, 200, 300], index=['a', 'b', 'c']),
     'x2': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
#output
      x1  x2
a  100.0  10
b  200.0  20
c  300.0  30
d    NaN  40
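The output above shows two things worth noticing: the row index is the union of the two Series indexes, and x1 turns into a float column because it has no value for label 'd'. A short sketch to verify this, reusing the same dict, could be:

```python
import pandas as pd

d = {'x1': pd.Series([100, 200, 300], index=['a', 'b', 'c']),
     'x2': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row index is the union of both Series indexes
row_labels = list(df.index)

# x1 has no entry for 'd', so that cell is NaN and the column is float
x1_is_missing_at_d = bool(pd.isna(df.loc['d', 'x1']))

print(row_labels)
print(x1_is_missing_at_d)
print(df.dtypes)
```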