Python Pandas - I
Class 12th Chapter :- 1
Python libraries contain a collection of built-in modules that allow us to perform many actions without writing detailed programs for them.
NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific
and analytical use.
Python Pandas
Python Pandas is an open-source data analysis and manipulation library used widely for data wrangling tasks in Python. It provides high-level data structures like data frames and series, which are designed to handle structured and tabular data.
PANDAS (PANel DAta) is a high-level data manipulation tool. The Pandas library has a very rich set of functions, which makes it very easy to import and export data.
The main author of Pandas is Wes McKinney.
Why Pandas ?
Pandas is the most popular library in the scientific Python ecosystem for doing data analysis. Pandas is capable of many tasks including:
It can read and write many different file formats (CSV, Excel, JSON, SQL tables, etc.) and handle many data types (integer, float, string, etc.).
It can perform calculations along either direction in which the data is organized, i.e., across rows and down columns.
It can easily select subsets of data from bulky data sets and even combine multiple datasets together. It has functionality to find and fill missing data.
It allows you to apply operations to independent groups within the data.
It supports reshaping of data into different forms.
It supports advanced time-series functionality (Time series forecasting is the use of a model to predict future values based on previously observed values.)
It supports visualization by integrating with libraries such as Matplotlib and Seaborn.
Installing Pandas
To install Pandas from the command line, we need to type in:
pip install pandas
Pandas can be installed only when Python is already installed on that system. The same is true for other libraries of Python.
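Once the installation has finished, one simple way to confirm that it worked is to import the library and print its version. This is only a quick check; the version number printed will depend on your system.

```python
import pandas as pd

# If the import succeeds, Pandas is installed; the version string varies by system.
print(pd.__version__)
```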
Pandas Data Structure
Data structures refer to specialized ways of storing data so that specific types of operations can be applied to them.
A data structure is a collection of data values and the operations that can be applied to that data. It enables efficient storage, retrieval and modification of the data.
Two commonly used data structures in Pandas
- Series
- DataFrame
Difference between Series and DataFrame
The main differences between a Series and a DataFrame in pandas are:
Dimensionality: A Series is a one-dimensional data structure, while a DataFrame is a two-dimensional data structure.
Data Representation: A Series can represent a single column or row of data, while a DataFrame represents a table with multiple rows and columns.
Labeling: Both Series and DataFrame have labeled indexes, but a DataFrame has both row and column labels, while a Series has only one-dimensional labels.
Functionality: While both Series and DataFrame offer similar functionality for data manipulation, a DataFrame provides more flexibility for handling multi-dimensional data.
Series
A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc) which by default have numeric data labels starting from zero. The data label associated with a particular value is called its index. We can also assign values of other data types as index.
Creating Series Objects.
There are different ways in which a series can be created in Pandas.
(A) Creation of Series from Scalar Values
A Series can be created using scalar values as shown in the example below:
import pandas as pd
S1 = pd.Series([10,20,30])
print(S1)
If we do not explicitly specify an index for the data values while creating a series, then by default the indices range from 0 through N – 1, where N is the number of values.
We can also assign user-defined labels to the index and use them to access elements of a Series.
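As a minimal sketch of user-defined labels, the example below passes a list of labels through the index parameter of pd.Series. The name marks and the labels "a", "b" and "c" are chosen only for illustration.

```python
import pandas as pd

# Illustrative labels; the index list must have one label per value.
marks = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(marks)

# Access an element through its user-defined label.
second = marks["b"]
print(second)
```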
(B) Creation of Series from NumPy Arrays
We can create a series from a one-dimensional (1D) NumPy array, as shown below:
import numpy as np
import pandas as pd
array1 = np.array([1,2,3,4])
series3 = pd.Series(array1)
print(series3)
(C) Creation of Series from Dictionary
A Python dictionary has key: value pairs, and a value can be quickly retrieved when its key is known. Dictionary keys can be used to construct an index for a Series, as shown in the following example. Here, the keys of the dictionary dict1 become indices in the series.
import pandas as pd
dict1 = {1:'One',2:'Two',3:'Three',4:'Four'}
pd1 = pd.Series(dict1)
print(pd1)
Accessing Elements of a Series
There are two common ways for accessing the elements of a series: Indexing and Slicing.
Indexing
Indexing in Series is similar to that for NumPy arrays, and is used to access elements in a series. Indexes are of two types: positional index and labelled index. A positional index takes an integer value that corresponds to the element's position in the series, starting from 0, whereas a labelled index takes any user-defined label as index.
Example :-
import pandas as pd
s1 = pd.Series([10,20,30])
s1[2]
We can access an element of a series using its positional index, as in the example above. When labels are specified, we can also use the labels as indices while selecting values from a Series. More than one element of a series can be accessed using a list of positional integers or a list of index labels.
Example :-
import pandas as pd
s1 = pd.Series([10,20,30])
s1[[1,2]]
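The two kinds of index can be made explicit with the .loc accessor (labelled index) and the .iloc accessor (positional index) that Pandas provides. The sketch below uses a hypothetical series named capitals; the data is only for illustration.

```python
import pandas as pd

# Hypothetical data: country names are used as the labelled index.
capitals = pd.Series(["New Delhi", "Washington DC", "London"],
                     index=["India", "USA", "UK"])

by_label = capitals.loc["USA"]          # labelled index
by_position = capitals.iloc[1]          # positional index, counting from 0
several = capitals.loc[["India", "UK"]] # a list of labels selects several elements

print(by_label)
print(by_position)
print(several)
```

Using .loc and .iloc avoids any ambiguity about whether an integer is meant as a label or as a position.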
Slicing
Sometimes, we may need to extract a part of a series. This can be done through slicing, which is similar to slicing used with NumPy arrays. We can define which part of the series is to be sliced by specifying the start and end parameters [start:end] with the series name. When we use positional indices for slicing, the value at the end index position is excluded.
Slicing is a powerful way to retrieve subsets of data from a Pandas object. It works in the same way as slicing of strings and lists.
Syntax :-
object[Start:Stop:Step]
Example :-
import pandas as pd
animal = pd.Series(["Lion","Bear","Elephant","Tiger","Wolf"],index=["L","B","E","T","W"])
print(animal)
print(animal[1:4])
print(animal[-3:-1])
print(animal[-4:1])
print(animal[0::-2])
print(animal[0::2])
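Slicing can also be done with index labels instead of positions. The sketch below reuses the animal series and compares the two; note that, unlike a positional slice, a slice written with labels includes the value at the end label.

```python
import pandas as pd

animal = pd.Series(["Lion", "Bear", "Elephant", "Tiger", "Wolf"],
                   index=["L", "B", "E", "T", "W"])

# Positional slice: the end position (3) is excluded.
positional = animal.iloc[1:3]

# Label slice: the end label ("T") is included.
labelled = animal.loc["B":"T"]

print(positional)
print(labelled)
```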
Attributes of Pandas Series
We can access certain properties, called attributes, of a series by writing the attribute name after the series name. A series is a one-dimensional labelled array capable of holding any data type.
Some of the attributes of a Pandas series include:
values: The actual data contained within the series.
Syntax :-
<Series object>.values
index: The labels for each element in the series.
Syntax :-
<Series object>.index
dtype: The data type of the values in the series.
Syntax :-
<Series object>.dtype
shape: A tuple representing the dimensions of the series.
Syntax :-
<Series object>.shape
size: The number of elements in the series.
Syntax :-
<Series object>.size
name: An optional name for the series.
Syntax :-
<Series object>.name
ndim: The number of dimensions of the series (always 1).
Syntax :-
<Series object>.ndim
axes: A list containing the row axis labels (the index) of the series.
Syntax :-
<Series object>.axes
empty: A boolean indicating whether the series is empty or not.
Syntax :-
<Series object>.empty
value_counts(): A method (not an attribute) that returns a series containing counts of unique values in the original series.
Syntax :-
<Series object>.value_counts()
nbytes: Returns the number of bytes in the underlying data.
Syntax :-
<Series object>.nbytes
itemsize: Returned the size in bytes of the dtype of each item of the underlying data. This attribute has been removed from recent versions of Pandas; for NumPy-backed data, <Series object>.dtype.itemsize gives the same information.
Syntax :-
<Series object>.itemsize
hasnans: Returns True if there are NaN values; otherwise returns False.
Syntax :-
<Series object>.hasnans
Example :-
import pandas as pd
animal = pd.Series(["Lion","Bear","Elephant","Tiger","Wolf"],index=["L","B","E","T","W"])
print(animal)
print("--------------------------------")
print(animal.values)
print("--------------------------------")
print(animal.index)
print("--------------------------------")
print(animal.dtype)
print("--------------------------------")
print(animal.shape)
print("--------------------------------")
print(animal.nbytes)
print("--------------------------------")
print(animal.ndim)
print("--------------------------------")
print(animal.size)
print("--------------------------------")
print(animal.hasnans)
print("--------------------------------")
print(animal.empty)
print("--------------------------------")
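The attributes name and hasnans and the method value_counts() are not exercised in the example above. The sketch below tries them on a small hypothetical series named grades that contains one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (NaN).
grades = pd.Series(["A", "B", "A", "C", np.nan], name="Grade")

# value_counts() ignores NaN by default.
counts = grades.value_counts()

print(grades.name)
print(grades.hasnans)
print(grades.shape)
print(counts)
```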
Methods of Series
In Pandas, a Series is an object that contains data and its index. A Pandas Series provides a variety of methods for data manipulation and analysis. Some of the commonly used methods of a Series in Pandas include:
head(): Returns the first n rows of the series. By default, it returns the first five records of the series, but we can specify the number of records.
Syntax :-
<Series object>.head(n)
tail(): Returns the last n rows of the series. By default, it returns the last five records of the series, but we can specify the number of records.
Syntax :-
<Series object>.tail(n)
count(): Counts the non-null (non-NaN) values in the Series.
Syntax :-
<Series object>.count()
Example :-
import pandas as pd
animal = pd.Series(["Lion","Bear","Elephant","Tiger","Wolf"],index=["L","B","E","T","W"])
print(animal)
print("------------------------------")
print(animal.head(3))
print("------------------------------")
print(animal.tail(2))
print("------------------------------")
print(animal.count())
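The difference between count() and the size attribute only shows up when a series has missing values. The following sketch uses a hypothetical series named readings with two NaN entries, and also shows the default behaviour of tail().

```python
import numpy as np
import pandas as pd

# Hypothetical data: six entries, two of which are missing.
readings = pd.Series([12.5, np.nan, 7.0, np.nan, 3.2, 9.9])

non_null = readings.count()   # NaN values are not counted
total = readings.size         # every entry is counted
first_two = readings.head(2)
last_default = readings.tail()  # no argument: the last five records

print(non_null, total)
print(first_two)
print(last_default)
```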
Mathematical Operations on Series
If we perform basic mathematical operations like addition, subtraction, multiplication, division, etc., on two NumPy arrays, the operation is done on each corresponding pair of elements. Similarly, we can perform mathematical operations on two series in Pandas. While performing mathematical operations on series, index matching is implemented: values are paired by index label, and any label that is present in only one of the series produces NaN in the result by default.
The basic arithmetic operations that can be performed on series are described below.
Addition of two Series
It can be done in two ways. In the first method, the two series are simply added together with the + operator; values are matched by index label while performing the addition. Note here that the output of addition is NaN if either or both of the elements have no value. In the second method, the add() method is used with the fill_value parameter, which replaces a missing value with the given value (here 0) before the addition.
Syntax :- pd3 = pd1.add(pd2,fill_value=0)
Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 + pd2
print(pd3)
pd4 = pd1.add(pd2,fill_value=0)
print(pd4)
Subtraction of two Series
It can be done in two ways. In the first method, one series is simply subtracted from the other with the - operator; values are matched by index label while performing the subtraction. Note here that the output of subtraction is NaN if either or both of the elements have no value. In the second method, the sub() method is used with the fill_value parameter, which replaces a missing value with the given value (here 0) before the subtraction.
Syntax :- pd3 = pd1.sub(pd2,fill_value=0)
Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 - pd2
print(pd3)
pd4 = pd1.sub(pd2,fill_value=0)
print(pd4)
Multiplication of two Series
It can be done in two ways. In the first method, the two series are simply multiplied together with the * operator; values are matched by index label while performing the multiplication. Note here that the output of multiplication is NaN if either or both of the elements have no value. In the second method, the mul() method is used with the fill_value parameter, which replaces a missing value with the given value (here 0) before the multiplication.
Syntax :- pd3 = pd1.mul(pd2,fill_value=0)
Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 * pd2
print(pd3)
pd4 = pd1.mul(pd2,fill_value=0)
print(pd4)
Division of two Series
It can be done in two ways. In the first method, one series is simply divided by the other with the / operator; values are matched by index label while performing the division. Note here that the output of division is NaN if either or both of the elements have no value. In the second method, the div() method is used with the fill_value parameter, which replaces a missing value with the given value (here 0) before the division.
Syntax :- pd3 = pd1.div(pd2,fill_value=0)
Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 / pd2
print(pd3)
pd4 = pd1.div(pd2,fill_value=0)
print(pd4)
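To see the effect of index matching and of fill_value side by side, the sketch below reuses pd1 and pd2 from the examples above and checks a few individual results. The variable names summed, filled and quotient are chosen only for illustration. One point worth noticing is that fill_value=0 in a division makes the labels found only in pd1 divide by zero, which Pandas reports as inf.

```python
import pandas as pd

pd1 = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])
pd2 = pd.Series([5, 10, 15, 20, 25], index=["C", "D", "E", "F", "G"])

# Operator form: labels A, B, F and G exist in only one series, so they give NaN.
summed = pd1 + pd2
print(summed)

# Method form: the missing side is treated as 0 before adding.
filled = pd1.add(pd2, fill_value=0)
print(filled)

# Division with fill_value=0: A and B are divided by 0 (inf), F and G become 0 / value.
quotient = pd1.div(pd2, fill_value=0)
print(quotient)
```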
DataFrame in Pandas
A DataFrame is a two-dimensional labelled data structure like a table of MySQL. It contains rows and columns, and therefore has both a row and column index. Each column can have a different type of value such as numeric, string, boolean, etc., as in tables of a database.
A two-dimensional array is an array in which each element is itself an array. For instance, an array A[m][n] is an m by n table with m rows and n columns containing m x n elements.
A DataFrame has two indexes, or axes: a row index (axis = 0) and a column index (axis = 1). It is like a spreadsheet where each value is identifiable by the combination of row index and column index. The row index is known simply as the index, and the column index is called the column name.
Its values can easily be changed, i.e., it is value-mutable. You can also add or delete rows and columns in a DataFrame; in other words, it is size-mutable.
A DataFrame consists of three main components: the data, the row index, and the column index. The data is a collection of one or more columns, each of which can be of a different data type such as integers, floating-point numbers, or strings. The row index is a sequence of labels that identify each row, while the column index is a sequence of labels that identify each column.
DataFrames provide a wide range of functionality for data manipulation, including filtering, grouping, aggregating, pivoting, merging, and sorting. They also support many common data analysis tasks, such as calculating summary statistics, performing statistical tests, and visualizing data.
Syntax :-
import pandas as pd
pd1 = pd.DataFrame([])
print(pd1)
Example :-
import pandas as pd
dict1 = {"Rollno":[1,2,3,4,5,6],
"Name":["Hardik","Prakash","Manan","Mohit","Nishant","Yash"],
"Marks":[85,78,96,54,48,66],
"Grade":["A","B","A","C","C","B"]}
pd1 = pd.DataFrame(dict1)
print(pd1)
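The value-mutable and size-mutable properties described earlier can be tried out on a shortened version of the table above. This is only a sketch; the name df and the changed values are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Rollno": [1, 2, 3],
                   "Name": ["Hardik", "Prakash", "Manan"],
                   "Marks": [85, 78, 96]})

# Value-mutable: change a single value, located by row index and column name.
df.loc[1, "Marks"] = 80

# Size-mutable: add a new column ...
df["Grade"] = ["A", "B", "A"]

# ... and delete an existing one.
df = df.drop(columns=["Rollno"])

print(df)
print(df.shape)
```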
Creating and Displaying a DataFrame
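A DataFrame can be thought of as a two-dimensional dictionary having items as (key: value) pairs, where the value part is a data structure of any type (such as a list or a Series). There are a number of ways to create a DataFrame: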
- Creation of an empty DataFrame.
- Creation of DataFrame from NumPy ndarrays
- Creation of DataFrame from List of Dictionaries
- Creation of DataFrame from Dictionary of Lists
- Creation of DataFrame from Series
- Creation of DataFrame from Dictionary of Series
A. Creation of an empty DataFrame
An empty DataFrame can be created as follows:
Example :-
import pandas as pd
dFrameEmt = pd.DataFrame()
dFrameEmt
B. Creation of DataFrame from NumPy ndarrays
We can create a DataFrame using one or more ndarrays. In the example below, a DataFrame is created from the single array array1.
Example :-
import numpy as np
import pandas as pd
array1 = np.array([10,20,30])
array2 = np.array([100,200,300])
array3 = np.array([-10,-20,-30, -40])
dFrame4 = pd.DataFrame(array1)
dFrame4
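The arrays array2 and array3 defined above can be used together with array1 to build a DataFrame from more than one ndarray. In the sketch below, each array becomes one row; the name dFrame5 and the column labels 'A' to 'D' are assumptions made for illustration. Since array1 and array2 have only three elements, their fourth column is filled with NaN.

```python
import numpy as np
import pandas as pd

array1 = np.array([10, 20, 30])
array2 = np.array([100, 200, 300])
array3 = np.array([-10, -20, -30, -40])

# Each array becomes a row; shorter rows are padded with NaN.
dFrame5 = pd.DataFrame([array1, array3, array2], columns=["A", "B", "C", "D"])
print(dFrame5)
```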
C. Creation of DataFrame from List of Dictionaries
We can create a DataFrame from a list of dictionaries.
# Create a list of dictionaries
listDict = [{'a':10, 'b':20}, {'a':5,'b':10, 'c':20}]
dFrameListDict = pd.DataFrame(listDict)
dFrameListDict
Here, the dictionary keys are taken as column labels, and the values corresponding to each key are taken as rows. There will be as many rows as the number of dictionaries present in the list.
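In the list above, the first dictionary has no key 'c'. The sketch below repeats the example as a self-contained program and checks what Pandas stores in that position: the missing entry becomes NaN.

```python
import pandas as pd

listDict = [{"a": 10, "b": 20}, {"a": 5, "b": 10, "c": 20}]
dFrameListDict = pd.DataFrame(listDict)
print(dFrameListDict)

# Two dictionaries give two rows; the keys a, b and c give three columns.
print(dFrameListDict.shape)

# The first dictionary has no key 'c', so that cell is NaN.
print(pd.isna(dFrameListDict.loc[0, "c"]))
```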
D. Creation of DataFrame from Dictionary of Lists
DataFrames can also be created from a dictionary of lists. Consider the following dictionary consisting of the keys 'State', 'GArea' (geographical area) and 'VDF' (very dense forest) and the corresponding lists.
dictForest = {'State': ['Assam', 'Delhi','Kerala'],
'GArea': [78438, 1483, 38852] ,
'VDF' : [2797, 6.72,1663]}
dFrameForest= pd.DataFrame(dictForest)
dFrameForest
We can also change the sequence of columns in the DataFrame by passing the column names in the required order to the columns parameter.
dFrameForest1 = pd.DataFrame(dictForest,
columns = ['State','VDF', 'GArea'])
dFrameForest1
E. Creation of DataFrame from Series
A DataFrame can also be created from one or more Series. When a single Series is passed, its values form one column of the DataFrame and its index labels become the row labels. When a list of Series is passed, each Series becomes one row of the DataFrame and the index labels of the Series become the column labels.
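For creating a DataFrame from Series, which is what the heading of this part refers to, a minimal sketch is given below. The series seriesA and seriesB and their values are hypothetical. The last statement also shows the dictionary-of-Series form listed earlier among the ways of creating a DataFrame.

```python
import pandas as pd

# Hypothetical series sharing the same index labels.
seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesB = pd.Series([1000, 2000, -1000, -5000, 1000],
                    index=["a", "b", "c", "d", "e"])

# A single Series gives one column; its labels become the row labels.
single = pd.DataFrame(seriesA)

# A list of Series gives one row per Series; the labels become column labels.
rows = pd.DataFrame([seriesA, seriesB])

# A dictionary of Series gives one column per key.
cols = pd.DataFrame({"A": seriesA, "B": seriesB})

print(single)
print(rows)
print(cols)
```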
DataFrame Attributes
index: The index (row labels) of the DataFrame.
columns: The column labels of the DataFrame.
axes: Returns a list representing both the axes (axis 0, i.e., the index, and axis 1, i.e., the columns).
dtypes: Returns the data types of the columns in the DataFrame.
size: Returns an int representing the number of elements in this object.
shape: Returns a tuple representing the dimensionality of the DataFrame.
values: Returns a NumPy representation of the DataFrame.
empty: Indicates whether the DataFrame is empty.
ndim: Returns an int representing the number of axes/array dimensions.
T: Transposes the index and columns.
Example of DataFrame Attributes
import pandas as pd
dictForest = {'State': ['Assam', 'Delhi','Kerala'],
'GArea': [78438, 1483, 38852] ,
'VDF' : [2797, 6.72,1663]}
df1= pd.DataFrame(dictForest)
df1
a) df1.index
-> RangeIndex(start=0, stop=3, step=1)
b) df1.columns
-> Index(['State', 'GArea', 'VDF'], dtype='object')
c) df1.axes
-> [RangeIndex(start=0, stop=3, step=1),
Index(['State', 'GArea', 'VDF'], dtype='object')]
d) df1.dtypes
-> State :- object
-> GArea :- int64
-> VDF :- float64
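The remaining attributes from the list (size, shape, ndim, empty and T) can be checked on the same data. The sketch below is self-contained and uses the dictForest dictionary from the example.

```python
import pandas as pd

dictForest = {"State": ["Assam", "Delhi", "Kerala"],
              "GArea": [78438, 1483, 38852],
              "VDF": [2797, 6.72, 1663]}
df1 = pd.DataFrame(dictForest)

print(df1.size)    # number of elements: rows x columns
print(df1.shape)   # (rows, columns)
print(df1.ndim)    # number of axes
print(df1.empty)   # whether the DataFrame has no elements

# T swaps rows and columns: the old column labels become the row labels.
transposed = df1.T
print(transposed)
```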
Operations on rows and columns in DataFrames
We can perform some basic operations on rows and columns of a DataFrame like selection, deletion, addition, and renaming.
         Mohit  Nishant  Yash
Maths       78       78    85
Science     88       92    96
Hindi       90       85    78
(a) Adding a New Column to a DataFrame
-> We can easily add a new column to a DataFrame. In order to add a new column for another student 'Hardik' to a DataFrame pd1 holding the marks shown above, we can write:
example :- pd1['Hardik']=[89,90,85]
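A complete, runnable version of this step might look as follows. The table of marks is rebuilt from the values shown above; the way pd1 is constructed here (a dictionary of lists with the subjects as the index) is an assumption, since the notes do not show how it was created.

```python
import pandas as pd

# Assumed construction of the marks table: students as columns, subjects as rows.
pd1 = pd.DataFrame({"Mohit": [78, 88, 90],
                    "Nishant": [78, 92, 85],
                    "Yash": [85, 96, 78]},
                   index=["Maths", "Science", "Hindi"])

# Assigning a list to a new column label adds the column at the end.
pd1["Hardik"] = [89, 90, 85]
print(pd1)
```

The list must contain one value per row; assigning to a label that already exists would overwrite that column instead of adding a new one.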