Class12thCBSEIP

Python Pandas -I

Class 12th Chapter :- 1

Python libraries contain a collection of built-in modules that allow us to perform many actions without writing detailed programs for them.

NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific
and analytical use.

Python Pandas

Python Pandas is an open-source data analysis and manipulation library used widely for data wrangling tasks in Python. It provides high-level data structures like data frames and series, which are designed to handle structured and tabular data.

PANDAS (PANel DAta) is a high-level data manipulation tool. It is very easy to import and export data using the Pandas library, which has a very rich set of functions.

The main author of Pandas is Wes McKinney.

 

Why Pandas?

Pandas is the most popular library in the scientific Python ecosystem for doing data analysis. Pandas is capable of many tasks, including:

  • It can read and write data in many different file formats (CSV, Excel, JSON, etc.) and handle many data types (integer, float, string, etc.).
  • It can calculate in all the possible ways data is organized, i.e. across rows and down columns.
  • It can easily select subsets of data from bulky data sets and even combine multiple datasets together. It has functionality to find and fill missing data.
  • It allows you to apply operations to independent groups within the data.
  • It supports reshaping of data into different forms.
  • It supports advanced time-series functionality. (Time series forecasting is the use of a model to predict future values based on previously observed values.)
  • It supports visualization by integrating with libraries such as matplotlib and seaborn.

Installing Pandas
To install Pandas from the command line, we need to type in:
          pip install pandas
Pandas can be installed only when Python is already installed on that system. The same is true for other Python libraries.
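
Once the installation finishes, a quick check such as the following sketch confirms that Python can import the library. The variable name installed_version is only illustrative, and the version string printed depends on what pip installed.

```python
# Confirm that pandas can be imported and report the installed version.
import pandas as pd

installed_version = pd.__version__
print("pandas version:", installed_version)
```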

Pandas Data Structure 

Data structures refer to specialized ways of storing data so that specific types of operations can be applied to them.

A data structure is a collection of data values and operations that can be applied to that data. It enables efficient storage, retrieval and modification of the data.

Two commonly used data structures in Pandas are:

  • Series 
  • DataFrame

Difference between Series and DataFrame

The main differences between a Series and a DataFrame in pandas are:

  1. Dimensionality: A Series is a one-dimensional data structure, while a DataFrame is a two-dimensional data structure.

  2. Data Representation: A Series can represent a single column or row of data, while a DataFrame represents a table with multiple rows and columns.

  3. Labeling: Both Series and DataFrame have labeled indexes, but a DataFrame has both row and column labels, while a Series has only one-dimensional labels.

  4. Functionality: While both Series and DataFrame offer similar functionality for data manipulation, a DataFrame provides more flexibility for handling multi-dimensional data.
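
The differences listed above can be seen in a small sketch. The names marks and table and the sample values are assumptions made only for illustration.

```python
import pandas as pd

# A Series has one axis of labels; a DataFrame has row labels and column labels.
marks = pd.Series([85, 78, 96], index=["Hardik", "Prakash", "Manan"])
table = pd.DataFrame({"Marks": [85, 78, 96], "Grade": ["A", "B", "A"]},
                     index=["Hardik", "Prakash", "Manan"])

print(marks.ndim, marks.shape)   # 1 (3,)
print(table.ndim, table.shape)   # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
one_column = table["Marks"]
print(type(one_column).__name__, one_column.ndim)   # Series 1
```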

 

 

Series

A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc) which by default have numeric data labels starting from zero. The data label associated with a particular value is called its index. We can also assign values of other data types as index.

Creating Series Objects.

There are different ways in which a series can be created in Pandas.

(A) Creation of Series from Scalar Values

A Series can be created using scalar values as shown in the example below:

import pandas as pd 

S1 = pd.Series([10,20,30])

print(S1)

If we do not explicitly specify an index for the data values while creating a series, then by default indices range from 0 through N – 1.

We can also assign user-defined labels to the index and use them to access elements of a Series.
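
A minimal sketch of user-defined labels follows. The series name s2 and the labels "a", "b" and "c" are hypothetical, chosen only for illustration.

```python
import pandas as pd

# The index parameter supplies one label per value.
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s2)

# A label can then be used to access the matching element.
print(s2["b"])   # 20
```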

(B) Creation of Series from NumPy Arrays

We can create a series from a one-dimensional (1D) NumPy array, as shown in the example below. When index labels are passed with the array, the number of labels must be the same as the length of the array; otherwise a ValueError is raised.

Example :-

import numpy as np

import pandas as pd 

array1 = np.array([1,2,3,4])

series3 = pd.Series(array1)

print(series3) 
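
The length rule for index labels can be checked with a short sketch. The month labels and the names series4 and mismatch_error are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])

# Four labels for four values: accepted.
series4 = pd.Series(array1, index=["Jan", "Feb", "Mar", "Apr"])
print(series4)

# Three labels for four values: pandas raises ValueError.
try:
    pd.Series(array1, index=["Jan", "Feb", "Mar"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```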

(C) Creation of Series from Dictionary

Python dictionary has key: value pairs and a value can be quickly retrieved when its key is known. Dictionary keys can be used to construct an index for a
Series, as shown in the following example. Here, keys of the dictionary dict1 become indices in the series.

import pandas as pd
dict1 = {1:'One',2:'Two',3:'Three',4:'Four'}
pd1 = pd.Series(dict1)
print(pd1)

 

 Accessing Elements of a Series

There are two common ways for accessing the elements of a series: Indexing and Slicing.

Indexing

Indexing in Series is similar to that for NumPy arrays, and is used to access elements in a series. Indexes are of two types: positional index and labelled index. A positional index takes an integer value that corresponds to the element's position in the series, starting from 0, whereas a labelled index takes any user-defined label as index.
Example :- 
import pandas as pd 
s1 = pd.Series([10,20,30])
s1[2]
When labels are specified, we can use labels as indices while selecting values from a Series. We can also access an element of the series using the positional index. More than one element of a series can be accessed using a list of positional integers or a list of index labels.
Example :- 
import pandas as pd 
s1 = pd.Series([10,20,30])
s1[[1,2]]
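
Access by label can be sketched as follows. The series capitals and its values are hypothetical. iloc is used for the positional lookup because recent pandas releases discourage indexing a labelled series with a bare integer.

```python
import pandas as pd

capitals = pd.Series(["New Delhi", "Paris", "London"],
                     index=["India", "France", "UK"])

print(capitals["India"])            # single label
print(capitals[["UK", "France"]])   # list of labels, in the order requested
print(capitals.iloc[1])             # position 1, counted from 0
```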

 

 

Slicing

Sometimes, we may need to extract a part of a series. This can be done through slicing, which is similar to slicing used with NumPy arrays. We can define which part of the series is to be sliced by specifying the start and end parameters [start:end] with the series name. When we use positional indices for slicing, the value at the end index position is excluded.

It is a powerful way to retrieve subsets of data from a pandas object. Slicing works in the same way as it does for strings and lists.

Syntax :-

object[Start:Stop:Step]

Example :-

import pandas as pd
animal = pd.Series(["Lion","Bear","Elephant","Tiger","Wolf"],index=["L","B","E","T","W"])
print(animal)
print(animal[1:4])
print(animal[-3:-1])
print(animal[-4:1])
print(animal[0::-2])
print(animal[0::2])
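
Slices can also be written with index labels instead of positions. The sketch below reuses the animal series to contrast the two forms; the names by_position and by_label are illustrative. Unlike a positional slice, a label slice includes the end label.

```python
import pandas as pd

animal = pd.Series(["Lion", "Bear", "Elephant", "Tiger", "Wolf"],
                   index=["L", "B", "E", "T", "W"])

by_position = animal[1:3]     # positions 1 and 2; position 3 is excluded
by_label = animal["B":"T"]    # labels B, E and T; the end label is included

print(by_position)
print(by_label)
```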

 

 

 

Attributes of Pandas Series

We can access certain properties called attributes of a series by using that property with the series name. A series is a one-dimensional labeled array capable of holding any data type.

Some of the attributes of a Pandas series include:

values: The actual data contained within the series.

Syntax :- 

<Series object>.values

index: The labels for each element in the series.

Syntax :-

<Series object>.index

dtype: The data type of the values in the series.

Syntax :-

<Series object>.dtype

shape: A tuple representing the dimensions of the series.

Syntax :-

<Series object>.shape

size: The number of elements in the series.

Syntax :-

<Series object>.size

name: An optional name for the series.

Syntax :-

<Series object>.name

ndim: The number of dimensions of the series (always 1).

Syntax :-

<Series object>.ndim

axes: A list containing the row axis labels (the index) of the series.

Syntax :-

<Series object>.axes

empty: A boolean indicating whether the series is empty or not.

Syntax :-

<Series object>.empty

value_counts(): A method (not an attribute, so it is called with parentheses) that returns a series containing counts of unique values in the original series.

Syntax :-

<Series object>.value_counts()

nbytes :- Return the number of bytes in the underlying data.

Syntax :-

<Series object>.nbytes

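itemsize :- Return the size, in bytes, of the dtype of one item of the underlying data. This attribute is available on a Series only in older versions of Pandas; in current versions, for a NumPy-backed series, the same value is given by <Series object>.dtype.itemsize.

Syntax :-

<Series object>.itemsize

hasnans :- Return True if there are NaN values; otherwise return False.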

Syntax :-

<Series object>.hasnans

Example :- 

import pandas as pd
animal = pd.Series(["Lion","Bear","Elephant","Tiger","Wolf"],index=["L","B","E","T","W"])
print(animal)
print("--------------------------------")
print(animal.values)
print("--------------------------------")
print(animal.index)
print("--------------------------------")
print(animal.dtype)
print("--------------------------------")
print(animal.shape)
print("--------------------------------")
print(animal.nbytes)
print("--------------------------------")
print(animal.ndim)
print("--------------------------------")
print(animal.size)
print("--------------------------------")
print(animal.hasnans)
print("--------------------------------")
print(animal.empty)
print("--------------------------------")
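
Counting unique values and naming a series are not covered by the example above, so a small sketch is given here. The series grades and its values are hypothetical.

```python
import pandas as pd

grades = pd.Series(["A", "B", "A", "C", "C", "B", "A"], name="Grade")

# value_counts() is a method, so it needs parentheses.
counts = grades.value_counts()
print(counts)

# name is an attribute, so it is read without parentheses.
print(grades.name)   # Grade
```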

Methods of Series

In Pandas, a Series is an object that contains data and its index. A Pandas Series provides a variety of methods for data manipulation and analysis. Some of the commonly used methods of a Series in Pandas include:

head(): Returns the first n rows of the series. By default, it returns the first five records of the series, but we can specify the number of records.

Syntax :-

<Series object>.head(n)

tail(): Returns the last n rows of the series. By default, it returns the last five records of the series, but we can specify the number of records.

Syntax :- 

<Series object>.tail(n)

count(): Counts the non-null values in the series.

Syntax :- 

<Series object>.count()

Example :- 

import pandas as pd
animal = pd.Series(["Lion","Bear","Elephant","Tiger","Wolf"],index=["L","B","E","T","W"])
print(animal)
print("------------------------------")
print(animal.head(3))
print("------------------------------")
print(animal.tail(2))
print("------------------------------")
print(animal.count())

 

 

Mathematical Operations on Series

If we perform basic mathematical operations like addition, subtraction, multiplication, division, etc., on two NumPy arrays, the operation is done on each corresponding pair of elements. Similarly, we can perform mathematical operations on two series in Pandas. While performing mathematical operations on series, index matching is implemented and all missing values are filled in with NaN by default.

The basic operations that can be performed on two series are addition, subtraction, multiplication and division.

Addition of two Series

It can be done in two ways. In the first method, the two series are simply added together with the + operator. The values are matched by index label, and the output of addition is NaN if one of the elements or both elements have no value. In the second method, the add() method is used with the fill_value parameter, which replaces a missing value with the specified value before the addition is performed.

Syntax :- pd3 = pd1.add(pd2,fill_value=0)

Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 + pd2
print(pd3)
pd4 = pd1.add(pd2,fill_value=0)
print(pd4)

Subtraction of two Series

It can be done in two ways. In the first method, one series is simply subtracted from the other with the - operator. The values are matched by index label, and the output of subtraction is NaN if one of the elements or both elements have no value. In the second method, the sub() method is used with the fill_value parameter, which replaces a missing value with the specified value before the subtraction is performed.

Syntax :- pd3 = pd1.sub(pd2,fill_value=0)

Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 - pd2
print(pd3)
pd4 = pd1.sub(pd2,fill_value=0)
print(pd4)

Multiplication of two Series

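It can be done in two ways. In the first method, the two series are simply multiplied together with the * operator. The values are matched by index label, and the output of multiplication is NaN if one of the elements or both elements have no value. In the second method, the mul() method is used with the fill_value parameter, which replaces a missing value with the specified value before the multiplication is performed.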

Syntax :- pd3 = pd1.mul(pd2,fill_value=0)

Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 * pd2
print(pd3)
pd4 = pd1.mul(pd2,fill_value=0)
print(pd4)

Division of two Series

It can be done in two ways. In the first method, one series is simply divided by the other with the / operator. The values are matched by index label, and the output of division is NaN if one of the elements or both elements have no value. In the second method, the div() method is used with the fill_value parameter, which replaces a missing value with the specified value before the division is performed.

Syntax :- pd3 = pd1.div(pd2,fill_value=0)

Example :-
import pandas as pd
pd1 = pd.Series([10,20,30,40,50],index=["A","B","C","D","E"])
print(pd1)
pd2 = pd.Series([5,10,15,20,25],index=["C","D","E","F","G"])
print(pd2)
pd3 = pd1 / pd2
print(pd3)
pd4 = pd1.div(pd2,fill_value=0)
print(pd4)
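
The choice of fill_value matters more for division than for the other operations, because a missing divisor filled with 0 leads to division by zero. The sketch below uses two shorter, hypothetical series to show the effect; the names plain, filled_0 and filled_1 are illustrative, and using 1 as the fill value is only one possible choice.

```python
import pandas as pd

pd1 = pd.Series([10, 20, 30], index=["A", "B", "C"])
pd2 = pd.Series([5, 10], index=["B", "C"])

plain = pd1 / pd2                       # A has no partner in pd2, so A is NaN
filled_0 = pd1.div(pd2, fill_value=0)   # A becomes 10 / 0, which pandas reports as inf
filled_1 = pd1.div(pd2, fill_value=1)   # A becomes 10 / 1, which is 10.0

print(plain)
print(filled_0)
print(filled_1)
```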

 

 

 

DataFrame in Pandas

A DataFrame is a two-dimensional labelled data structure, like a table in MySQL. It contains rows and columns, and therefore has both a row and a column index. Each column can have a different type of value such as numeric, string, boolean, etc., as in tables of a database.

A two-dimensional array is an array in which each element is itself an array. For instance, an array A[m][n] is an m by n table with m rows and n columns containing m x n elements.

It has two indexes, or axes: a row index (axis = 0) and a column index (axis = 1). It is like a spreadsheet where each value is identifiable by the combination of row index and column index. The row index is known simply as the index, and the column index is called the column name.

Its values can easily be changed, i.e. it is value-mutable. You can also add or delete rows and columns in a DataFrame; in other words, it is size-mutable.

A DataFrame consists of three main components: the data, the row index, and the column index. The data is a collection of one or more columns, each of which can be of a different data type such as integers, floating-point numbers, or strings. The row index is a sequence of labels that identify each row, while the column index is a sequence of labels that identify each column.

DataFrames provide a wide range of functionality for data manipulation, including filtering, grouping, aggregating, pivoting, merging, and sorting. They also support many common data analysis tasks, such as calculating summary statistics, performing statistical tests, and visualizing data.

Syntax :-

import pandas as pd
pd1 = pd.DataFrame([])
print(pd1)

Example :-

import pandas as pd
dict1 = {"Rollno":[1,2,3,4,5,6],
         "Name":["Hardik","Prakash","Manan","Mohit","Nishant","Yash"],
         "Marks":[85,78,96,54,48,66],
         "Grade":["A","B","A","C","C","B"]}
pd1 = pd.DataFrame(dict1)
print(pd1)

Creating and Displaying a DataFrame

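A DataFrame can be created from several kinds of data, for example from a dictionary whose items are key: value pairs in which each value is itself a data structure such as a list, an ndarray or a Series. The common ways of creating a DataFrame are: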

  1. Creation of an empty DataFrame.
  2. Creation of DataFrame from NumPy ndarrays
  3. Creation of DataFrame from List of Dictionaries
  4. Creation of DataFrame from Dictionary of Lists
  5. Creation of DataFrame from Series
  6. Creation of DataFrame from Dictionary of Series

 

A. Creation of an empty DataFrame

An empty DataFrame can be created as follows:

Example :-

import pandas as pd
dFrameEmt = pd.DataFrame()
dFrameEmt

 

 

 

B. Creation of DataFrame from NumPy ndarrays

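We can create a DataFrame using one or more ndarrays. In the example below, a single 1D array is passed, so the DataFrame has a single column.

Example :- 

import numpy as np
import pandas as pd
array1 = np.array([10,20,30])
# array2 and array3 are not used in this first DataFrame
array2 = np.array([100,200,300])
array3 = np.array([-10,-20,-30, -40])
dFrame4 = pd.DataFrame(array1)
dFrame4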
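
To use more than one ndarray, the arrays can be passed together in a list. The following is a minimal sketch with two arrays of equal length; the name dFrame5 and the column labels "A", "B" and "C" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

array1 = np.array([10, 20, 30])
array2 = np.array([100, 200, 300])

# Each array in the list becomes one row of the DataFrame.
dFrame5 = pd.DataFrame([array1, array2], columns=["A", "B", "C"])
print(dFrame5)
```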

 

C. Creation of DataFrame from List of Dictionaries

We can create a DataFrame from a list of dictionaries.
import pandas as pd
# Create list of dictionaries
listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]
dFrameListDict = pd.DataFrame(listDict)
dFrameListDict

Here, the dictionary keys are taken as column labels, and the values corresponding to each key are taken as rows. There will be as many rows as the number of dictionaries present in the list. A key that is missing from a dictionary, such as 'c' in the first dictionary, gives NaN in that row.

D. Creation of DataFrame from Dictionary of Lists

DataFrames can also be created from a dictionary of lists. Consider the following dictionary consisting of the keys 'State', 'GArea' (geographical area) and 'VDF' (very dense forest) and the corresponding lists.

import pandas as pd
dictForest = {'State': ['Assam', 'Delhi', 'Kerala'],
              'GArea': [78438, 1483, 38852],
              'VDF': [2797, 6.72, 1663]}
dFrameForest = pd.DataFrame(dictForest)
dFrameForest
We can also change the sequence of columns in the DataFrame by passing the column labels in the required order through the columns parameter.
dFrameForest1 = pd.DataFrame(dictForest,
                             columns=['State', 'VDF', 'GArea'])
dFrameForest1

E. Creation of DataFrame from Series

DataFrames can also be created from one or more Series. When a single Series is passed, its index labels become the row labels of the DataFrame and its values form a single column.

import pandas as pd
seriesGArea = pd.Series([78438, 1483, 38852], index=['Assam', 'Delhi', 'Kerala'])
dFrameGArea = pd.DataFrame(seriesGArea)
dFrameGArea
When a list of Series is passed, each Series becomes one row, and the index labels of the Series become the column labels.
seriesVDF = pd.Series([2797, 6.72, 1663], index=['Assam', 'Delhi', 'Kerala'])
dFrameForest2 = pd.DataFrame([seriesGArea, seriesVDF])
dFrameForest2

DataFrame Attributes

index: The index (row labels) of the DataFrame.

columns: The column labels of the DataFrame.

axes: Returns a list representing both axes (axis 0, i.e. the index, and axis 1, i.e. the columns).

dtypes: Returns the dtype of the data in each column of the DataFrame.

size: Returns an int representing the number of elements in the DataFrame.

shape: Returns a tuple representing the dimensionality of the DataFrame.

values: Returns a NumPy representation of the DataFrame.

empty: Indicates whether the DataFrame is empty.

ndim: Returns an int representing the number of axes (array dimensions).

T: Transposes the index and columns.

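Example of DataFrame Attributes

import pandas as pd
dictForest = {'State': ['Assam', 'Delhi', 'Kerala'],
              'GArea': [78438, 1483, 38852],
              'VDF': [2797, 6.72, 1663]}
df1 = pd.DataFrame(dictForest)
df1

The outputs below are for a DataFrame created without an explicit index, so the row labels are 0, 1 and 2. Newer Pandas versions may report string columns with a string dtype instead of object.

a) df1.index 

-> RangeIndex(start=0, stop=3, step=1)

b) df1.columns

-> Index(['State', 'GArea', 'VDF'], dtype='object')

c) df1.axes

-> [RangeIndex(start=0, stop=3, step=1),

Index(['State', 'GArea', 'VDF'], dtype='object')]

d) df1.dtypes

-> State :- object 

-> GArea :- int64

-> VDF :- float64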
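
The remaining attributes can be checked on the same data. This sketch simply rebuilds df1 from dictForest and prints each attribute.

```python
import pandas as pd

dictForest = {'State': ['Assam', 'Delhi', 'Kerala'],
              'GArea': [78438, 1483, 38852],
              'VDF': [2797, 6.72, 1663]}
df1 = pd.DataFrame(dictForest)

print(df1.shape)   # (3, 3): three rows and three columns
print(df1.size)    # 9: rows multiplied by columns
print(df1.ndim)    # 2
print(df1.empty)   # False
print(df1.T)       # column labels become row labels and vice versa
```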

 

Operations on rows and columns in DataFrames

We can perform some basic operations on rows and columns of a DataFrame like selection, deletion, addition, and renaming.

             Mohit    Nishant    Yash
Maths          78        78       85
Science        88        92       96
Hindi          90        85       78

(a) Adding a New Column to a DataFrame

-> We can easily add a new column to a DataFrame. In order to add a new column for another student, 'Hardik', we assign a list of marks, one value per row, to the new column label.

example :-  pd1['Hardik']=[89,90,85]
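
A runnable sketch of this step follows. It assumes that pd1 holds the marks table shown above, with subjects as row labels and students as column labels.

```python
import pandas as pd

# Rebuild the marks table: one column per student, one row per subject.
pd1 = pd.DataFrame({"Mohit": [78, 88, 90],
                    "Nishant": [78, 92, 85],
                    "Yash": [85, 96, 78]},
                   index=["Maths", "Science", "Hindi"])

# Assigning to a new column label adds the column at the end;
# the list must contain one value for each row.
pd1["Hardik"] = [89, 90, 85]
print(pd1)
```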

