David AI Blog


Pandas is a popular Python data analysis tool.

It provides easy-to-use and highly efficient data structures.

These data structures deal with numeric or labeled data, stored in the form of tables.

Data Structures in Pandas

The two fundamental data structures used in pandas are:

Series: A 1-D array.

Data Frame: A 2-D array or two or more Series joined together

Series is a 1-D array, holding data values of a single variable, captured from multiple observations.

A few examples are:

Height of each student, belonging to a Class 'C'.

Amount of daily rainfall received at Station 'X', in July 2017.

Total sales of a product 'P' in every quarter of 2016.

A Data Frame is 2-D shaped and contains data of different parameters, captured from multiple observations.

Each observation is represented by a single row, and each parameter by a single column. Each column can hold a different data type.

A few examples are:

Height and Weight of all students, belonging to a Class 'C'.

Daily Rainfall received and Average Temperature of a location 'X', in the year 2017.
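The relationship between the two structures can be sketched in a few lines. This is only an illustration: the student names and measurements are made up, and pd.concat is one of several ways to join Series into a Data Frame.

```python
import pandas as pd

# Hypothetical sample data: heights (cm) and weights (kg) of four students in class 'C'.
height = pd.Series([150.0, 162.5, 158.0, 171.0], name="height")
weight = pd.Series([45.0, 55.5, 50.0, 63.0], name="weight")

# Joining two Series column-wise gives a DataFrame:
# one row per observation (student), one column per parameter.
students = pd.concat([height, weight], axis=1)

# Columns can hold different data types, for example text next to numbers.
students["name"] = ["Asha", "Ben", "Chitra", "Dev"]

print(students)
print(students.dtypes)
```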

https://www.youtube.com/watch?v=CLoNO-XxNXU

Flexibility in Python

Pandas works well with large data sets, and it lets us do a lot of manipulation on the data.

Data saved in a CSV file can be loaded into a DataFrame, the object type pandas uses for tabular data.

import pandas as pd

We use pd as the alias for pandas, so we do not have to type the full name again and again.

Pandas needs to be installed before it can be imported:

pip install pandas

Data Access refers to extracting data present in defined data structures.

Pandas provides utilities like loc and iloc to get data from a Series or a DataFrame (the Panel structure found in older versions has since been removed).

Accessing a Single Value

Individual elements can be accessed by specifying either the index position or the index label, inside the square brackets.

import pandas as pd

import numpy as np

z = np.arange(10, 16)

s = pd.Series(z, index=list('abcdef'))

#Accessing 3rd element of s.

s.iloc[2] # ---> Returns 12 (plain s[2] relies on positional fallback, which newer pandas versions deprecate)

#Accessing 4th element of s.

s['d'] # ---> Returns '13'

Accessing a Single Value

It is also possible to access a single element by passing the index position or the index label as an argument to the get method.

s.get(2) # ---> Returns 12 in older pandas; positional lookup through get is deprecated, so prefer s.iloc[2]

s.get('d') # ---> Returns '13'

=========================================================================

Accessing a Slice

A Series can be sliced in a way very similar to slicing a Python list.

Expression 1

s[1:4]

Output

b 11

c 12

d 13

dtype: int32

Expression 2

s['b':'e']

Output

b 11

c 12

d 13

e 14

dtype: int32

When index labels are used for slicing, the elements at both the start and the end label are included. When positions are used, the end position is excluded.
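The difference between position-based and label-based access can be made explicit with iloc and loc. The following is a minimal sketch that reuses the Series s from the examples above.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, 16), index=list("abcdef"))

# Single values: iloc is position-based, loc is label-based.
third = s.iloc[2]
fourth = s.loc["d"]

# Position-based slice: the end position is excluded, like a Python list.
by_position = s.iloc[1:4]   # labels b, c, d

# Label-based slice: both the start and the end label are included.
by_label = s.loc["b":"e"]   # labels b, c, d, e

print(third, fourth)
print(by_position)
print(by_label)
```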

Accessing Data from a Data Frame

Pandas provides the .loc and .iloc indexers for selecting rows.

Using square brackets ([]) is also allowed, especially for selecting columns.
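As a rough illustration of these three access styles, here is a small sketch on a made-up Data Frame. The column names, row labels, and values are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"temp": [28.5, 31.0, 25.2], "rain": [110.0, 95.5, 140.3]},
    index=["mon", "tue", "wed"],
)

row_by_label = df.loc["tue"]          # one row, selected by index label
row_by_position = df.iloc[0]          # one row, selected by position
one_column = df["temp"]               # one column, returned as a Series
two_columns = df[["temp", "rain"]]    # several columns, returned as a DataFrame
single_cell = df.loc["wed", "rain"]   # row label plus column label

print(row_by_label)
print(single_cell)
```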

More details can be gathered from the shown video.

How to Access Data using DataFrames with Pandas

https://www.youtube.com/watch?v=qYc58lb--Q4

Knowing a Series

It is possible to understand a Series better by using the describe method. The method provides details like mean, std, etc. about a series.

Example

import pandas as pd

import numpy as np

temp = pd.Series(28 + 10*np.random.randn(10))

print(temp.describe())

Output

count 10.000000

mean 30.335711

std 8.402697

min 10.874673

25% 27.431943

50% 31.286962

75% 35.148773

max 40.770861

dtype: float64

Knowing a DataFrame

Two main methods, info and describe, can be used to learn about the data present in a data frame.

import pandas as pd

import numpy as np

# Populate the data

df = pd.DataFrame({'temp': pd.Series(28 + 10*np.random.randn(10)),

                   'rain': pd.Series(100 + 50*np.random.randn(10)),

                   'location': list('AAAAABBBBB')})

# info() prints its summary directly and returns None, so no print() is needed

df.info()

Output

RangeIndex: 10 entries, 0 to 9

Data columns (total 3 columns):

location 10 non-null object

rain 10 non-null float64

temp 10 non-null float64

dtypes: float64(2), object(1)

memory usage: 320.0+ bytes

Knowing a Data Frame

By default, the describe method provides details of only the numeric fields.

Example

print(df.describe())

Output

rain temp

count 10.000000 10.000000

mean 108.860520 28.631922

std 55.584867 4.866241

min 19.512179 21.327725

25% 56.911505 25.658738

50% 128.776209 29.564648

75% 156.972247 31.496084

max 164.159265 36.086240
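Non-numeric columns can be summarised as well. The sketch below rebuilds a similar data frame (with a fixed random seed, an assumption made only so the run is repeatable) and shows describe on a text column and with include='all'.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
df = pd.DataFrame({
    "temp": 28 + 10 * rng.standard_normal(10),
    "rain": 100 + 50 * rng.standard_normal(10),
    "location": list("AAAAABBBBB"),
})

numeric_summary = df.describe()              # numeric columns only
text_summary = df[["location"]].describe()   # count, unique, top, freq
full_summary = df.describe(include="all")    # every column; NaN where a statistic does not apply

print(numeric_summary)
print(text_summary)
```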

The screenshots for this part showed a program explaining a Data Frame and its output: creating 3 columns, fetching some rows (with the output for fetching the rows), and fetching a column.
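The steps named above (creating 3 columns, fetching some rows, fetching a column) could look roughly like this. It is a sketch with made-up values, not the original program.

```python
import pandas as pd

# Creating 3 columns (hypothetical values).
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chitra", "Dev"],
    "height": [150.0, 162.5, 158.0, 171.0],
    "weight": [45.0, 55.5, 50.0, 63.0],
})

first_rows = df.head(2)         # fetching the first two rows
middle_rows = df.iloc[1:3]      # fetching rows by position
tall = df[df["height"] > 160]   # fetching rows that satisfy a condition
heights = df["height"]          # fetching a column

print(first_rows)
print(heights)
```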

I/O with Pandas

Pandas provides support for reading/writing data from/to some sources.

For example, read_csv is used to read data from a CSV file and to_csv is used to write data to a CSV file.
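A minimal round trip with these two functions might look as follows. The file name rainfall.csv and the values are placeholders, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"location": ["A", "B"], "rain": [110.5, 95.0]})

# Write to a temporary CSV file; index=False leaves out the row labels.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "rainfall.csv")  # hypothetical file name
    df.to_csv(csv_path, index=False)
    loaded = pd.read_csv(csv_path)

print(loaded)
```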

https://pandas.pydata.org/docs/

https://www.youtube.com/c/Zenva/playlists

Whatever the file type, once the data is loaded into pandas it can be handled in the same way.

I/O Methods

The following video shows the details of I/O methods used in pandas.

How to fetch the Data from the websites using Pandas :

Reading Data from URL

Read HTML Page using Pandas | read_html() |Web-scrapping Tutorial

We import the requests library, seaborn (as sns), and pandas (as pd):

import requests

import seaborn as sns

import pandas as pd

The output shows the data extracted from the page.

It pulled the entire table.

This method only works on pages that contain tables; if there are no tables, it will not work.

Taking a new URL from a different website:

df1 = pd.read_html(url)

df1[0] # read_html returns a list of DataFrames; index 0 is the first table found

Pandas is a very powerful tool with which we can fetch data in different ways.

Reading Data from Databases

Pandas also supports reading data from Database tables.

The following video illustrates reading data from a table of a MySQL database.

Four things we learn with MySQL:

1. Reading data from the database

2. How to define a custom index using a data column

3. Reading data in chunks

4. Parameterized queries

https://www.youtube.com/watch?v=yab4oWYypPA

The code is written in Visual Studio.

The first step is importing the dependencies and libraries.

Next, we create a connection to the SQL database.

We then pass the query, so that the data can be fetched in Python:

df = pd.read_sql("SELECT * FROM COUNTRY ORDER BY CONTINENT, CODE", conn)

Displaying a DataFrame by

We are adding the columns continent and code.

Output displaying continent and code

Now we can see the data in chunks.

The output which we receive

Output

The country name given is India
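The four steps listed above can be tried without a MySQL server. The sketch below swaps in Python's built-in sqlite3 module with an in-memory table, so the table contents are made up and the "?" placeholder is the sqlite3 style; a MySQL driver would need its own connection setup and may use a different placeholder.

```python
import sqlite3

import pandas as pd

# Small in-memory table with hypothetical rows (SQLite stands in for MySQL here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT, name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [("IND", "India", "Asia"), ("JPN", "Japan", "Asia"),
     ("FRA", "France", "Europe"), ("BRA", "Brazil", "South America")],
)

query = "SELECT * FROM country ORDER BY continent, code"

# 1. Read the whole result into a DataFrame.
df = pd.read_sql(query, conn)

# 2. Use the continent and code columns as a custom index.
indexed = pd.read_sql(query, conn, index_col=["continent", "code"])

# 3. Read in chunks of two rows; each chunk is itself a DataFrame.
chunk_sizes = [len(chunk) for chunk in pd.read_sql(query, conn, chunksize=2)]

# 4. Parameterized query; "?" is the placeholder style of sqlite3.
india = pd.read_sql("SELECT * FROM country WHERE name = ?", conn, params=("India",))

conn.close()
print(indexed)
print(india)
```

Passing the value through params, instead of formatting it into the SQL string, leaves the quoting to the database driver.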

How to handle large datasets

https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c

Reading Data from JSON

pandas provides the utilities read_json and to_json to deal with JSON strings or files.

Consider the list EmployeeRecords below, which is first converted into a JSON string and then into a data frame.

Example

EmployeeRecords = [{'EmployeeID':451621,'EmployeeName':'Preeti Jain', 'DOJ':'30-Aug-2008'},

{'EmployeeID':123621, 'EmployeeName':'Ashok Kumar','DOJ':'25-Sep-2016'},

{'EmployeeID':451589, 'EmployeeName':'Johnty Rhodes','DOJ':'04-Nov-2016'}]

Reading Data from JSON

Example

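import json

from io import StringIO

emp_records_json_str = json.dumps(EmployeeRecords)

# Newer pandas versions expect a file-like object, so the JSON string is wrapped in StringIO

df = pd.read_json(StringIO(emp_records_json_str), orient='records', convert_dates=['DOJ'])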

print(df)

Output

DOJ EmployeeID EmployeeName

0 2008-08-30 451621 Preeti Jain

1 2016-09-25 123621 Ashok Kumar

2 2016-11-04 451589 Johnty Rhodes

The orient argument defines how the data is organised in the JSON string.
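To see what different orient values mean, it can help to write the same data frame out with to_json and inspect the result. This is a small sketch using two of the employee records; json.loads is used only to make the structure easy to print.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "EmployeeID": [451621, 123621],
    "EmployeeName": ["Preeti Jain", "Ashok Kumar"],
})

# orient='records': a list with one JSON object per row.
as_records = json.loads(df.to_json(orient="records"))

# orient='columns': one JSON object per column, keyed by row label.
as_columns = json.loads(df.to_json(orient="columns"))

# orient='split': separate 'columns', 'index' and 'data' entries.
as_split = json.loads(df.to_json(orient="split"))

print(as_records)
print(as_columns)
print(as_split)
```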

Indexing

Indexing refers to labeling the data elements of a Series or a Data Frame.

These labels can be used for selecting a portion of data from any of the defined data structures.

Indexing a Data Frame

A single-level index can be set on a data frame by passing a list of values either to the index attribute or to the index argument of the DataFrame function.

Example

import pandas as pd

import numpy as np

df = pd.DataFrame(np.random.rand(5,2))

df.index = [ 'row_' + str(i) for i in range(1, 6) ]

print(df)

Output

0 1

row_1 0.919754 0.063280

row_2 0.803853 0.758804

row_3 0.871375 0.428759

row_4 0.128372 0.416698

row_5 0.991222 0.546599
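The same labels can also be passed through the index argument when the data frame is created, and then used for selection. In this sketch the column names x and y are assumptions added for readability.

```python
import numpy as np
import pandas as pd

labels = ["row_" + str(i) for i in range(1, 6)]

# Passing the labels through the index argument at construction time.
df = pd.DataFrame(np.random.rand(5, 2), index=labels, columns=["x", "y"])

# The labels can then be used to select part of the data.
one_row = df.loc["row_3"]
some_rows = df.loc["row_2":"row_4"]  # label slices include both ends

print(some_rows)
```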

DateTime Indexes

Pandas supports generating a range of dates with functions like date_range and bdate_range.
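A short sketch of both functions is given here. The start date and the rainfall values are made up; the point is that bdate_range skips weekends while date_range does not.

```python
import pandas as pd

# Five consecutive calendar days starting on 1 July 2017 (a Saturday).
days = pd.date_range(start="2017-07-01", periods=5, freq="D")

# Five business days from the same start; the weekend is skipped.
business_days = pd.bdate_range(start="2017-07-01", periods=5)

# A date range can serve directly as the index of a Series.
rainfall = pd.Series([0.0, 12.5, 3.2, 0.0, 7.8], index=days)  # hypothetical values
second_day = rainfall.loc[pd.Timestamp("2017-07-02")]

print(days)
print(business_days)
print(second_day)
```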

https://www.youtube.com/watch?v=yCgJGsg0Xa4

Hierarchical Indexing

In addition to single-level indexing, pandas supports multilevel or hierarchical indexing.

The video below illustrates creating two levels of index for a Data Frame.
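For readers who prefer code, a two-level index might be built like this. It is a minimal sketch: the level names, labels, and sales figures are invented, and MultiIndex.from_product is only one of several constructors.

```python
import pandas as pd

# Two index levels: location and quarter (hypothetical sales figures).
index = pd.MultiIndex.from_product(
    [["X", "Y"], ["Q1", "Q2"]], names=["location", "quarter"]
)
df = pd.DataFrame({"sales": [100, 120, 90, 110]}, index=index)

sales_at_x = df.loc["X"]                       # all rows for the outer label 'X'
x_q2 = df.loc[("X", "Q2"), "sales"]            # one cell, addressed by both levels
q1_everywhere = df.xs("Q1", level="quarter")   # cross-section on the inner level

print(df)
print(q1_everywhere)
```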

https://www.youtube.com/watch?v=nE21ZlXiByY
