
Pandas

9.2.21

Pandas is a popular Python data analysis tool.

It provides easy-to-use and highly efficient data structures.

 

These data structures deal with numeric or labeled data, stored in the form of tables.

 

 

Data Structures in Pandas

The two fundamental data structures used in pandas are:

Series: a 1-D labeled array.

Data Frame: a 2-D labeled table, formed by one or more Series joined together as columns.

 

 

Series is a 1-D array, holding data values of a single variable, captured from multiple observations.

A few examples are:

Height of each student belonging to a class 'C'.

Amount of daily rainfall received at station 'X' in July 2017.

Total sales of a product 'P' in every quarter of 2016.

 

 

A Data Frame is 2-D and contains data for different parameters, captured from multiple observations.

Each observation is represented by a single row, and each parameter by a single column. Each column can hold a different data type.

A few examples are:

Height and weight of all students belonging to a class 'C'.

Daily rainfall received and average temperature at a location 'X', in the year 2017.
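The two structures described above can be sketched as follows. This is only an illustration: the student names and the measurements are made up, and the column names height_cm and weight_kg are assumptions, not part of the original notes.

```python
import pandas as pd

# Hypothetical heights of three students of class 'C' (one variable, many observations).
heights = pd.Series([152.0, 160.5, 148.2], index=["Asha", "Ben", "Chen"], name="height_cm")

# A DataFrame holds several variables (columns) for the same observations (rows).
students = pd.DataFrame(
    {"height_cm": [152.0, 160.5, 148.2], "weight_kg": [45, 52, 41]},
    index=["Asha", "Ben", "Chen"],
)

print(heights.shape)    # 1-D: one entry per student
print(students.shape)   # 2-D: rows are students, columns are parameters
print(students.dtypes)  # each column keeps its own data type
```

Selecting one column of the DataFrame, for example students["height_cm"], gives back a Series.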

 

 

 

https://www.youtube.com/watch?v=CLoNO-XxNXU

 

Flexibility in Python

Pandas works well with large data sets and supports many kinds of data manipulation.

Data saved in a CSV file can be loaded into a DataFrame, the object type pandas uses for tabular data.

import pandas as pd

pd is the conventional alias for pandas, so the full name does not have to be typed again and again.

 

Pandas needs to be installed first:

 

pip install pandas

 

 

Data Access refers to extracting data present in defined data structures.

Pandas provides indexers such as loc and iloc to get data from a Series or a DataFrame. (The Panel type mentioned in older tutorials has been removed from pandas.)

 

 

Accessing a Single Value

Individual elements can be accessed by specifying the index label inside square brackets, or the position through iloc.

import pandas as pd
import numpy as np

z = np.arange(10, 16)
s = pd.Series(z, index=list('abcdef'))

# Accessing the 3rd element of s by position.
# (Plain s[2] on a labeled Series is deprecated in recent pandas releases.)
s.iloc[2] # ---> Returns 12

# Accessing the 4th element of s by label.
s['d'] # ---> Returns 13

Accessing a Single Value

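It is also possible to access a single element by passing its index label as an argument to the get method, which returns None (or a given default) instead of raising an error when the label is missing.

s.get('d') # ---> Returns 13

s.iloc[2] # ---> Returns 12; use iloc for positional access, since s.get(2) relies on a deprecated positional fallback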
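A minimal sketch of how get behaves for a label that is missing. The label 'z' and the fallback value -1 are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13, 14, 15], index=list("abcdef"))

print(s.get("d"))      # value stored under label 'd'
print(s.get("z"))      # label not present: None is returned, no exception is raised
print(s.get("z", -1))  # a caller-supplied default is returned instead of None
```

By contrast, s['z'] would raise a KeyError, so get is convenient when a label may or may not exist.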

=========================================================================

 

Accessing a Slice

A Series can be sliced in much the same way as a Python list.

Expression 1

s[1:4]

Output

b    11

c    12

d    13

dtype: int32

Expression 2

s['b':'e']

Output

b    11

c    12

d    13

e    14

dtype: int32

Elements corresponding to both the start and end index values are included when index labels are used for slicing.

 

 

 

 

Accessing Data from a Data Frame

Pandas provides the .loc and .iloc indexers for selecting rows.

Using square brackets ([]) is also allowed, especially for selecting columns.

More details can be found in the video below.

 

 

How to Access Data using DataFrames with Pandas

https://www.youtube.com/watch?v=qYc58lb--Q4
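As a complement to the video, here is a small sketch of the three access styles. The weather readings and the day labels are made up for illustration.

```python
import pandas as pd

# Hypothetical weather readings, indexed by day label.
df = pd.DataFrame(
    {"temp": [28.1, 30.4, 26.7, 31.9], "rain": [0.0, 12.5, 3.2, 0.0]},
    index=["mon", "tue", "wed", "thu"],
)

row_by_label = df.loc["tue"]         # one row, selected by index label
rows_by_position = df.iloc[0:2]      # first two rows, end position excluded
rows_by_label = df.loc["mon":"wed"]  # label slice, end label included
one_column = df["temp"]              # a single column, returned as a Series
cell = df.loc["wed", "rain"]         # one value: row label, then column label
wet_days = df[df["rain"] > 1.0]      # boolean mask keeps only matching rows

print(rows_by_label)
print(wet_days)
```

The rule of thumb: loc works with labels, iloc works with integer positions, and plain square brackets pick columns or apply a boolean filter.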

 

 

 

Knowing a Series

A Series can be understood better by using the describe method. The method provides details such as the mean and standard deviation (std) of a series.

Example

import pandas as pd
import numpy as np

# 10 random values centred on 28 with a standard deviation of 10
temp = pd.Series(28 + 10*np.random.randn(10))

print(temp.describe())

Output

count    10.000000

mean     30.335711

std       8.402697

min      10.874673

25%      27.431943

50%      31.286962

75%      35.148773

max      40.770861

 

dtype: float64

 

 

 

 

Knowing a DataFrame

Two methods, mainly info and describe, can be used to learn about the data present in a data frame.

import pandas as pd
import numpy as np

# We need to populate the data first.
df = pd.DataFrame({'temp': pd.Series(28 + 10*np.random.randn(10)),
                   'rain': pd.Series(100 + 50*np.random.randn(10)),
                   'location': list('AAAAABBBBB')})

# info() prints its report directly and returns None, so no print() is needed.
df.info()

Output

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 10 entries, 0 to 9

Data columns (total 3 columns):

location    10 non-null object

rain        10 non-null float64

temp        10 non-null float64

dtypes: float64(2), object(1)

memory usage: 320.0+ bytes

 

 

 

 

 

 

Knowing a Data Frame

The describe method by default provides details of only the numeric fields.

Example

 

print(df.describe())

 

 

Output

            rain       temp

count   10.000000  10.000000

mean   108.860520  28.631922

std     55.584867   4.866241

min     19.512179  21.327725

25%     56.911505  25.658738

50%    128.776209  29.564648

75%    156.972247  31.496084

max    164.159265  36.086240
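Since describe skips non-numeric columns by default, the sketch below shows one way to include them, using the include argument of DataFrame.describe. The random seed is an assumption added only to make the sketch repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for repeatability
df = pd.DataFrame({
    "temp": 28 + 10 * rng.standard_normal(10),
    "rain": 100 + 50 * rng.standard_normal(10),
    "location": list("AAAAABBBBB"),
})

numeric_only = df.describe()              # default: numeric columns only
everything = df.describe(include="all")   # also summarises the text column

print(numeric_only.columns.tolist())
print(everything.loc[["count", "unique", "freq"], "location"])
```

For a text column the summary reports the number of values, the number of distinct values, the most frequent value, and how often it occurs, rather than mean and std.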

 

Screenshots of a worked Data Frame program (images not preserved): creating 3 columns, fetching some rows and the resulting output, and fetching a column.

 

 

 

I/O with Pandas

Pandas provides support for reading data from, and writing data to, several kinds of sources.

For example, read_csv is used to read data from a CSV file and to_csv is used to write data to a CSV file.
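A minimal round trip through CSV, sketched under the assumption that an in-memory text buffer is an acceptable stand-in for a file on disk. The station names and rainfall values are made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"station": ["X", "Y"], "rain_mm": [12.5, 0.0]})

# With no path given, to_csv returns the CSV text; index=False leaves out the row labels.
csv_text = df.to_csv(index=False)

# read_csv accepts a path or any file-like object; StringIO stands in for a file here.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

With a real file, the same two calls take a path instead, for example df.to_csv("rain.csv", index=False) and pd.read_csv("rain.csv"), where rain.csv is a hypothetical file name.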

 

https://pandas.pydata.org/docs/

https://www.youtube.com/c/Zenva/playlists

 

 

Each file format is handled in much the same way in pandas.

 

 

 

 

 

 

 

 

I/O Methods

The following video shows the details of the I/O methods used in pandas.

 

 

 

How to fetch data from websites using pandas:

Reading Data from URL

Read HTML Page using Pandas | read_html() | Web-scrapping Tutorial

We import the requests library, seaborn (as sns), and pandas (as pd).

import requests
import seaborn as sns
import pandas as pd

 

In the output we can see the data with some entries removed from the data set.

 

 

 

It pulled the entire table.

 

 

This method only works on pages that contain HTML tables; if there are no tables, it will not work.

 

Next, we take a new URL from a different website.

 

 

 

 

 

 

 

 

 

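# read_html returns a list of DataFrames, one per table found at the URL.
df1 = pd.read_html(url)

# Index the list with square brackets to get the first table.
df1[0]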

 

 

 

 

 

Pandas is a very powerful tool that lets us fetch data in many different ways.

 

Reading Data from Databases

Pandas also supports reading data from database tables.

The following video illustrates reading data from a table of a MySQL database.

Four things we learn from the MySQL video:

1. Reading data from the database

2. How to define a custom index using a data column

3. Reading data in chunks

4. Parametrized queries

 

 

https://www.youtube.com/watch?v=yab4oWYypPA

 

 

 

The code is written in Visual Studio.

The first step is importing the dependencies and libraries.

Next, create a connection to the SQL database.

We then pass the query so that the data can be fetched in Python:

df = pd.read_sql("SELECT * FROM COUNTRY ORDER BY CONTINENT, CODE", conn)

 

 

Displaying a DataFrame by

df

 

 

We are adding the columns continent and code.

 

 

 

Output displaying continent and code

 

Now we can see the data in chunks.

 

 

 

 

The output which we receive

 

 

 

Output

The country name given is India
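The four points from the video can be sketched with an in-memory SQLite database. Note the assumptions: SQLite stands in for the MySQL server used in the video, and the country table with its three rows is made up for illustration. Only read_sql and its index_col, chunksize, and params arguments are the pandas features being shown.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL server; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT, name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [("IND", "India", "Asia"), ("JPN", "Japan", "Asia"), ("FRA", "France", "Europe")],
)

# 1 and 2: read a query result and use two data columns as a custom index.
df = pd.read_sql(
    "SELECT * FROM country ORDER BY continent, code",
    conn,
    index_col=["continent", "code"],
)

# 3: with chunksize set, read_sql yields DataFrames one chunk at a time.
chunk_sizes = [len(chunk) for chunk in pd.read_sql("SELECT * FROM country", conn, chunksize=2)]

# 4: parametrized query; '?' is SQLite's placeholder, other drivers may use a different style.
india = pd.read_sql("SELECT * FROM country WHERE name = ?", conn, params=("India",))

print(df)
print(chunk_sizes)
print(india)
conn.close()
```

Passing values through params instead of formatting them into the query string lets the database driver handle quoting, which also guards against SQL injection.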

 

 

 

How to handle large datasets

https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c

 

 

Reading Data from Json

Pandas provides the utilities read_json and to_json to deal with JSON strings or files.

Consider the list of records EmployeeRecords below, to understand the conversion of a JSON string into a data frame.

Example

EmployeeRecords = [{'EmployeeID':451621,'EmployeeName':'Preeti Jain', 'DOJ':'30-Aug-2008'},

{'EmployeeID':123621, 'EmployeeName':'Ashok Kumar','DOJ':'25-Sep-2016'},

{'EmployeeID':451589, 'EmployeeName':'Johnty Rhodes','DOJ':'04-Nov-2016'}]

 

 

Reading Data from JSON

Example

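import json
from io import StringIO
import pandas as pd

emp_records_json_str = json.dumps(EmployeeRecords)

# Recent pandas releases expect a path or file-like object, so the string is wrapped in StringIO.
df = pd.read_json(StringIO(emp_records_json_str), orient='records', convert_dates=['DOJ'])

print(df)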

Output

        DOJ  EmployeeID  EmployeeName

0 2008-08-30     451621    Preeti Jain

1 2016-09-25     123621    Ashok Kumar

2 2016-11-04     451589  Johnty Rhodes

The orient argument defines how the data is organised in the JSON string.
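To make the effect of orient concrete, the sketch below writes the same small frame with two different orient values and reads one of them back. The two-row frame reuses names from the example above; the choice of 'records' and 'columns' is just for comparison.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({
    "EmployeeID": [451621, 123621],
    "EmployeeName": ["Preeti Jain", "Ashok Kumar"],
})

as_records = df.to_json(orient="records")  # a list with one object per row
as_columns = df.to_json(orient="columns")  # {column name: {row label: value}}

print(as_records)
print(as_columns)

# When reading back, pass the orient that matches how the string was written.
restored = pd.read_json(io.StringIO(as_records), orient="records")
print(restored)
```

The 'records' layout drops the row labels, while 'columns' keeps them as keys, so the layout to choose depends on whether the index carries information.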

 

Indexing


Indexing refers to labeling the data elements of a Series or a Data Frame.

These labels can be used to select a portion of the data from either data structure.

 

 

 

Indexing a Data Frame

A single-level index can be set on a data frame by passing a list of values either to the index attribute or to the index argument of the DataFrame function.

Example

import pandas as pd

import numpy as np

df = pd.DataFrame(np.random.rand(5,2))

df.index = [ 'row_' + str(i) for i in range(1, 6) ]

df

Output

      0         1

row_1  0.919754  0.063280

row_2  0.803853  0.758804

row_3  0.871375  0.428759

row_4  0.128372  0.416698

row_5  0.991222  0.546599
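The example above sets the labels through the index attribute. A sketch of the other route, the index argument of the constructor, followed by label-based selection. The column names x and y are assumptions added for readability.

```python
import numpy as np
import pandas as pd

labels = ["row_" + str(i) for i in range(1, 6)]

# Same labels as above, supplied through the index argument of the constructor.
df = pd.DataFrame(np.random.rand(5, 2), index=labels, columns=["x", "y"])

print(df.loc["row_3"])                # one row, selected by its label
print(df.loc["row_2":"row_4", "x"])   # label slice on rows, single column
```

Both routes give the same index; the constructor argument simply avoids a second statement.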

 

 

 

 

DateTime Indexes

Pandas supports generating a range of dates with functions like date_range and bdate_range.

 

https://www.youtube.com/watch?v=yCgJGsg0Xa4
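A short sketch of the two functions named above, with a date range used as the index of a Series. The rainfall values are made up; the dates were chosen so that the span starts on a weekend.

```python
import pandas as pd

# Seven consecutive calendar days, starting on Saturday, 1 July 2017.
days = pd.date_range(start="2017-07-01", periods=7, freq="D")

# Business days only (Monday to Friday) within the same span.
business_days = pd.bdate_range(start="2017-07-01", end="2017-07-07")

# A date range used as the index of a Series of made-up rainfall values.
rain = pd.Series([0.0, 4.2, 1.1, 0.0, 7.9, 3.3, 0.0], index=days)

print(len(days), len(business_days))
print(rain.loc["2017-07-03":"2017-07-05"])
```

With a DateTime index, rows can be sliced using date strings, and both ends of the slice are included.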

Hierarchical Indexing

In addition to single-level indexing, pandas supports multilevel or hierarchical indexing.

The video below illustrates creating two levels of index for a Data Frame.

https://www.youtube.com/watch?v=nE21ZlXiByY
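One possible way to build a two-level index, sketched with MultiIndex.from_product. This is not necessarily the approach taken in the video; the level names location and quarter and all the values are made up.

```python
import pandas as pd

# Two index levels: location first, then quarter. All values are made up.
index = pd.MultiIndex.from_product(
    [["A", "B"], ["Q1", "Q2"]], names=["location", "quarter"]
)
df = pd.DataFrame({"sales": [10, 12, 7, 9], "returns": [1, 0, 2, 1]}, index=index)

print(df)
print(df.loc["A"])                    # all rows under the outer label 'A'
print(df.loc[("B", "Q2"), "sales"])   # one cell addressed by both levels
print(df.xs("Q1", level="quarter"))   # cross-section on the inner level
```

An existing frame can also get a hierarchical index from its own columns with set_index, by passing a list of column names.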

 

 
