Pandas+SQLAlchemy interaction with database

When using Pandas to analyze data, besides reading text-based data, such as Excel and CSV files, database reading is also involved. The usual solution is Pandas+ SQLAlchemy.

The following is a record of some operations, as a memo.

Connecting to a database using sqlalchemy

Engine translates to engine, the car is driven by the engine, and SQLAlchemy is driven by the Engine, which maintains a connection pool (Pool) object and a dialect (Dialect). Dialect simply means whether you are connecting to MySQL or Oracle or PostgreSQL or some other database.

Connection example.

from sqlalchemy import create_engine

engine = create_engine('sqlite:///test.db')
# engine = create_engine("mysql+pymysql://root:root@localhost/hackathon")

The databases supported by SQLAlchemy include: PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server, SQLite, etc. The specific connection methods are not detailed here. Some of the most error-prone database passwords have special characters, which need to be coded and processed before submission on the day.

Solution.

1
2

import urllib.parse
urllib.parse.quote_plus("kx%jj5/g")

Save the DataFrame to the database

DataFrame comes with to_sql method, which is used as follows.

`1`	`DataFrame.to_sql(self, name: str, con, schema=None, if_exists: str = 'fail', index: bool = True, index_label=None, chunksize=None, dtype=None, method=None)`

Parameter Description.

name: database table name
con: database connection
schema: schema of the database module, not needed for most connections
if_exists: optional: {‘fail’, ‘replace’, ‘append’}
index: if or not the index column of the DataFrame will be written
index_label: the name of the index column
chunksize: set the number of entries to be written in bulk
dtype: set the type of the field
method: insert method, can not be set

Usage examples.

`1`	`df.to_sql(con=engine, name='pair_result', if_exists='append')`

Traditional way: fetching database data by cursor

Code example.

with engine.connect() as con:
    rs = con.execute('SELECT * FROM testdb.abc')
    for row in rs:
        print(row)

Transforming data from the database into a DataFrame.

with engine.connect() as con:
    rs = con.execute('SELECT * FROM testdb.abc ')
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

Easy way: read directly using pd.read_sql()

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///test.db')

# 查询表数据
df = pd.read_sql("sub_place", engine)
print(df.head())

# 使用SQL查询数据
query = '''SELECT * 
            FROM sub_place 
            WHERE father_area_name='中国' 
            '''
china_df = pd.read_sql(query, engine)
print(china_df.head())

Table of Contents

Connecting to a database using sqlalchemy

Save the DataFrame to the database

Traditional way: fetching database data by cursor

Easy way: read directly using pd.read_sql()