When using Pandas to analyze data, besides reading text-based data, such as Excel and CSV files, database reading is also involved. The usual solution is Pandas+ SQLAlchemy.

The following is a record of some operations, as a memo.

Connecting to a database using sqlalchemy

Engine translates to engine, the car is driven by the engine, and SQLAlchemy is driven by the Engine, which maintains a connection pool (Pool) object and a dialect (Dialect). Dialect simply means whether you are connecting to MySQL or Oracle or PostgreSQL or some other database.

Connection example.

1
2
3
4
from sqlalchemy import create_engine

engine = create_engine('sqlite:///test.db')
# engine = create_engine("mysql+pymysql://root:root@localhost/hackathon")

The databases supported by SQLAlchemy include: PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server, SQLite, etc. The specific connection methods are not detailed here. Some of the most error-prone database passwords have special characters, which need to be coded and processed before submission on the day.

Solution.

1
2
import urllib.parse
urllib.parse.quote_plus("kx%jj5/g")

Save the DataFrame to the database

DataFrame comes with to_sql method, which is used as follows.

1
DataFrame.to_sql(self, name: str, con, schema=None, if_exists: str = 'fail', index: bool = True, index_label=None, chunksize=None, dtype=None, method=None)

Parameter Description.

  • name: database table name
  • con: database connection
  • schema: schema of the database module, not needed for most connections
  • if_exists: optional: {‘fail’, ‘replace’, ‘append’}
  • index: if or not the index column of the DataFrame will be written
  • index_label: the name of the index column
  • chunksize: set the number of entries to be written in bulk
  • dtype: set the type of the field
  • method: insert method, can not be set

Usage examples.

1
df.to_sql(con=engine, name='pair_result', if_exists='append')

Traditional way: fetching database data by cursor

Code example.

1
2
3
4
with engine.connect() as con:
    rs = con.execute('SELECT * FROM testdb.abc')
    for row in rs:
        print(row)

Transforming data from the database into a DataFrame.

1
2
3
4
with engine.connect() as con:
    rs = con.execute('SELECT * FROM testdb.abc ')
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

Easy way: read directly using pd.read_sql()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///test.db')

# 查询表数据
df = pd.read_sql("sub_place", engine)
print(df.head())

# 使用SQL查询数据
query = '''SELECT * 
            FROM sub_place 
            WHERE father_area_name='中国' 
            '''
china_df = pd.read_sql(query, engine)
print(china_df.head())