PyPika - Python Query Builder
Today, I am going to talk about another SQL topic - PyPika. Okay, What is PyPika?
Basically, PyPika is a SQL query builder, which written in Python and doesn't limit the expressiveness of SQL. The advantages of PyPika are fast, expressive and flexible way to replace handwritten SQL.
To help you understand PyPika deeply, you can read the docs: http://pypika.readthedocs.io/en/latest/
But today, I am only going to show you some main classes in PyPika are "pypika.Query", "pypika.Table", and "pypika.Field".
Again, we are going to use "sakila" database as an example.
# Import some statements
import sqlite3
import pandas as pd
from pypika import Query, Table, Field, Order
# Connecting SQLite to the Database
connection = sqlite3.connect('sakila.db')
connection.commit()
# Displaying the output as a DataFrame, I hope you still remember this :)
def sql_to_df(sql_query):
# Use pandas to pass sql query
df = pd.read_sql(sql_query, con = connection)
return df
This is how we used to pass SQL query to pandas and display the output as a DataFrame
query = ''' SELECT first_name, last_name, email
FROM customer; '''
sql_to_df(query).head()
first_name | last_name | ||
---|---|---|---|
0 | MARY | SMITH | MARY.SMITH@sakilacustomer.org |
1 | PATRICIA | JOHNSON | PATRICIA.JOHNSON@sakilacustomer.org |
2 | LINDA | WILLIAMS | LINDA.WILLIAMS@sakilacustomer.org |
3 | BARBARA | JONES | BARBARA.JONES@sakilacustomer.org |
4 | ELIZABETH | BROWN | ELIZABETH.BROWN@sakilacustomer.org |
Let's see how PyPika can do
Selecting Data
Using pypika.Query
q = Query.from_('customer').select('customer_id', 'first_name', 'last_name', 'email')
# To convert the query into raw SQL, it can be cast to a string
query = str(q)
query
'SELECT "customer_id","first_name","last_name","email" FROM "customer"'
# Or you can use the Query.get_sql() function
query_alt = q.get_sql()
query_alt
'SELECT "customer_id","first_name","last_name","email" FROM "customer"'
We got the same result as the way we used to do with pandas
sql_to_df(query).head()
customer_id | first_name | last_name | ||
---|---|---|---|---|
0 | 1 | MARY | SMITH | MARY.SMITH@sakilacustomer.org |
1 | 2 | PATRICIA | JOHNSON | PATRICIA.JOHNSON@sakilacustomer.org |
2 | 3 | LINDA | WILLIAMS | LINDA.WILLIAMS@sakilacustomer.org |
3 | 4 | BARBARA | JONES | BARBARA.JONES@sakilacustomer.org |
4 | 5 | ELIZABETH | BROWN | ELIZABETH.BROWN@sakilacustomer.org |
Using pypika.Table
customer = Table('customer')
q = Query.from_(customer).select(customer.customer_id, customer.first_name, customer.last_name, customer.email)
query_table = str(q)
query_table
'SELECT "customer_id","first_name","last_name","email" FROM "customer"'
sql_to_df(query_table).head()
customer_id | first_name | last_name | ||
---|---|---|---|---|
0 | 1 | MARY | SMITH | MARY.SMITH@sakilacustomer.org |
1 | 2 | PATRICIA | JOHNSON | PATRICIA.JOHNSON@sakilacustomer.org |
2 | 3 | LINDA | WILLIAMS | LINDA.WILLIAMS@sakilacustomer.org |
3 | 4 | BARBARA | JONES | BARBARA.JONES@sakilacustomer.org |
4 | 5 | ELIZABETH | BROWN | ELIZABETH.BROWN@sakilacustomer.org |
Both of the above examples result in the following SQL:
SELECT customer_id, first_name, last_name, email FROM customer
Results can be ordered by using the following syntax:
q = Query.from_('customer').select('customer_id', 'first_name',
'last_name', 'email').orderby('customer_id', order=Order.desc)
query_order = str(q)
This results in the following SQL:
SELECT "customer_id", "first_name", "last_name", "phone" FROM "customer" ORDER BY "customer_id" DESC
sql_to_df(query_order).head()
customer_id | first_name | last_name | ||
---|---|---|---|---|
0 | 599 | AUSTIN | CINTRON | AUSTIN.CINTRON@sakilacustomer.org |
1 | 598 | WADE | DELVALLE | WADE.DELVALLE@sakilacustomer.org |
2 | 597 | FREDDIE | DUGGAN | FREDDIE.DUGGAN@sakilacustomer.org |
3 | 596 | ENRIQUE | FORSYTHE | ENRIQUE.FORSYTHE@sakilacustomer.org |
4 | 595 | TERRENCE | GUNDERSON | TERRENCE.GUNDERSON@sakilacustomer.org |
Arithmetic
Using pypika.Field
q = Query.from_('payment').select(Field('amount') * Field('rental_id'))
query_field = str(q)
query_field
'SELECT "amount"*"rental_id" FROM "payment"'
sql_to_df(query_field).head()
"amount"*"rental_id" | |
---|---|
0 | 227.24 |
1 | 567.27 |
2 | 7098.15 |
3 | 1407.78 |
4 | 14745.24 |
Using pypika.Table
payment = Table('payment')
q = Query.from_(payment).select(payment.amount * payment.rental_id)
query_table_field = str(q)
query_table_field
'SELECT "amount"*"rental_id" FROM "payment"'
sql_to_df(query_table_field).head()
"amount"*"rental_id" | |
---|---|
0 | 227.24 |
1 | 567.27 |
2 | 7098.15 |
3 | 1407.78 |
4 | 14745.24 |
Both of the above examples result in the following SQL:
SELECT amount*rental_id FROM payment
An alias can also be used for fields and expressions
q = Query.from_(payment).select((payment.amount * payment.rental_id).as_('profit'))
query_field_alias = str(q)
query_field_alias
'SELECT "amount"*"rental_id" "profit" FROM "payment"'
sql_to_df(query_field_alias).head()
profit | |
---|---|
0 | 227.24 |
1 | 567.27 |
2 | 7098.15 |
3 | 1407.78 |
4 | 14745.24 |