PyPika - Python Query Builder

Today, I am going to talk about another SQL topic - PyPika. Okay, What is PyPika?

Basically, PyPika is a SQL query builder, which written in Python and doesn't limit the expressiveness of SQL. The advantages of PyPika are fast, expressive and flexible way to replace handwritten SQL.

To help you understand PyPika deeply, you can read the docs: http://pypika.readthedocs.io/en/latest/

But today, I am only going to show you some main classes in PyPika are "pypika.Query", "pypika.Table", and "pypika.Field".

Again, we are going to use "sakila" database as an example.

# Import some statements

import sqlite3

import pandas as pd

from pypika import Query, Table, Field, Order
# Connecting SQLite to the Database

connection = sqlite3.connect('sakila.db')

connection.commit()
# Displaying the output as a DataFrame, I hope you still remember this :)

def sql_to_df(sql_query):

    # Use pandas to pass sql query
    df = pd.read_sql(sql_query, con = connection)

    return df

This is how we used to pass SQL query to pandas and display the output as a DataFrame

query = ''' SELECT first_name, last_name, email
            FROM customer; '''

sql_to_df(query).head()
first_name last_name email
0 MARY SMITH MARY.SMITH@sakilacustomer.org
1 PATRICIA JOHNSON PATRICIA.JOHNSON@sakilacustomer.org
2 LINDA WILLIAMS LINDA.WILLIAMS@sakilacustomer.org
3 BARBARA JONES BARBARA.JONES@sakilacustomer.org
4 ELIZABETH BROWN ELIZABETH.BROWN@sakilacustomer.org

Let's see how PyPika can do

Selecting Data

Using pypika.Query

q = Query.from_('customer').select('customer_id', 'first_name', 'last_name', 'email')

# To convert the query into raw SQL, it can be cast to a string

query = str(q)

query
'SELECT "customer_id","first_name","last_name","email" FROM "customer"'
# Or you can use the Query.get_sql() function

query_alt = q.get_sql()

query_alt
'SELECT "customer_id","first_name","last_name","email" FROM "customer"'

We got the same result as the way we used to do with pandas

sql_to_df(query).head()
customer_id first_name last_name email
0 1 MARY SMITH MARY.SMITH@sakilacustomer.org
1 2 PATRICIA JOHNSON PATRICIA.JOHNSON@sakilacustomer.org
2 3 LINDA WILLIAMS LINDA.WILLIAMS@sakilacustomer.org
3 4 BARBARA JONES BARBARA.JONES@sakilacustomer.org
4 5 ELIZABETH BROWN ELIZABETH.BROWN@sakilacustomer.org

Using pypika.Table

customer = Table('customer')

q = Query.from_(customer).select(customer.customer_id, customer.first_name, customer.last_name, customer.email)

query_table = str(q)

query_table
'SELECT "customer_id","first_name","last_name","email" FROM "customer"'
sql_to_df(query_table).head()
customer_id first_name last_name email
0 1 MARY SMITH MARY.SMITH@sakilacustomer.org
1 2 PATRICIA JOHNSON PATRICIA.JOHNSON@sakilacustomer.org
2 3 LINDA WILLIAMS LINDA.WILLIAMS@sakilacustomer.org
3 4 BARBARA JONES BARBARA.JONES@sakilacustomer.org
4 5 ELIZABETH BROWN ELIZABETH.BROWN@sakilacustomer.org

Both of the above examples result in the following SQL:

SELECT customer_id, first_name, last_name, email FROM customer

Results can be ordered by using the following syntax:

q = Query.from_('customer').select('customer_id', 'first_name',
                                   'last_name', 'email').orderby('customer_id', order=Order.desc)

query_order = str(q)

This results in the following SQL:

SELECT "customer_id", "first_name", "last_name", "phone" FROM "customer" ORDER BY "customer_id" DESC

sql_to_df(query_order).head()
customer_id first_name last_name email
0 599 AUSTIN CINTRON AUSTIN.CINTRON@sakilacustomer.org
1 598 WADE DELVALLE WADE.DELVALLE@sakilacustomer.org
2 597 FREDDIE DUGGAN FREDDIE.DUGGAN@sakilacustomer.org
3 596 ENRIQUE FORSYTHE ENRIQUE.FORSYTHE@sakilacustomer.org
4 595 TERRENCE GUNDERSON TERRENCE.GUNDERSON@sakilacustomer.org

Arithmetic

Using pypika.Field

q = Query.from_('payment').select(Field('amount') * Field('rental_id'))

query_field = str(q)

query_field
'SELECT "amount"*"rental_id" FROM "payment"'
sql_to_df(query_field).head()
"amount"*"rental_id"
0 227.24
1 567.27
2 7098.15
3 1407.78
4 14745.24

Using pypika.Table

payment = Table('payment')

q = Query.from_(payment).select(payment.amount * payment.rental_id)

query_table_field = str(q)

query_table_field
'SELECT "amount"*"rental_id" FROM "payment"'
sql_to_df(query_table_field).head()
"amount"*"rental_id"
0 227.24
1 567.27
2 7098.15
3 1407.78
4 14745.24

Both of the above examples result in the following SQL:

SELECT amount*rental_id FROM payment

An alias can also be used for fields and expressions

q = Query.from_(payment).select((payment.amount * payment.rental_id).as_('profit'))

query_field_alias = str(q)

query_field_alias
'SELECT "amount"*"rental_id" "profit" FROM "payment"'
sql_to_df(query_field_alias).head()
profit
0 227.24
1 567.27
2 7098.15
3 1407.78
4 14745.24

Now, you know some basic classes in PyPika. I highly recommend you try to use PyPika because it excels at all sorts of SQL queries but is expecially useful for data analysis.

png