How to Store data in SQLite3

We have been playing data with Excel files for a while. Have you questioned if you have very large datasets, how can you deal with that?

In other words, we need a tool that is able to capably manage or interact with very large datasets. This is why we have SQL.

SQL stands for Structured Query Language. Also, SQL can create dataflows for cleaning and preparing data at high volumes. However, it is not a data visualization tool which means not for analyzing data.

SQL is used in conjunction with Python. Now, I show you some fun things about SQLite database through Python. In fact, Python provides two interfaces for working with SQLite database library including PySQLite and APSW.

The PySQLite provides a standardized Python DBI-API 2.0 specification compliant interface to the SQLite database using a module called "sqlite3".

Using Python's SQLite Module

In order to use the SQLite3 module, we need to have import statements

import sqlite3

import pandas as pd

Connecting SQLite to the Database. We are going to use "sakila" database as an example.

To use the module, we must create a connection object that represents the database. We use the function "sqlite3.connect" to connect to the database

connection = sqlite3.connect('sakila.db')

connection.commit()

Now we can run a basic SQL query, pass it with pandas, and display the output as a DataFrame

def sql_to_df(sql_query):

    # Use pandas to pass sql query
    df = pd.read_sql(sql_query, con = connection)

    return df

Ahhha! We just passed a SQL Query using pandas and Python. Now, let's play with some basic SQL

Selecting Multiple Columns

query = ''' SELECT first_name,last_name
            FROM customer; '''

sql_to_df(query).head(5)
first_name last_name
0 MARY SMITH
1 PATRICIA JOHNSON
2 LINDA WILLIAMS
3 BARBARA JONES
4 ELIZABETH BROWN

Selecting Everything from table with *

query1 = ''' SELECT *
            FROM customer '''

sql_to_df(query1).head(5)
customer_id store_id first_name last_name email address_id active create_date last_update
0 1 1 MARY SMITH MARY.SMITH@sakilacustomer.org 5 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
1 2 1 PATRICIA JOHNSON PATRICIA.JOHNSON@sakilacustomer.org 6 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
2 3 1 LINDA WILLIAMS LINDA.WILLIAMS@sakilacustomer.org 7 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
3 4 2 BARBARA JONES BARBARA.JONES@sakilacustomer.org 8 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28
4 5 1 ELIZABETH BROWN ELIZABETH.BROWN@sakilacustomer.org 9 1 2006-02-14 22:04:36.000 2011-09-14 18:10:28

In a dataframe, a column may contain duplicate values and you only want to the unique values. The DISTINCT keyword can be used to return only unique values.

SELECT DISTINCT column_name FROM table_name

query2 = ''' SELECT DISTINCT(country_id)
            FROM city'''

sql_to_df(query2).head(5)
country_id
0 1
1 2
2 3
3 4
4 5

We have seen how to store data into excel files, which is nothing to surprise us. Let' assume we want to store data in files which is widely used and light-weighted. This is when SQLite steps in. SQLite is well-known among analysts. Moreover, it is serverless which means we do not need any separate server for availing its services. In this blog, we are going to explore useful module called SQLite which will allow us to store data in databases.

Let's extract data from table "country" in "sakila" database, and store it in sqlite file

s

Once you have a Connection to the database, we can create a Cursor object and call its execute() method to perform SQL commands

In order to get all the values, we use "fetchall()", but if you want to get just one element, you could have also used "fetchone()"

cur = connection.cursor()

cur.execute('SELECT * from country')

print(cur.fetchone())
(1, 'Afghanistan', '2011-09-14 18:05:01')
print(cur.fetchall())
[(2, 'Algeria', '2011-09-14 18:05:01'), (3, 'American Samoa', '2011-09-14 18:05:01'), (4, 'Angola', '2011-09-14 18:05:01'), (5, 'Anguilla', '2011-09-14 18:05:01'), (6, 'Argentina', '2011-09-14 18:05:01'), (7, 'Armenia', '2011-09-14 18:05:01'), (8, 'Australia', '2011-09-14 18:05:01'), (9, 'Austria', '2011-09-14 18:05:01'), (10, 'Azerbaijan', '2011-09-14 18:05:01'), (11, 'Bahrain', '2011-09-14 18:05:01'), (12, 'Bangladesh', '2011-09-14 18:05:01'), (13, 'Belarus', '2011-09-14 18:05:01'), (14, 'Bolivia', '2011-09-14 18:05:01'), (15, 'Brazil', '2011-09-14 18:05:01'), (16, 'Brunei', '2011-09-14 18:05:01'), (17, 'Bulgaria', '2011-09-14 18:05:01'), (18, 'Cambodia', '2011-09-14 18:05:01'), (19, 'Cameroon', '2011-09-14 18:05:01'), (20, 'Canada', '2011-09-14 18:05:01'), (21, 'Chad', '2011-09-14 18:05:01'), (22, 'Chile', '2011-09-14 18:05:01'), (23, 'China', '2011-09-14 18:05:01'), (24, 'Colombia', '2011-09-14 18:05:01'), (25, 'Congo, The Democratic Republic of the', '2011-09-14 18:05:01'), (26, 'Czech Republic', '2011-09-14 18:05:01'), (27, 'Dominican Republic', '2011-09-14 18:05:01'), (28, 'Ecuador', '2011-09-14 18:05:01'), (29, 'Egypt', '2011-09-14 18:05:01'), (30, 'Estonia', '2011-09-14 18:05:01'), (31, 'Ethiopia', '2011-09-14 18:05:01'), (32, 'Faroe Islands', '2011-09-14 18:05:01'), (33, 'Finland', '2011-09-14 18:05:01'), (34, 'France', '2011-09-14 18:05:01'), (35, 'French Guiana', '2011-09-14 18:05:01'), (36, 'French Polynesia', '2011-09-14 18:05:01'), (37, 'Gambia', '2011-09-14 18:05:01'), (38, 'Germany', '2011-09-14 18:05:01'), (39, 'Greece', '2011-09-14 18:05:01'), (40, 'Greenland', '2011-09-14 18:05:01'), (41, 'Holy See (Vatican City State)', '2011-09-14 18:05:01'), (42, 'Hong Kong', '2011-09-14 18:05:01'), (43, 'Hungary', '2011-09-14 18:05:01'), (44, 'India', '2011-09-14 18:05:01'), (45, 'Indonesia', '2011-09-14 18:05:01'), (46, 'Iran', '2011-09-14 18:05:01'), (47, 'Iraq', '2011-09-14 18:05:02'), (48, 'Israel', '2011-09-14 18:05:02'), (49, 'Italy', '2011-09-14 18:05:02'), (50, 'Japan', '2011-09-14 18:05:02'), (51, 'Kazakstan', '2011-09-14 18:05:02'), (52, 'Kenya', '2011-09-14 18:05:02'), (53, 'Kuwait', '2011-09-14 18:05:02'), (54, 'Latvia', '2011-09-14 18:05:02'), (55, 'Liechtenstein', '2011-09-14 18:05:02'), (56, 'Lithuania', '2011-09-14 18:05:02'), (57, 'Madagascar', '2011-09-14 18:05:02'), (58, 'Malawi', '2011-09-14 18:05:02'), (59, 'Malaysia', '2011-09-14 18:05:02'), (60, 'Mexico', '2011-09-14 18:05:02'), (61, 'Moldova', '2011-09-14 18:05:02'), (62, 'Morocco', '2011-09-14 18:05:02'), (63, 'Mozambique', '2011-09-14 18:05:02'), (64, 'Myanmar', '2011-09-14 18:05:02'), (65, 'Nauru', '2011-09-14 18:05:02'), (66, 'Nepal', '2011-09-14 18:05:02'), (67, 'Netherlands', '2011-09-14 18:05:02'), (68, 'New Zealand', '2011-09-14 18:05:02'), (69, 'Nigeria', '2011-09-14 18:05:02'), (70, 'North Korea', '2011-09-14 18:05:02'), (71, 'Oman', '2011-09-14 18:05:02'), (72, 'Pakistan', '2011-09-14 18:05:02'), (73, 'Paraguay', '2011-09-14 18:05:02'), (74, 'Peru', '2011-09-14 18:05:02'), (75, 'Philippines', '2011-09-14 18:05:02'), (76, 'Poland', '2011-09-14 18:05:02'), (77, 'Puerto Rico', '2011-09-14 18:05:02'), (78, 'Romania', '2011-09-14 18:05:02'), (79, 'Runion', '2011-09-14 18:05:02'), (80, 'Russian Federation', '2011-09-14 18:05:02'), (81, 'Saint Vincent and the Grenadines', '2011-09-14 18:05:02'), (82, 'Saudi Arabia', '2011-09-14 18:05:02'), (83, 'Senegal', '2011-09-14 18:05:02'), (84, 'Slovakia', '2011-09-14 18:05:02'), (85, 'South Africa', '2011-09-14 18:05:02'), (86, 'South Korea', '2011-09-14 18:05:02'), (87, 'Spain', '2011-09-14 18:05:02'), (88, 'Sri Lanka', '2011-09-14 18:05:03'), (89, 'Sudan', '2011-09-14 18:05:03'), (90, 'Sweden', '2011-09-14 18:05:03'), (91, 'Switzerland', '2011-09-14 18:05:03'), (92, 'Taiwan', '2011-09-14 18:05:03'), (93, 'Tanzania', '2011-09-14 18:05:03'), (94, 'Thailand', '2011-09-14 18:05:03'), (95, 'Tonga', '2011-09-14 18:05:03'), (96, 'Tunisia', '2011-09-14 18:05:03'), (97, 'Turkey', '2011-09-14 18:05:03'), (98, 'Turkmenistan', '2011-09-14 18:05:03'), (99, 'Tuvalu', '2011-09-14 18:05:03'), (100, 'Ukraine', '2011-09-14 18:05:03'), (101, 'United Arab Emirates', '2011-09-14 18:05:03'), (102, 'United Kingdom', '2011-09-14 18:05:03'), (103, 'United States', '2011-09-14 18:05:03'), (104, 'Venezuela', '2011-09-14 18:05:03'), (105, 'Vietnam', '2011-09-14 18:05:03'), (106, 'Virgin Islands, U.S.', '2011-09-14 18:05:03'), (107, 'Yemen', '2011-09-14 18:05:03'), (108, 'Yugoslavia', '2011-09-14 18:05:03'), (109, 'Zambia', '2011-09-14 18:05:03'), (1000, 'John Doe', '2019-07-11 23:21:49')]

Print all the output of the "fetchall()" method by iterating over each row

for row in cur.execute('SELECT * FROM country'):

    print(row)
(1, 'Afghanistan', '2011-09-14 18:05:01')
(2, 'Algeria', '2011-09-14 18:05:01')
(3, 'American Samoa', '2011-09-14 18:05:01')
(4, 'Angola', '2011-09-14 18:05:01')
(5, 'Anguilla', '2011-09-14 18:05:01')
(6, 'Argentina', '2011-09-14 18:05:01')
(7, 'Armenia', '2011-09-14 18:05:01')
(8, 'Australia', '2011-09-14 18:05:01')
(9, 'Austria', '2011-09-14 18:05:01')
(10, 'Azerbaijan', '2011-09-14 18:05:01')
(11, 'Bahrain', '2011-09-14 18:05:01')
(12, 'Bangladesh', '2011-09-14 18:05:01')
(13, 'Belarus', '2011-09-14 18:05:01')
(14, 'Bolivia', '2011-09-14 18:05:01')
(15, 'Brazil', '2011-09-14 18:05:01')
(16, 'Brunei', '2011-09-14 18:05:01')
(17, 'Bulgaria', '2011-09-14 18:05:01')
(18, 'Cambodia', '2011-09-14 18:05:01')
(19, 'Cameroon', '2011-09-14 18:05:01')
(20, 'Canada', '2011-09-14 18:05:01')
(21, 'Chad', '2011-09-14 18:05:01')
(22, 'Chile', '2011-09-14 18:05:01')
(23, 'China', '2011-09-14 18:05:01')
(24, 'Colombia', '2011-09-14 18:05:01')
(25, 'Congo, The Democratic Republic of the', '2011-09-14 18:05:01')
(26, 'Czech Republic', '2011-09-14 18:05:01')
(27, 'Dominican Republic', '2011-09-14 18:05:01')
(28, 'Ecuador', '2011-09-14 18:05:01')
(29, 'Egypt', '2011-09-14 18:05:01')
(30, 'Estonia', '2011-09-14 18:05:01')
(31, 'Ethiopia', '2011-09-14 18:05:01')
(32, 'Faroe Islands', '2011-09-14 18:05:01')
(33, 'Finland', '2011-09-14 18:05:01')
(34, 'France', '2011-09-14 18:05:01')
(35, 'French Guiana', '2011-09-14 18:05:01')
(36, 'French Polynesia', '2011-09-14 18:05:01')
(37, 'Gambia', '2011-09-14 18:05:01')
(38, 'Germany', '2011-09-14 18:05:01')
(39, 'Greece', '2011-09-14 18:05:01')
(40, 'Greenland', '2011-09-14 18:05:01')
(41, 'Holy See (Vatican City State)', '2011-09-14 18:05:01')
(42, 'Hong Kong', '2011-09-14 18:05:01')
(43, 'Hungary', '2011-09-14 18:05:01')
(44, 'India', '2011-09-14 18:05:01')
(45, 'Indonesia', '2011-09-14 18:05:01')
(46, 'Iran', '2011-09-14 18:05:01')
(47, 'Iraq', '2011-09-14 18:05:02')
(48, 'Israel', '2011-09-14 18:05:02')
(49, 'Italy', '2011-09-14 18:05:02')
(50, 'Japan', '2011-09-14 18:05:02')
(51, 'Kazakstan', '2011-09-14 18:05:02')
(52, 'Kenya', '2011-09-14 18:05:02')
(53, 'Kuwait', '2011-09-14 18:05:02')
(54, 'Latvia', '2011-09-14 18:05:02')
(55, 'Liechtenstein', '2011-09-14 18:05:02')
(56, 'Lithuania', '2011-09-14 18:05:02')
(57, 'Madagascar', '2011-09-14 18:05:02')
(58, 'Malawi', '2011-09-14 18:05:02')
(59, 'Malaysia', '2011-09-14 18:05:02')
(60, 'Mexico', '2011-09-14 18:05:02')
(61, 'Moldova', '2011-09-14 18:05:02')
(62, 'Morocco', '2011-09-14 18:05:02')
(63, 'Mozambique', '2011-09-14 18:05:02')
(64, 'Myanmar', '2011-09-14 18:05:02')
(65, 'Nauru', '2011-09-14 18:05:02')
(66, 'Nepal', '2011-09-14 18:05:02')
(67, 'Netherlands', '2011-09-14 18:05:02')
(68, 'New Zealand', '2011-09-14 18:05:02')
(69, 'Nigeria', '2011-09-14 18:05:02')
(70, 'North Korea', '2011-09-14 18:05:02')
(71, 'Oman', '2011-09-14 18:05:02')
(72, 'Pakistan', '2011-09-14 18:05:02')
(73, 'Paraguay', '2011-09-14 18:05:02')
(74, 'Peru', '2011-09-14 18:05:02')
(75, 'Philippines', '2011-09-14 18:05:02')
(76, 'Poland', '2011-09-14 18:05:02')
(77, 'Puerto Rico', '2011-09-14 18:05:02')
(78, 'Romania', '2011-09-14 18:05:02')
(79, 'Runion', '2011-09-14 18:05:02')
(80, 'Russian Federation', '2011-09-14 18:05:02')
(81, 'Saint Vincent and the Grenadines', '2011-09-14 18:05:02')
(82, 'Saudi Arabia', '2011-09-14 18:05:02')
(83, 'Senegal', '2011-09-14 18:05:02')
(84, 'Slovakia', '2011-09-14 18:05:02')
(85, 'South Africa', '2011-09-14 18:05:02')
(86, 'South Korea', '2011-09-14 18:05:02')
(87, 'Spain', '2011-09-14 18:05:02')
(88, 'Sri Lanka', '2011-09-14 18:05:03')
(89, 'Sudan', '2011-09-14 18:05:03')
(90, 'Sweden', '2011-09-14 18:05:03')
(91, 'Switzerland', '2011-09-14 18:05:03')
(92, 'Taiwan', '2011-09-14 18:05:03')
(93, 'Tanzania', '2011-09-14 18:05:03')
(94, 'Thailand', '2011-09-14 18:05:03')
(95, 'Tonga', '2011-09-14 18:05:03')
(96, 'Tunisia', '2011-09-14 18:05:03')
(97, 'Turkey', '2011-09-14 18:05:03')
(98, 'Turkmenistan', '2011-09-14 18:05:03')
(99, 'Tuvalu', '2011-09-14 18:05:03')
(100, 'Ukraine', '2011-09-14 18:05:03')
(101, 'United Arab Emirates', '2011-09-14 18:05:03')
(102, 'United Kingdom', '2011-09-14 18:05:03')
(103, 'United States', '2011-09-14 18:05:03')
(104, 'Venezuela', '2011-09-14 18:05:03')
(105, 'Vietnam', '2011-09-14 18:05:03')
(106, 'Virgin Islands, U.S.', '2011-09-14 18:05:03')
(107, 'Yemen', '2011-09-14 18:05:03')
(108, 'Yugoslavia', '2011-09-14 18:05:03')
(109, 'Zambia', '2011-09-14 18:05:03')
(1000, 'John Doe', '2019-07-11 23:21:49')

Adding Data to a Database

In order to store data in sqlite3, we need to create a database. Generally, databases are saved in ".db" or ".sqlite extension". Again, we use "connect()" method, SQLite will try to open the file that we are assigning and if it doesn't exist, it will create a new database. Let's assume we want to create a table that stores "country" table output which has three variables including "country_id", "country", and "last-update".

Okay, so far, so good, before we create a new database, let's have a quick review on SQLite data types. SQLite defines four main types of fields:

  • NULL----a NULL value.
  • INTEGER----a signed integer
  • REAL----a floating point value
  • TEXT----a text string
  • BLOB----a blob of data
conn = sqlite3.connect('country.sqlite')

cur1 = conn.cursor()

cur1.execute('CREATE TABLE country_data (Country_ID INTEGER, Country VARCHAR, Date TIMESTAMP)')
<sqlite3.Cursor at 0x187fc12d500>

As you see the code above, we have another type of field for "last_update" column which is "TIMESTAMP"

Now that we have a database, it is time to store some data into it. You will need to do the following:

dataset = cur.execute('SELECT * FROM country')
cur1.executemany('INSERT INTO country_data VALUES (?,?,?)', dataset )
<sqlite3.Cursor at 0x187fc12d500>

Don't forget to save this by merely calling the "commit()" method

conn.commit()

BOOOOOOOOM. Congratulations! We just created a new database

png