When performing data analysis with both Python and SQL, often I want to be able to create a SQL query that filters a table using items in a Python list.

To select elements from a list in SQL, you use the following syntax. Consider the example below where you have a table called fruits, and you want to select the rows where the value in the fruit-name column is either apple, banana, or orange.

SELECT * FROM fruits
WHERE fruit_name IN ('apple', 'banana', 'orange')

Now imagine that the sequence of fruit names is the result of some Python analysis, and you want to pass the list to a SQL query to create the IN statement. You can use the function below. Passing a list will return a string with the items of the list in parentheses seprated by commas.

def parameterize_SQL_in_statement(items):
    return f"""('{"', '".join(items)}')"""

Here’s an example using it in an analysis:

import pandas as pd
import pyodbc

conn = pyodbc.connect('pyodbc connection string')
fruit_names = ['apple', 'banana', 'orange'] # the result of some python analysis

in_statement = parameterize_SQL_in_statement(fruit_names) # ('apple', 'banana', 'orange')

# use f string to insert the generated IN statement
query = f"""SELECT * 
            FROM fruits 
            WHERE fruit_names IN {in_statement}
            """

df = pd.read_sql(query, conn)

NOTE: This method creates risks for SQL injection attacks and should not be used for applications that will be exposed to untrusted users. I feel comfortable using this when doing internal analyis with read only rights to the database.

Updated: