Create a SQL ‘IN’ Statement from a Python List (SQL)
When performing data analysis with both Python and SQL, often I want to be able to create a SQL query that filters a table using items in a Python list.
To select elements from a list in SQL, you use the following syntax. Consider the
example below where you have a table called fruits
, and you want to select the rows
where the value in the fruit-name
column is either apple
, banana
, or orange
.
SELECT * FROM fruits
WHERE fruit_name IN ('apple', 'banana', 'orange')
Now imagine that the sequence of fruit names is the result of some Python analysis,
and you want to pass the list to a SQL query to create the IN
statement. You can use the
function below. Passing a list will return a string with the items of the list in parentheses
seprated by commas.
def parameterize_SQL_in_statement(items):
return f"""('{"', '".join(items)}')"""
Here’s an example using it in an analysis:
import pandas as pd
import pyodbc
conn = pyodbc.connect('pyodbc connection string')
fruit_names = ['apple', 'banana', 'orange'] # the result of some python analysis
in_statement = parameterize_SQL_in_statement(fruit_names) # ('apple', 'banana', 'orange')
# use f string to insert the generated IN statement
query = f"""SELECT *
FROM fruits
WHERE fruit_names IN {in_statement}
"""
df = pd.read_sql(query, conn)
NOTE: This method creates risks for SQL injection attacks and should not be used for applications that will be exposed to untrusted users. I feel comfortable using this when doing internal analyis with read only rights to the database.