Data Engineer SQL Questions


QuestionDescription
Difficulty  
Completed
Salary Range DifferenceCalculate the difference between the sum of the highest salaries and the sum of the lowest salaries in the company. The table "playground.employees_salary" contains columns "id" (unique employee ID), "name" (employee's name), and "salary" (employee's salary as a positive integer). The result should be a single column "difference" with one row representing the calculated difference. If the "playground.employees_salary" table is empty, "difference" should be 0.medium
Find Product PricesUsing the table playground.product_prices, create a SQL query to find all products and their prices on 2023-08-17, assuming the initial price of all products was 10 before any price changes. Order the results in ascending order of product_id.medium
Find Viewers with Multiple Article Views in a DayUsing the table playground.views, write a SQL query to identify all viewers who viewed more than one article on the same day. The table includes columns viewer_id (the ID of the viewer), article_id (the ID of the article viewed), and view_date (the date of the view). The result should contain a single column named viewer_id, listing each viewer who meets the criteria without duplicates, and should be sorted in ascending order of viewer_id.easy
Check Test AnswersCreate a SQL query to evaluate test answers stored in a table named playground.answers with columns id (unique question ID), correct_answer (string), and given_answer (which can be NULL). Return a table with columns id and checks, where checks is "no answer" if given_answer is NULL, "correct" if given_answer matches correct_answer, and "incorrect" otherwise. Order the results by id.easy
Comparing State Fatal Collisions to the National AverageUsing playground.bad_drivers, write a SQL query to compare each state’s fatal collisions per billion miles to the national average. Include a column that indicates whether the state is "Above Average" or "Below Average". The resultant table should have three columns, "state", "fatal_collisions_per_billion_miles" and "comparison_to_national_avg". Show the result ordered by state name asc.medium
Calculating Median Searches per UserUsing the table playground.search_freq with each row representing the number of searches (searches column) made by a certain number of users (users column), write a SQL query to calculate the median number of searches per user up to one decimal place and be sure to cast it as a double.hard
Month with the Highest Total BirthsDetermine the month with the highest total number of births in the playground.us_birth_stats table. The output should show the month and the total number of births.easy
Identifying students with the best gradeUsing the playground.marks table, write a SQL query to determine the students who achieved the highest marks when graded according to Option 3, where the final exam accounts for 100% of the grade. The other grading options are provided for context: Option 1: Midterm 1 contributes 25%, Midterm 2 contributes 25%, and the Final exam contributes 50% of the grade. Option 2: Midterm 1 contributes 50% and Midterm 2 contributes 50% of the grade. Option 3: The Final exam contributes 100% of the grade. The output should be sorted by the students' names in ascending order. Ensure the column names in the result are displayed in lowercase. The resultant table should have two columns, name and their id.medium
Who are the top 10 authors by number of reviews?Using bootcamp.books, find the top 10 authors by reviews, no_of_reviews is a string column with bad data, try your best to get the values to parse correctlymedium
Filtering Dance Contest ScoresGiven a table playground.dance_scores (columns: arbiter_id, first_criterion, second_criterion, third_criterion) with scores from 1 to 10 awarded by judges in a dance contest, write a query to exclude scores from any judge who awarded an extreme score (either minimum or maximum) for at least two criteria. Return the filtered scores, sorted by arbiter_id.medium
Identify Department for ReductionSelect departments that have no more than 5 employees. From these, choose departments based on the descending order of the total salary of its employees. In case of a tie in total salary, prioritize the department with the greater number of employees. If still tied, prioritize by the smallest department id. From the resulting list, eliminate departments in even positions, leaving only those in odd positions for further consideration. The resulting table should contain columns for department name, number of employees, and total salary, sorted as described.hard
Finding Actors by Favorite Movie GenreGiven three tables - playground.movies, playground.starring_actors, and playground.actor_ages - identify actors from the most common movie genre in your library. Assume each actor is listed once representing their best role. The task is to list actors with their age, sorted from oldest to youngest, who star in movies of your favorite genre (the most common genre in your list). In case of actors being the same age, sort them by name.The resultant table should have two columns, actor and age.medium
Finding Highest Grade and Corresponding Course per StudentUsing the table playground.student_grades, write a SQL query to find the highest grade and its corresponding course for each student. In cases where there is a tie for the highest grade, select the course with the smallest course_id. The output should include student_id, course_id, and the grade, with the results sorted by increasing student_id.medium
Determining the Order of SuccessionGiven a table Successors with columns: name, birthday, and gender, write a SQL query to list the names of the King's children in order of their succession to the throne and their birthday("name", "birthday"). Succession is based on age seniority. Prefix the name with "King" for males and "Queen" for females. The result should be sorted by birthday in ascending order to determine the succession order.easy
Total Number of Births Per YearWrite a SQL query to calculate the total number of births recorded for each year in the playground.us_birth_stats table. Order the results by year.easy
Determining Optimal Packaging for Christmas GiftsGiven two tables, playground.gifts and playground.packages, write a SQL query to match each gift to the smallest package it fits into based on dimensions. A gift fits in a package if its dimensions are less than or equal to those of the package. A package is considered smaller than another if its volume is smaller. Each package can hold only one gift. Produce a table with columns: package_type and number, where number indicates how many gifts are matched to each package_type. Exclude package types not used. Sort the result by package_type in ascending order. Assume every gift fits in at least one package and no two packages have the same volume.hard
Summarizing Foreign Competitors by Country with Total SummaryGiven the playground.foreign_competitors table with columns: competitor and country, write a SQL query to calculate the number of competitors per country. Additionally, include a final summary row that shows "Total:" and the total number of competitors. The resulting table should have two columns: country and competitors, sorted by competitors and country names in ascending order, and include the summary row at the bottom.medium
Countries and Their Preferences for Beer, Spirit, and WineCreate a SQL query to find all countries and their preference for beer, spirit, and wine, based on the highest serving type as their preference. The output should show only the countries whose preferences are wine and spirits, ordered in ascending order of the country name.hard
Summarizing Leisure Activity Resorts by CountryGiven a table playground.country_activities with columns: id (unique identifier), country (name of the country), region (region within the country), leisure_activity_type (type of leisure activity), and number_of_places (number of resorts offering the activity). Construct a SQL query to aggregate this information into a new table structure. The resulting table should have columns for each country, with the counts of resorts offering each type of activity: adventure_park, golf, river_cruise, and kart_racing. Sort the result by country name in ascending order.hard
Top Reviewed Customers per ProductUsing the table playground.product_reviews, write a SQL query to identify, for each product, the customer who provided the highest review score. If there are ties in review score, the customer with the most helpful votes should be considered top. The output should include columns for product_id, customer_id, review_score, and helpful_votes, capturing the details of the top review for each product ordered in ascending order of product_ideasy
Most Frequently Purchased Sub-Category in Each RegionWrite a SQL query to determine the most frequently purchased sub-category in each region based on the playground.superstore dataset. The result should include the region, sub-category, and the count of purchases, and should list only the top sub-category for each region. Display the result in ascending order of the regionhard
Top 2 Customers with Highest Sales Every MonthUsing the table playground.sales, Identify the top 2 customers based on their total sales for each month. Display the result based on asc order of their rank and month.medium
Identifying Active Businesses Based on Event OccurrencesUsing the playground.business_events table, identify all active businesses. An active business is defined as one that has more than one event type with occurrences greater than the average occurrences of that event type among all businesses. The goal is to filter businesses based on their activity level, comparing individual event occurrences to the overall average for those event types. The result should be a table with one column called business_id with all the business ids that are active.medium
Calculate 3-Day Rolling Average of Posts per UserUsing the table playground.posts, write a SQL query to calculate the 3-day rolling average of posts for each user. The output should include columns: user_id, post_date, and third_rolling_avg, where third_rolling_avg is the 3-day rolling average of post count for each user, rounded to 2 decimal places. The rolling average should be calculated over a specified 3-day period for each user, including the current day and the two preceding days. The result should be ordered in ascending order of post datemedium
Select Rows With Maximum RevenueUsing the table playground.revenue, write a SQL query to select rows from a given table that have the maximum revenue value for each id. The resultant table should have three columns - "id", "rev", "content". Additionally, the results should be ordered in descending order by revenue.easy
Analyze Engine Efficiency by Fuel TypeUsing the table playground.automobile, write a SQL query to analyze engine efficiency by comparing the average city mileage to highway mileage for each fuel type. The efficiency ratio is defined as the average highway mileage divided by the average city mileage. Include fuel type, average city mileage, average highway mileage, and efficiency ratio in the results, rounded up to 2 decimal places, ordered by the highest efficiency ratio.medium
Cars with Above Average Engine SizeUsing the table playground.automobile, Create a SQL query to identify cars that have an engine size above the average across all cars in the dataset. The result should include the brand, fuel_type, and engine size, ordered by engine size in descending order and then brand_name in asc order.easy
Average Number of Births by Day of the WeekCreate a SQL query that finds the average number of births for each day of the week across all years in the playground.us_birth_stats table. Cast the average as an integer. Order the results by the day of the week.easy
Identify Products Sold Exclusively in January 2024Using the tables playground.products and playground.product_sales, write a SQL query to find products that were sold exclusively within the month of January 2024 (from "2024-01-01" to "2024-01-31", inclusive). The output should include the product_id and product_name for each qualifying product.medium
Total Server Fleet Uptime in Full DaysUsing the 'playground.sessions' table, write a SQL query to calculate the total uptime of the server fleet in full days. The table records each server's start and stop times as separate entries. The total uptime ('total_uptime_days') is defined as the sum of all periods during which any server was running, represented as a single integer value for the total number of full days. Assume each 'start' session_status has a corresponding 'stop' status, and sessions do not overlap for the same server.hard
Optimizing Stock of Prime and Non-Prime Items in a WarehouseGiven a warehouse with a capacity of 500,000 square feet, Amazon wishes to optimize the number of items stocked, prioritizing prime items. The goal is to fill the warehouse with as many prime items as possible and then supplement the remaining space with non-prime items. However, there are new operational constraints: prime and non-prime items must be stocked in equal quantities, and there must always be non-prime items in stock. Assuming each item occupies the same amount of space, using the table playground.inventory, write a query to determine the maximum number of prime and non-prime items that can be accommodated in the warehouse. The output should list the item_type (prime_eligible or not_prime) followed by the maximum number of item_count that can be stocked, ensuring item counts are whole numbers and adhering to the given constraints and prime_eligible first.hard
Check Second Sale Brand Preference MatchUsing the tables playground.users, playground.items, and playground.orders, write a SQL query to determine for each user, whether the brand of the second item they sold matches their preferred brand. The output should include two columns: seller_id and has_pref_brand, where has_pref_brand is "yes" if the brand of the second item sold matches the users preferred brand, and "no" otherwise. If a user sold less than two items, report the answer for that user as "no". It is guaranteed that no seller sold more than one item on the same day. The result should be ordered by user_id in ascending order.hard
Which states have high risk of fatal collisions involving AlcoholUsing playground.bad_drivers, determine the states and their percentage of alcohol impaired collisions where the risk of fatal collisions involving alcohol is significantly higher than the national average. Consider a state as high-risk if its percent_alcohol_impaired is at least 20% higher than the national average. Show the output in descending order of the percent_alcohol_impaired.medium
Year-on-Year Growth Rate CalculationUsing the table playground.dates, write a SQL query to calculate the year-on-year growth rate for the total spend of each product. The output should include the year, product ID, current years spend, previous years spend, and the year-on-year growth percentage, rounded to 2 decimal places, grouped by product ID and sorted by year in ascending order.hard
Calculating Available Seats on FlightsCalculate the number of seats not yet purchased for each flight. Use three tables: playground.flights (with columns: flight_id, plane_id), playground.planes (with columns: plane_id, number_of_seats), and playground.purchases (with columns: flight_id, seat_no). Each row in purchases is unique for (flight_id, seat_no) pairs. For each flight_id, compute the free_seats as the number of seats not purchased. Order the results by flight_id in ascending order. Ensure consistency in purchases, with no records for non-existing flight_ids or seat_nos.medium
Determining the Soccer Series WinnerFrom the "scores" table, write a SQL query to determine the series winner based on the following criteria: 1) The team with the most game wins is the winner. 2) If the wins are equal, the team with the better overall goal difference (goals scored - goals conceded) wins. 3) If goal differences are also equal, the winner is the team with more goals scored in away games (when not the host). The "scores" table includes match_id (unique match ID), first_team_score, second_team_score, and match_host (1 or 2, indicating the host team). Return a table with a single column "winner" that contains 1 (for the first team), 2 (for the second team), or 0 if no winner is determined based on these criteria.hard
Customers with More Than 20 OrdersWrite a SQL query to display all loyal customers from the playground.superstore table. A customer is considered loyal if they have placed more than 20 orders. The query should return the customer ID, customer name, and the total number of orders for each of these customers. Display the result in descending order of their orders and then ascending order of their nameseasy
Detecting fraudulent activitiesWe are an E-commerce app and we are introducing a big promotional campaign of the year on 01/01/2019 from 18 UTC to 23 UTC. It's very important to ensure the integrity of our user base and prevent abuse of our promotions. To achieve this, we need to identify potential instances of fraudulent accounts exploiting our promotions. Our focus is on querying all user_ids and determining the total number of users who have placed an order within 1 hour prior to each user_id's order placement during the campaign time. Leveraging a table that records live orders at a user level (order_id, user_id, ts), retrieve an output table that has order_id, oneHour_order_count, ts. Order the query output by ts in an ascending order.hard
Identifying Empty DepartmentsGiven two tables, playground.employees and playground.departments, with employees containing id, full_name, and department, and departments containing id (unique department ID) and dep_name (department name), write a SQL query to build a table with one column, dep_name. This table should list all the departments that currently have no employees, sorted by the department id.easy
Filtering Students in Active ClubsGiven tables clubs (id: unique club id, name: club name) and students (id: unique student id, name: student name, club_id: club's id), return a list from the students table for those who are in clubs that still exist in the clubs table. The result should have three columns (id, name, club_id) and be sorted by students' ids (id) and include only those students whose club_id matches an id in the clubs table.easy
Count Unique Conversation ThreadsUsing the table playground.messenger, write a SQL query to find the number of unique conversation threads in the playground.messenger table. A conversation thread is identified by the sender_id and receiver_id columns. Note that if a thread has sender_id and receiver_id inverted in another row, it should still be counted as the same thread. For example, a conversation where sender_id=1 and receiver_id=2 should be considered the same thread as one where sender_id=2 and receiver_id=1. The result should just be a single column called count given in any order.hard
Find Top 3 Commonly Bought Product PairsUsing the table playground.product_transactions, write a query to find the top 3 pairs of products that are most frequently bought together in the same transaction from the table playground.product_transactions. For this analysis, consider a pair to consist of two different products purchased in the same transaction. The output should list the pairs of products and the frequency of each pair (product1, product2, freq), ordered by product1 in ascending order, product2 in ascending order and frequency in descending order.hard
Find US Customers Who Rented and Streamed Videos in Early FebruaryWrite a SQL query to return the US customers who rented a video on February 1st, 2023, and then streamed the same video between February 2nd and February 8th, 2023. Use the tables playground.rental for rental data and playground.streams for streaming data. The output should include unique user IDs of these customers ordered in ascending order.easy
Identifying the Bank RobberUsing table playground.suspect, filter out suspects who cannot be the bank robber based on the following clues: the robber is not taller than 170cm, and their name matches the pattern "B. Gre?n" where the first letter of the name is "B" or "b" and the surname is similar to "Green" but with the fourth letter being unreadable and potentially any character. The match should be case-insensitive. For each suspect that fits these criteria, select their id, name, and surname. Order the results by suspect id in ascending order.easy
Calculate Monthly Average Product RatingUsing the table playground.product_reviews, write a SQL query to calculate the monthly average rating for each product. The resulting table should have three columns: month, product_id, and avg_review, where avg_review is the average review rating of the product for that month. The results should be ordered in ascending order of month and product_idmedium
Most Popular Fuel System for Each Fuel TypeUsing the table playground.automobile, create a SQL query to find the most popular fuel system for each fuel type based on the number of cars. Include columns for fuel type, fuel system, and the count of cars, showing only the top fuel system for each fuel type. Order the result on the basis of fuel type in ascending order.medium
Identifying Multi-Component Bugs and Their ImpactAnalyze the playground.bug, playground.component, and playground.bug_component tables to identify bugs affecting multiple components. Bug table contains num (unique bug number) and title (bug title). Component table includes id (unique component id) and title (component title). BugComponent table links bugs to components with bug_num (references Bug.num) and component_id (references Component.id). The task is to list each bug (that affects more than one component) with the titles of affected components and the total number of bugs in these components. Output columns: bug_title, component_title, bugs_in_component, ordered by bugs_in_component (descending), then by bug_title (ascending).hard
Who are the top 10 NBA players by consecutive 20+ point seasons?Using the table bootcamp.nba_player_seasons, find the most consecutive seasons a player has scored 20+ points and find the top 10 players! Make sure to handle ties correctly (you might have more than 10 records!). Sort the output data set by consecutive_seasons descending and player_name ascending!hard
Transform workers_info Table to Exclude NULL ValuesGiven a table named playground.workers_info, write a SQL query to transform this table. The transformation should result in a table with three columns: id, column_name, and value, where column_name is the name of the original column containing a non-NULL value for each id, and value is the stringified non-NULL value from this column. Exclude rows where the original value is NULL. The result should be sorted by id in ascending order, and for rows with the same id, sorted by column_name in ascending order.medium
Analyze Yearly Profit PerformanceWrite a SQL query to analyze the profit performance of the company throughout the years using the playground.superstore table. Ensure to convert profit from varchar to a suitable numeric type for aggregation. The output should include the year extracted from the order_date (order_year), the total annual profit (total_profit) rounded to two decimal places, and should be ordered by year in descending order.easy
Calculate Average Lead Time for Each Shipping OptionUsing the playground.superstore table, write a SQL query to calculate the average lead time from order to shipping for each shipping option. The lead time is defined as the number of days from the order date to the ship date. Additionally, count the total number of shipping services provided for each shipping option. The output should include the shipping mode (ship_mode), the total count of shipping services (total_shipping_service), and the average lead time (avg_lead_time), rounded to two decimal places. Order the results by shipping mode in descending order.medium