After Technical Interview Questions - Statistics & Analytics, the adjacent placement select is usually SQL, aliases Structured Query Language, the connection utilized to query system information successful relational databases. SQL interviews seldom trial syntax alone; they trial whether you tin construe a business mobility into joins, aggregation, model functions, cohort logic, copy checks and Customer Lifetime Value calculations. This slope gives you 10 ready-to-practise query patterns that recur crossed analytics interviews.
- SQL question and reply questions usually trial patterns, not memorised queries: aggregate first, rank wrong groups, comparison clip periods, place cohorts, and select aft aggregation.
- GROUP BY collapses rows into 1 statement per group, while model functions cipher crossed related rows without collapsing the original row-level output.
- Use WHERE earlier aggregation and HAVING aft aggregation, particularly erstwhile filtering grouped totals specified arsenic customers pinch full walk greater than ₹10,000.
- Ranking questions typically request ROW_NUMBER, RANK aliases DENSE_RANK depending connected really the interviewer wants ties handled.
- Time-series questions often usage LAG for Month-over-Month and Year-over-Year comparison, and ROWS BETWEEN for moving averages and moving totals.
- Cohort retention and Customer Lifetime Value are business-facing SQL problems because they link earthy orders to repetition activity, retention complaint and gross value.
Use this representation earlier jumping into the 10 problems: astir SQL questions tin beryllium solved by first identifying the business metric, past choosing the SQL shape that preserves aliases collapses rows correctly.
How to Read This SQL Interview Bank
The root queries usage communal analytics tables specified arsenic employees, orders, order_items, products and customers. These are not meant to correspond 1 fixed database schema; they are interview-ready templates that you should accommodate to the columns fixed successful the prompt.
In analytics interviews, SQL sits betwixt business reasoning and statistical thinking. For example, Flipkart's Big Billion Days GMV study is descriptive analytics: it answers what happened done reports, dashboards and Key Performance Indicators, aliases KPIs. The SQL down that benignant of reporting is usually built from the aforesaid patterns successful this bank: day aggregation, gross sums, rankings, moving totals and cohort views.
WHERE filters rows earlier aggregation; GROUP BY creates grouped metrics; HAVING filters grouped results; model functions cipher crossed related rows without collapsing the output.
The 10 SQL Practice Questions
The array beneath is your halfway query bank. Read each statement arsenic a repeatable pattern: what the interviewer asks, what SQL conception is being tested, and which query skeleton you should beryllium capable to constitute quickly.
Ranking Questions: Salary and Top N Products
Ranking questions trial whether you tin rank rows wrong a group without losing the row-level detail. In Q1, the business mobility is not simply the 2nd highest net overall; it is the 2nd highest net per department, truthful the partition must beryllium department_id.
The nuance is ties. The root solution uses DENSE_RANK because it gives the aforesaid rank to tied salaries and does not create gaps. ROW_NUMBER would unit a unsocial series and could incorrectly skip tied 2nd salaries, while RANK gives tied values the aforesaid rank but leaves gaps aft ties.
Q8 uses the aforesaid thought for products: aggregate gross by class and product, past rank products wrong each category. This shape is communal successful dashboards wherever class managers want the apical products successful each class alternatively than the apical products overall.
Time-Series Questions: MoM, YoY, Moving Average and Running Total
Month-over-Month, aliases MoM, compares a metric pinch the instantly erstwhile month. Year-over-Year, aliases YoY, compares a metric pinch the aforesaid period successful the erstwhile year. Both are usually solved pinch LAG, which returns a worth from a anterior statement based connected the ordering you define.
Q2 first aggregates orders into monthly revenue, past applies LAG complete month. This bid matters because comparing earthy bid rows straight would not nutrient a cleanable monthly business metric.
Joins and Set Difference: January Buyers Not successful February
Q3 asks for customers who bought successful January but not February, besides called lapsed customers successful the source. The elemental solution uses NOT IN, but the root notes that it whitethorn beryllium slow connected ample tables. The much performant shape creates chopped January buyers and chopped February buyers, near joins February onto January, and keeps rows wherever the February customer_id is null.
This is simply a classical retention and reactivation query. In interviews, explicate some methods if clip allows: NOT IN is readable, while LEFT JOIN is often the stronger reply erstwhile information size matters.
To find users successful group A but not group B, build chopped A, near subordinate chopped B connected the personification key, and select wherever the B-side cardinal is null.
Cohort Retention: From Acquisition Month to Active Users
Cohort study groups users by the play successful which they were acquired and tracks really galore stay progressive successful later periods. In Q5, each personification is assigned a cohort_month utilizing the first bid date, past each later bid is converted into months_after_acquisition.
The last retention complaint is calculated by dividing progressive users by the cohort size and multiplying by 100. The root query rounds this to 1 decimal place. This shape is valuable because it turns transaction information into a retention curve, which is much meaningful than a azygous full bid count.
Duplicates: Counting vs Flagging Full Rows
Duplicate discovery looks easy, but interviews often expect 2 answers. The first reply finds copy keys, specified arsenic copy email addresses, by grouping connected email and filtering pinch HAVING COUNT(*) > 1. This returns the duplicated email values and their counts.
The 2nd reply is utilized erstwhile you request the afloat copy rows. The root uses ROW_NUMBER complete email ordered by created_at to support the first statement arsenic row_num 1 and emblem later rows arsenic duplicates. A communal nuance is that a model usability othername is often filtered successful an outer query, because galore SQL engines do not let filtering the othername successful the aforesaid SELECT level.
Customer Lifetime Value: CLV Query Logic
Customer Lifetime Value, aliases CLV, estimates the worth of a customer complete a period. In the root query, customer metrics are first built from delivered orders: order_count, total_revenue, avg_order_value, first_order, last_order and customer_age_yrs.
The query past calculates orders_per_year. If customer_age_yrs is greater than 0, it divides order_count by customer_age_yrs and rounds to 1 decimal. Otherwise, it uses order_count directly. Finally, the elemental 3-year CLV is computed and the apical 100 customers are returned by descending simple_clv_3yr.
Simple CLV = AOV × Purchase Frequency × Expected Lifespan. In the root SQL query, the expected lifespan utilized is 3 years.
Worked Example: Month-over-Month Revenue Growth
This worked illustration shows really to move from a business mobility to a cleanable SQL pattern. The business is an orders array pinch order_date and order_value. The problem is to cipher monthly gross and comparison each period pinch the erstwhile month.
The aforesaid reasoning applies to YoY revenue. The only awesome alteration is the partition: for YoY, partition by period and bid by twelvemonth truthful the comparison stays wrong the aforesaid almanac period crossed years.
How to Practise These 10 Questions
Do not practise these arsenic isolated memorisation drills. For each problem, opportunity the business meaning first, place the required grain, past constitute the SQL. The atom is the level of item successful the output, specified arsenic 1 statement per customer, 1 statement per month, aliases 1 statement per class and product.
Conclusion
Strong SQL question and reply capacity comes from recognising patterns quickly and explaining why your query matches the business question. If you tin move confidently crossed ranking, joins, aggregation, windows, cohorts, duplicates and CLV, you are prepared for the astir communal analytics SQL screens.
The astir predominant correction is utilizing the correct SQL usability astatine the incorrect grain. For example, applying LAG earlier monthly aggregation, utilizing ROW_NUMBER erstwhile tied ranks should beryllium preserved, aliases filtering aggregated totals pinch WHERE alternatively of HAVING tin make an different cleanable query logically wrong.
English (US) ·
Indonesian (ID) ·