TG Telegram Group Link
Channel: Data Analysts
Back to Bottom
Requirements for data analyst role based on some jobs from @jobs_sql

πŸ‘‰ Must be proficient in writing complex SQL Queries.

πŸ‘‰ Understand business requirements in BI context and design data models to transform raw data into meaningful insights.

πŸ‘‰ Connecting data sources, importing data, and transforming data for Business intelligence.

πŸ‘‰ Strong working knowledge in Excel and visualization tools like PowerBI, Tableau or QlikView

πŸ‘‰ Developing visual reports, KPI scorecards, and dashboards using Power BI desktop.

Nowadays, recruiters primary focus on SQL & BI skills for data analyst roles. So try practicing SQL & create some BI projects using Tableau or Power BI.

You can refer our Power BI & SQL Series to understand the essential concepts.

Here are some essential telegram channels with important resources:

❯ SQL ➟ hottg.com/sqlanalyst
❯ Power BI ➟ hottg.com/PowerBI_analyst
❯ Resources ➟ @learndataanalysis

I am planning to come up with interview series as well to share some essential questions based on my experience in data analytics field.

Like this post if you want me to start the interview series πŸ‘β€οΈ

Hope it helps :)
SQL Interview Preparation Part-2

How to use window functions and CTEs to solve SQL interview questions?

1. Common Table Expressions (CTEs):

CTEs are temporary result sets that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. They help break down complex queries and improve readability.

Syntax:
WITH cte_name AS (
SELECT column1, column2
FROM table_name
WHERE condition
)
SELECT column1, column2
FROM cte_name
WHERE another_condition;

Example Problem:
Find the top 3 highest-paid employees in each department.

Solution Using CTE:
WITH RankedSalaries AS (
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees
)
SELECT
employee_id,
department_id,
salary
FROM RankedSalaries
WHERE rank <= 3;

2. Window Functions:

Window functions perform calculations across a set of table rows related to the current row. They do not reduce the number of rows returned.

Common Window Functions:
- ROW_NUMBER(): Assigns a unique number to each row within the partition.
- RANK(): Assigns a rank to each row within the partition, with gaps in ranking for ties.
- DENSE_RANK(): Similar to RANK(), but without gaps.
- SUM(), AVG(), COUNT(), etc., over a partition.

Syntax:
SELECT column1, 
column2,
window_function() OVER (PARTITION BY column1 ORDER BY column2) AS window_column
FROM table_name;

Example Problem:
Calculate the running total of sales for each salesperson.

Solution Using Window Function:
SELECT 
salesperson_id,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS running_total
FROM sales;

Combining CTEs and Window Functions:

Example Problem:
Find the cumulative sales per department and the rank of each employee within their department based on their sales.

Solution:
WITH DepartmentSales AS (
SELECT
department_id,
employee_id,
SUM(sales_amount) AS total_sales
FROM sales
GROUP BY department_id, employee_id
),
RankedSales AS (
SELECT
department_id,
employee_id,
total_sales,
RANK() OVER (PARTITION BY department_id ORDER BY total_sales DESC) AS sales_rank
FROM DepartmentSales
)
SELECT
department_id,
employee_id,
total_sales,
sales_rank,
SUM(total_sales) OVER (PARTITION BY department_id ORDER BY sales_rank) AS cumulative_sales
FROM RankedSales;


For those of you who are new to this channel read SQL Basics before going through advanced concepts πŸ˜„

Part-1: https://hottg.com/sqlspecialist/558

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-3

What are the different types of SQL commands?

SQL commands can be categorized into several types based on their functionality:

- DDL (Data Definition Language): These commands are used to define and modify database structures, such as tables and indexes.
- Examples: CREATE, ALTER, DROP
- Example:

         CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
position VARCHAR(50)
);

- DML (Data Manipulation Language): These commands are used to manipulate the data within the database.
- Examples: SELECT, INSERT, UPDATE, DELETE
- Example:

         INSERT INTO employees (id, name, position) VALUES (1, 'John Doe', 'Manager');

- DCL (Data Control Language): These commands are used to control access to data within the database.
- Examples: GRANT, REVOKE
- Example:

         GRANT SELECT ON employees TO user_name;

- TCL (Transaction Control Language): These commands are used to manage transactions in the database.
- Examples: COMMIT, ROLLBACK, SAVEPOINT
- Example:

         BEGIN;
UPDATE employees SET position = 'Senior Manager' WHERE id = 1;
COMMIT;


Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-4

What is the difference between INNER JOIN and OUTER JOIN?

- INNER JOIN: Returns only the rows where there is a match in both tables.
- OUTER JOIN: Returns the matched rows as well as unmatched rows from one or both tables. There are three types of OUTER JOIN:
- LEFT OUTER JOIN (or LEFT JOIN): Returns all rows from the left table, and the matched rows from the right table. If no match is found, the result is NULL on the right side.
- RIGHT OUTER JOIN (or RIGHT JOIN): Returns all rows from the right table, and the matched rows from the left table. If no match is found, the result is NULL on the left side.
- FULL OUTER JOIN: Returns rows when there is a match in one of the tables. This means it returns all rows from the left table and the right table, filling in NULLs when there is no match.

Examples:
- INNER JOIN:

     SELECT employees.name, departments.department_name
FROM employees
INNER JOIN departments ON employees.department_id = departments.id;

- LEFT JOIN:

     SELECT employees.name, departments.department_name
FROM employees
LEFT JOIN departments ON employees.department_id = departments.id;

- RIGHT JOIN:

     SELECT employees.name, departments.department_name
FROM employees
RIGHT JOIN departments ON employees.department_id = departments.id;

- FULL OUTER JOIN:

     SELECT employees.name, departments.department_name
FROM employees
FULL OUTER JOIN departments ON employees.department_id = departments.id;

Go though SQL Learning Series to refresh your basics

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-5

Let's discuss about normalization today

   - Normalization is the process of organizing the data in a database to reduce redundancy and improve data integrity. The goal is to divide a database into two or more tables and define relationships between them to reduce redundancy and dependency. There are several normal forms, each with specific rules to help achieve this goal.

Normalization involves multiple steps, usually referred to as "normal forms" (NFs):

   - First Normal Form (1NF): Ensures that the table has a primary key and that each column contains atomic (indivisible) values.
     - Example:
      
       CREATE TABLE customers (
           customer_id INT PRIMARY KEY,
           customer_name VARCHAR(100),
           contact_number VARCHAR(15)
       );
      

   - Second Normal Form (2NF): Achieves 1NF and ensures that all non-key attributes are fully functionally dependent on the primary key. This means removing partial dependencies of any column on the primary key.
     - Example: If a table has a composite key (e.g., order_id, product_id) and some columns depend only on part of that key, those columns should be moved to another table.

   - Third Normal Form (3NF): Achieves 2NF and ensures that all the attributes are functionally dependent only on the primary key. This eliminates transitive dependencies.
     - Example:
      
       CREATE TABLE orders (
           order_id INT PRIMARY KEY,
           customer_id INT,
           order_date DATE,
           FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
       );

       CREATE TABLE order_details (
           order_id INT,
           product_id INT,
           quantity INT,
           PRIMARY KEY (order_id, product_id),
           FOREIGN KEY (order_id) REFERENCES orders(order_id)
       );
      

   - Boyce-Codd Normal Form (BCNF): A stricter version of 3NF where every determinant is a candidate key. This addresses situations where 3NF is not sufficient to eliminate all redundancies.

Tricky Question:
   - How would you approach normalizing a table that contains repeating groups of data?
     - This question tests the understanding of the concept of atomicity and the process of transforming a table into 1NF.

Example Answer:
   - "If a table contains repeating groups, such as multiple phone numbers in one column separated by commas, I would first ensure that each piece of data is atomic. I would create a separate table for the repeating group and link it with a foreign key to the original table, thereby normalizing the data into 1NF."

Go though SQL Learning Series to refresh your basics

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-6

Let's discuss about subquery today

- A subquery, also known as an inner query or nested query, is a query within another SQL query. It is used to provide data to the main query (outer query). Subqueries can be used in various clauses such as SELECT, FROM, WHERE, and HAVING.

Types of Subqueries:
- Single-row subquery: Returns a single row and is used with operators like =, <, >.
- Multi-row subquery: Returns multiple rows and is used with operators like IN, ANY, ALL.
- Correlated subquery: A subquery that references columns from the outer query. It is evaluated once for each row processed by the outer query.

Examples:
- Single-row subquery:

     SELECT name
FROM employees
WHERE department_id = (SELECT id FROM departments WHERE department_name = 'Sales');

- Multi-row subquery:

     SELECT name
FROM employees
WHERE department_id IN (SELECT id FROM departments WHERE region = 'North');

- Correlated subquery:

     SELECT e.name
FROM employees e
WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);


Go though SQL Learning Series to refresh your basics

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-7

Explain the difference between GROUP BY and ORDER BY in SQL.

- GROUP BY: Groups rows that have the same values into summary rows.
- ORDER BY: Sorts the result set in ascending or descending order based on one or more columns.

Tips:
- Mention that GROUP BY is typically used with aggregate functions like COUNT, SUM, AVG, etc., while ORDER BY is used for sorting the result set.
- Provide an example to illustrate the distinction between the two clauses.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-8

How do you find the nth highest salary from a table in SQL?

Answer:
You can use the LIMIT clause in combination with the ORDER BY clause to find the nth highest salary.

Example:
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT n-1, 1;

Replace 'n' with the desired rank of the salary.

Tip: Emphasize the importance of using DISTINCT to handle cases where there are duplicate salaries, and ensure the ORDER BY clause is sorting the salaries in descending order to find the nth highest salary.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-8

How can you find the second highest salary in a table without using the LIMIT clause?

You can use a subquery to find the maximum salary that is less than the overall maximum salary.

Example:
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Tip: Explain that this approach can be useful when the LIMIT clause is not supported or if you want to demonstrate proficiency in using subqueries.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
I am planning to parallely start another interview series related to data analytics. What should be the topic?
Anonymous Poll
14%
Excel
28%
Power BI
2%
Alteryx
5%
Tableau
17%
Python
3%
R
23%
Data Analyst Interview (mix of all tools)
8%
Data Science/ ML / AI
SQL INTERVIEW PREPARATION PART-9

What are window functions in SQL and can you provide an example?

Answer:
Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, window functions do not cause rows to become grouped into a single output row.

Example using ROW_NUMBER():
SELECT name, salary, department_id,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as row_num
FROM employees;

In this example, ROW_NUMBER() assigns a unique rank to each row within each department, ordered by salary in descending order.

Tip: Highlight the usefulness of window functions for complex analytics and reporting tasks, where you need to perform calculations across rows while still returning individual rows. Explain other common window functions like RANK(), DENSE_RANK(), and NTILE().

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
Data Analysts
I am planning to parallely start another interview series related to data analytics. What should be the topic?
Glad to see the amazing response. I will start other series parallely with SQL Interview Series very soon :)
SQL INTERVIEW PREPARATION PART-10

Explain what a CTE (Common Table Expression) is and provide an example.

Answer:
A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs are defined using the WITH keyword and can improve the readability and organization of complex queries.

Example:
WITH EmployeeCTE AS (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, e.department_id, c.avg_salary
FROM employees e
JOIN EmployeeCTE c ON e.department_id = c.department_id
WHERE e.salary > c.avg_salary;

In this example, the CTE EmployeeCTE calculates the average salary per department, which is then used in the main query to find employees earning above the average salary in their department.

Tip: Explain that CTEs can be particularly useful for breaking down complex queries into more manageable parts, improving both readability and maintainability. They also allow for recursive queries, which can be useful in hierarchical data structures.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
SQL INTERVIEW PREPARATION PART-11

What is a self join in SQL? Provide an example.

A self join is a regular join but the table is joined with itself. This is useful when the data is hierarchical or when you need to compare rows within the same table.

Example:
SELECT e1.name AS Employee, e2.name AS Manager
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.id;

In this example, the employees table is joined with itself to find the manager for each employee.

Tip: Explain that self joins are particularly useful for hierarchical data, such as organizational charts, and emphasize the importance of using table aliases (e.g., e1 and e2) to differentiate between the different instances of the same table.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
Data Analysts
I am planning to parallely start another interview series related to data analytics. What should be the topic?
Thank you so much for the amazing response again. Since we got maximum votes for Power BI Interview Series, let's start with it.

Power BI Interview Preparation Part-1

1. How can you optimize a Power BI report for performance?

Answer:
Optimizing a Power BI report for performance involves several strategies:
1. Data Modeling:
- Use star schema design to simplify relationships.
- Reduce the number of columns in tables by removing unnecessary data.
- Use numeric keys for relationships instead of text keys.

2. Query Optimization:
- Use Power Query to filter and aggregate data before loading it into Power BI.
- Disable auto-detect relationships to manually manage them.
- Reduce the number of calculated columns and use measures instead.

3. DAX Optimization:
- Use variables to store repeated calculations within DAX expressions.
- Avoid using complex or nested DAX functions that can slow down performance.
- Optimize DAX queries by reducing the number of operations and focusing on simpler calculations.

4. Data Refresh:
- Schedule incremental data refresh instead of full data refresh.
- Ensure efficient data source connectivity, such as using DirectQuery or Import mode appropriately.

5. Visualization Optimization:
- Limit the number of visuals on a single page to reduce the rendering time.
- Use simple visuals and avoid using too many custom visuals.
- Enable the "Optimize for Power BI" option in the settings.

Tips:
- Use the Performance Analyzer tool in Power BI Desktop to identify and diagnose performance bottlenecks.
- Regularly review and refine the data model and DAX formulas as the dataset grows and changes.
- Monitor the report's performance after deployment and make adjustments as necessary based on user feedback and performance metrics.

You can refer these Power BI Resources to learn more

Like this post if you want me to continue this Power BI series πŸ‘β™₯️

Share with credits: https://hottg.com/sqlspecialist

Hope it helps :)
SQL INTERVIEW PREPARATION PART-12

How can you detect and delete duplicate rows in a SQL table?

Answer:
To detect and delete duplicate rows, you can use a Common Table Expression (CTE) along with the ROW_NUMBER() window function to identify duplicates and then delete them.

Example to detect duplicates:
WITH DuplicateCTE AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY (SELECT NULL)) as row_num
FROM
table_name
)
SELECT * FROM DuplicateCTE WHERE row_num > 1;

Example to delete duplicates:
WITH DuplicateCTE AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY (SELECT NULL)) as row_num
FROM
table_name
)
DELETE FROM DuplicateCTE WHERE row_num > 1;

In these examples, replace column1, column2, ... with the columns that define a duplicate.

Tip: Emphasize the importance of carefully choosing the columns in the PARTITION BY clause to accurately identify duplicates. Also, mention that the ORDER BY (SELECT NULL) is used to avoid influencing the row numbering, but a specific column can be used if a specific order is needed.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
Power BI Interview Preparation Part-2

2. What are the differences between DirectQuery and Import mode in Power BI, and when should you use each?

Answer:

Import Mode:
- Data Storage: Data is imported and stored in the Power BI Desktop file.
- Performance: Generally faster performance because data is preloaded into memory.
- Refresh Frequency: Requires scheduled data refreshes to keep data up-to-date.
- Data Volume: Can handle larger datasets up to the Power BI dataset size limit.
- Latency: Low latency since data is already loaded into Power BI.
- When to Use: Ideal for static or infrequently updated data, when performance is critical, and when handling large datasets.

DirectQuery Mode:
- Data Storage: Data remains in the source system; only metadata is imported into Power BI.
- Performance: Dependent on the performance of the underlying data source.
- Refresh Frequency: Real-time or near-real-time data access, no need for scheduled refreshes.
- Data Volume: Limited by the performance of the underlying data source, but can handle very large datasets.
- Latency: Higher latency since each query is sent to the data source in real-time.
- When to Use: Ideal for real-time data analysis, when data changes frequently, and when dealing with data sources that cannot be fully imported due to size or security constraints.

Tips:
- Consider using a composite model to leverage the benefits of both Import and DirectQuery modes within the same report.
- Evaluate the performance of the underlying data source when using DirectQuery to ensure it can handle the expected query load.
- Optimize queries and data models in DirectQuery mode to minimize performance issues.

Like this post if you want me to continue this Power BI Interview series πŸ‘β™₯️

Share with credits: https://hottg.com/sqlspecialist

Hope it helps :)
SQL INTERVIEW PREPARATION PART-13

Explain the difference between DELETE and TRUNCATE commands in SQL.

Answer:
- DELETE: This command is used to delete rows from a table based on a condition. It is a DML (Data Manipulation Language) operation. Each row deletion is logged individually, and you can use a WHERE clause to specify which rows to delete. It can be rolled back if used within a transaction.

  DELETE FROM employees WHERE department_id = 10;

- TRUNCATE: This command is used to delete all rows from a table. It is a DDL (Data Definition Language) operation. It is faster than DELETE because it deallocates the data pages rather than logging individual row deletions. It cannot be rolled back in some databases because it does not log individual row deletions.

  TRUNCATE TABLE employees;

Tip: Mention that DELETE can be more flexible when you need to remove specific rows, while TRUNCATE is more efficient for removing all rows. Also, point out that TRUNCATE resets any auto-increment counters on the table and may require higher privileges compared to DELETE.

Share with credits: https://hottg.com/sqlspecialist

Like this post if you want me to continue SQL Interview Preparation Series πŸ‘β€οΈ

Hope it helps :)
Power BI Interview Preparation Part-3

3. What is a calculated column, and how does it differ from a measure in Power BI?

Answer:

Calculated Column:
- Definition: A calculated column is a new column that you add to your data model. It is computed using a DAX (Data Analysis Expressions) formula and is stored in the data model.
- Calculation Context: Calculated once when the data is loaded or refreshed and stored in the table.
- Use Case: Best for row-level calculations, such as creating a new column based on existing columns in the same row.
- Example: Adding a column for "Total Sales" by multiplying "Quantity" by "Unit Price" for each row.


  Total Sales = Sales[Quantity] * Sales[Unit Price]

Measure:
- Definition: A measure is a dynamic calculation that is computed at query time. It is also defined using DAX but is not stored in the data model; instead, it is recalculated as needed.
- Calculation Context: Computed dynamically based on the filter context of the report visuals (e.g., slicers, rows, columns).
- Use Case: Best for aggregate-level calculations, such as sums, averages, or more complex aggregations across the entire dataset.
- Example: Calculating the total sales for the filtered context in a report visual.


  Total Sales = SUM(Sales[Quantity] * Sales[Unit Price])

Key Differences:
- Storage: Calculated columns are stored in the data model, whereas measures are computed on the fly.
- Context: Calculated columns are static once created and independent of the visual context, while measures are dynamic and depend on the context of the report visual.
- Performance: Calculated columns can increase the data model size, potentially impacting performance. Measures do not affect the data model size but can impact performance if the calculations are complex and the data is large.

Tips:
- Use calculated columns sparingly to avoid unnecessary increases in model size.
- Prefer measures for calculations that need to be dynamic and context-aware.
- Regularly review your data model to ensure optimal use of calculated columns and measures.

You can refer these Power BI Resources to learn more

Like this post if you want me to continue this Power BI series πŸ‘β™₯️

Share with credits: https://hottg.com/sqlspecialist

Hope it helps :)
HTML Embed Code:
2024/06/15 17:27:52
Back to Top