Slow Reports? How to Improve Semantic Model Performance in Microsoft Fabric
Table of Contents
- 1. Data Modeling and Access
- 2. Reducing Memory Footprint
- Remove unused columns and tables
- Drop rows without data
- Filter out needless history
- Reduce high cardinality columns pt.1 – Split or remove
- Reduce high cardinality columns pt.2 – Replace part of the TransactionKey
- Optimize data types pt. 1 – Correct data types
- Optimize data types pt.2 – Split large decimals
- Disable Auto Date/Time
- 3. Power Query and Refresh Strategy
- 4. DAX & Query Optimisation
- Performance Analyzer
- Use Variables (VAR) Extensively
- Context Transition and Iterators
- Prefer Direct Predicates in CALCULATE
- Optimize FILTER for Complex Conditions
- Avoid Forcing Non-Blank Outputs
- Use SUMMARIZECOLUMNS for Grouping and Aggregation
- Push Conditional Logic Outside of Iterators
- AI-Assisted Tuning: MCP DAX Performance Tuner
- 5. Optimizing the Data Layout
- 6. Automated Validation and Tools
- Summary
In the Microsoft Fabric ecosystem, a well-optimized semantic model is no longer just ‘nice to have’ – it is a necessity for staying within the limits imposed by your selected Fabric Capacity (you can find these limits here). Furthermore, if your model is poorly designed, bloated with unnecessary data, or contains inefficient DAX measures written by inexperienced analysts, it will directly impact the consumption of Capacity Units. As we know, CUs are our currency, and we must conserve them in every way possible.
Improving your model influences three critical areas:
- Power BI Reports speed: Instant report interactions for end-users.
- Data loading time: Faster refresh cycles in Import mode, faster responses in Direct Lake.
- Fabric Capacity consumption: Efficient semantic models reduce your overall compute usage, freeing resources for other workloads.
In the article below, I have compiled a list of both well-known and lesser-known ways to optimise the performance of Semantic Models, taking into account the latest approach using Direct Lake. I hope you find some of this advice useful.
1. Data Modeling and Access
Model using Star Schema approach
Every Fabric/Data Analyst should know this recommendation, but it cannot be omitted here. A star schema remains the gold standard. Of course, it is not always possible to maintain a perfect star schema, so treat any deviation as a deliberate, well-understood trade-off.
- Recommendation: Center your model around Fact tables (transactions) surrounded by Dimension tables (context).
- Why:
- Dimensions compress well.
- Fact tables remain lean.
- Filtering is optimized for speed.
- Recommended Reading: While the technology evolves, the foundational concepts remain valid. To expand your knowledge and dive deeper into the topic, I recommend “The Data Warehouse Toolkit” by Ralph Kimball or “Star Schema: The Complete Reference” by Christopher Adamson. Although these books are not the most recent, they still provide a solid foundation for building efficient warehouses and semantic models.

Keep relationships simple
Relationships dictate how filters propagate from one table to another. While the engine can handle bi-directional or many-to-many relationships, doing so forces it to perform expensive operations and path resolutions on the fly. Complex paths create ambiguity, often resulting in slower queries and unpredictable results.
- Recommendation: Use one-to-many (1:*) relationships with single direction filters.
- Example: Switch a bi-directional filter between Customer and Sales to a single direction to prevent high CPU usage during filter propagation.
- Example: Change a many-to-many (*:*) relationship between Store and Sales to one-to-many. If that is not possible, try a “bridging table” – here you can find more details about this technique.

Simple Row-Level Security (RLS)
RLS is a feature that restricts data access for filtered rows based on a user’s specific role. This allows a single report to dynamically display only the data the logged-in user is authorized to see.
- Recommendation: Keep RLS logic as simple as Salesperson[Email] = USERPRINCIPALNAME(). Avoid complex DAX functions like LOOKUPVALUE or nested security bridges inside your role definitions.
RLS can cause a “fallback” from Direct Lake on SQL endpoint mode to DirectQuery, significantly slowing down performance. It is better to use Direct Lake on OneLake in this case.
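For illustration, a hedged sketch of a cheap role filter next to an expensive one (the SecurityBridge table and its columns are hypothetical, not from the original):

-- Fast: a direct predicate on the table being secured
[Email] = USERPRINCIPALNAME()

-- Slow: LOOKUPVALUE performs a lookup on every evaluation,
-- and complex RLS like this can also trigger the DirectQuery
-- fallback described above in Direct Lake on SQL endpoint
[Region] = LOOKUPVALUE(
    SecurityBridge[Region],
    SecurityBridge[UserEmail], USERPRINCIPALNAME()
)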
Query Caching
For reports with high traffic, enable Query Caching in the semantic model settings. This instructs the capacity to store the results of frequent queries so they don’t have to be re-calculated every time a user opens the dashboard.
- Where to find: ellipsis on Semantic Model ⮕ Settings ⮕ Query Caching
- More in the documentation here

Enable Query Scale-Out
This creates read-only replicas of your model. Instead of all 1,000 users hitting one Primary model, the load is distributed across replicas, ensuring that a refresh on the Primary doesn’t slow down the users’ reports.
- Where to find: ellipsis on Semantic Model ⮕ Settings ⮕ Query Scale Out
- More in the documentation here

2. Reducing Memory Footprint
Remove unused columns and tables
Every extra column comes with a memory price tag. If nobody looks at it, drop it.
- Example: A CustomerMiddleName column or an old Sales_v2_Test table should be deleted, as nobody needs them.


Drop rows without data
Do not store empty space in your Semantic Model.
- Recommendation: Drop rows where the key metrics are 0 or null (e.g., SalesAmount = 0; unless explicitly needed for logic) in your fact table. Loading millions of rows of “zero sales” wastes memory and processing power during aggregations.
- Why: Rows with value = 0 do not affect results, but they:
  - Increase row count
  - Hurt compression
Filter out needless history
Loading 15 years of data “just in case” bloats partitions, causes the semantic model to grow unnecessarily and drags down refreshes. This is a key business discussion: How much history do users actually need?
- Recommendation: Keep only the last 3-5 years of Sales. Skip orders with status = “Cancelled” if they never make it to dashboards anyway.
Reduce high cardinality columns pt.1 – Split or remove
Cardinality refers to the number of unique values in a column. Columns with millions of unique values (like Fact table primary keys, IDs, GUIDs, Timestamps) destroy the VertiPaq engine’s ability to compress data.
- Recommendation:
  - Split DateTime columns into two separate columns: Date (low cardinality) and Time (higher cardinality, but better than combined).
  - Remove columns like SalesTransactionKey or TransactionID if they are not used for relationships or in your visualizations.
The best tool for checking high cardinality columns is DAX Studio.

Reduce high cardinality columns pt.2 – Replace part of the TransactionKey
I’ve seen stakeholders push to keep 5+ years of history in the Semantic Model. The problem? Columns like SalesTransactionKey have high cardinality, and they will consume your memory and degrade performance very quickly.
- Recommendation: Identify how far back business users analyze data in Power BI (e.g., the last 24 months). For records older than that, replace the unique key with a single value like -1 or "Historical".
- Why: The column compresses far better, which reduces the engine’s memory usage while still keeping the historical rows for later use or aggregations.

Optimize data types pt. 1 – Correct data types
Depending on the type of data you select, the engine will allocate the appropriate amount of memory for its later computation.
- Recommendation:
  - Change numeric keys (like ProductID) from Text to Whole Number if possible. Integers compress much better than strings.
  - Use Fixed Decimal (Currency) rather than floating-point Decimal for financial values. It uses less memory and guarantees 4-decimal precision.
Optimize data types pt.2 – Split large decimals
Large decimal numbers or high-precision currency values don’t compress well.
- Recommendation: If you have a column with a massive number of distinct values (like a raw scientific measurement or a high-precision crypto value), split it into three separate Whole Number columns: Millions, Thousands, and Ones.
- Why: Three integer columns often compress significantly better than one high-cardinality decimal column. The slight CPU overhead of summing them back together at query time is outweighed by the scan speed gained from compression. Not universal, but worth testing.
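A hedged sketch of the query-time recomposition, assuming the value was split upstream into hypothetical Sales[AmountMillions], Sales[AmountThousands], and Sales[AmountOnes] integer columns:

-- Each component compresses well on its own; for additive totals
-- the three SUMs resolve entirely in the Storage Engine and the
-- multiplication happens once per query, not per row
Total Amount Reconstructed =
SUM ( Sales[AmountMillions] ) * 1000000
    + SUM ( Sales[AmountThousands] ) * 1000
    + SUM ( Sales[AmountOnes] )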
Disable Auto Date/Time
This feature creates a hidden internal calendar table for every date column in every table of your model.
- Recommendation: Turn this off in Power BI ⮕ Options and settings ⮕ Options ⮕ Data Load and use one central DimDate table (a minimal sketch follows below).
- Why: This needlessly bloats your memory footprint, as a single fact table with five date columns will force the engine to materialize five separate, redundant calendar tables behind the scenes.
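A minimal sketch of a central date table as a DAX calculated table (the date range and columns are illustrative; many teams build this upstream in the Lakehouse instead). Remember to mark it as a date table and relate it to your fact date columns.

DimDate =
ADDCOLUMNS (
    -- one row per day across the period your facts cover
    CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2026, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month", FORMAT ( [Date], "MMM YYYY" )
)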

3. Power Query and Refresh Strategy
Push transformations upstream (Roche’s Maxim)
Heavy transformations inside Power Query consume significant Fabric Capacity during refresh and can slow down development.
- Recommendation: Avoid creating complex transformations in Power Query. Instead, perform data cleansing and shaping as close to the data source as possible (e.g., in the SQL View, Stored Procedure, or Data Warehouse pipeline) before the data ever reaches Power BI.
In this context “upstream” means closer to where the data is originally produced, and “downstream” means closer to where the data is consumed.
Preserve Query Folding
Query Folding is when Power BI pushes the work back to the source database (like SQL Server or a Fabric Warehouse) by translating the applied steps into the source’s native query language, such as T-SQL. If folding breaks, Power BI has to pull all raw data into its own memory to transform it, which spikes CU usage and slows down refreshes.
- Recommendation:
- Always verify Query Folding by checking that View Native Query is enabled (not greyed out) in the Power Query Editor.
- If a step breaks folding, move it later in the Applied Steps sequence to keep more operations pushed to the source database.
- You can find more in the documentation here
- In DataFlow Gen2 you can find more indicators of Query Folding, check them here

Switch to Direct Lake on OneLake or Import
…instead of Direct Query
- Recommendation: DirectQuery sends every query back to the source database, creating high latency and spiking CU usage with each user interaction. Direct Lake reads data directly from OneLake parquet files (no import or refresh needed), delivering almost the speed of Import mode with minimal overhead. If Direct Lake is not available, use Import mode.
- More information about the differences between Direct Lake on OneLake and SQL Endpoint here
When using the Direct Lake method, connect to OneLake (not the SQL Endpoint). Direct Lake on SQL Endpoint is prone to fallback scenarios (especially with RLS or complex queries) that silently degrade performance to DirectQuery speeds. Direct Lake on OneLake is the most stable and performant option.

Implement Incremental Refresh
If you are using Import mode, you don’t need to reload all the data from the last 4 years each morning.
- Recommendation: Configure the Incremental Refresh feature to keep history (e.g., the last 2 years) but only refresh the data that has actually changed (e.g., the last day). This approach reduces Fabric Capacity usage and the time needed to load the Semantic Model.

Utilize partition refresh
In addition to incremental refresh, it is possible to selectively refresh the Semantic Model to avoid reloading all data.
- Recommendation: Use the XMLA endpoint or the Power BI REST API to refresh specific tables or specific partitions. This is particularly useful for large models where full refreshes consume excessive Capacity Units or time. Remember to enable the ReadWrite option for the XMLA Endpoint in Admin Portal ⮕ Capacity Settings ⮕ Power BI Workloads.


4. DAX & Query Optimisation
A bit of theory to start with.
Semantic model performance is not just about data storage and star schema. DAX measure design is often the real bottleneck. Every query is executed by two cooperating engines:
Formula Engine (FE) is flexible but expensive. It handles iterators (SUMX, AVERAGEX, COUNTX, RANKX), row-by-row calculations, and operations on decompressed data.
- Characteristics:
- Single-threaded (can only use one CPU core).
- CPU-heavy.
- Slower at scale.
Storage Engine (SE) is fast and parallel. It handles column scans, simple aggregations (SUM, COUNT), and filtering on compressed data.
- Characteristics:
- Multi-threaded (uses all available cores).
- Highly optimized for compression.
- Extremely fast for large datasets.
Push Work Into the Storage Engine – the fastest measures are those where the Storage Engine does the heavy lifting (scans + aggregation) and the Formula Engine only combines the results. Throughout the following chapter, you will find references to these two engines, so it’s worth bearing this in mind.
Maximize: Storage Engine usage.
Minimize: Formula Engine time and the number of “callbacks” (FE ↔ SE back-and-forth).
Performance Analyzer
Your starting point: before changing any DAX measure, run the Performance Analyzer in Power BI Desktop. It shows you exactly which visual is slow and whether the bottleneck is the DAX query itself or the visual’s rendering time.

Use Variables (VAR) Extensively
Variables are evaluated once. Using them prevents the engine from calculating the same expression multiple times.
Example:
Profit Margin Prc =
VAR _TotalRevenue = SUM(Sales[TotalSales])
VAR _TotalCost = SUM(Sales[Cost])
VAR _Profit = _TotalRevenue - _TotalCost
RETURN
IF(
_TotalRevenue = 0,
BLANK(),
DIVIDE(_Profit, _TotalRevenue)
)
Context Transition and Iterators
Iterators (SUMX, AVERAGEX, COUNTX, FILTER, ADDCOLUMNS, etc.) calculate a formula by stepping through a table row-by-row. Their performance depends entirely on how complex that row-by-row formula is.
- Recommendation:
  - Replace SUMX(T, T[Col]) with SUM(T[Col]) – they return identical results, but SUM has lower overhead.
  - If your iterator contains CALCULATE or a measure reference, try to restructure. Apply the filter once before the iteration starts, then let the iterator do only simple math. Filtering once is far cheaper than re-evaluating the filter on every row (see the restructured sketch after the expensive example below).
  - Reduce the cardinality of the iterated table. Use SUMMARIZE, GROUPBY, or VALUES on a dimension column to iterate over groups rather than individual fact rows.
  - In DirectQuery mode, treat every iterator with context transition as a red flag. Each Storage Engine request becomes a SQL query over the wire.
When iterators are cheap: Simple arithmetic like SUMX(Sales, Sales[Quantity] * Sales[Price]) is fast even on large tables. The engine pulls the needed columns into memory in a single pass and runs through the math efficiently. That said, native aggregators like SUM, AVERAGE, and COUNT are still faster because the engine can resolve them without any row-by-row evaluation at all. So if you’re just aggregating a single column, always prefer SUM(Sales[Amount]) over SUMX(Sales, Sales[Amount]). But don’t fear iterators when you genuinely need to combine multiple columns in an expression – that’s what they’re built for and they handle it quite well.
Total Revenue =
SUMX(
Sales,
Sales[Quantity] * Sales[UnitPrice] * (1 - Sales[DiscountPct])
)
When iterators are expensive: Iterators become dangerously slow if the formula inside them uses CALCULATE or references an existing measure. Doing this causes a Context Transition. This means the system has to stop, convert the current row into a filter, and run a fresh, standalone calculation for every single row. If you do this on a table with millions of rows, the system gets bottlenecked trying to process millions of separate mini-calculations one at a time.
Sales With Discount =
SUMX(
    Sales,
    CALCULATE(            -- context transition: re-evaluated for every Sales row
        SUM(Sales[Amount]),
        FILTER(Discounts, Discounts[Category] = Sales[Category])
    )
)
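As suggested in the recommendation above, one possible restructure is to apply the discounted categories as a single filter instead of per row – a hedged sketch, assuming the same illustrative Sales and Discounts tables:

Sales With Discount Restructured =
CALCULATE (
    SUM ( Sales[Amount] ),
    -- apply the set of discounted categories once,
    -- instead of re-filtering for every Sales row
    TREATAS ( VALUES ( Discounts[Category] ), Sales[Category] )
)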
Prefer Direct Predicates in CALCULATE
When filtering by a simple column-value comparison in CALCULATE, use a direct predicate rather than FILTER over the entire fact table.
- Suboptimal (Bad): FILTER('Sales', ...) scans all rows of the fact table to produce a filtered result. On large tables (millions of rows), this forces the Storage Engine to do an extra scan to materialize the full filtered table before the aggregation can even start. The Formula Engine then has to perform an expensive row-matching join to connect the filtered rows back to the aggregation, adding unnecessary overhead that is easily avoidable.
Sales USD Bad =
CALCULATE(
    SUM(Sales[Sales Amount]),
    FILTER(
        'Sales',
        Sales[Currency Code] = "USD"
    )
)
- Preferred (Good): Internally, this is equivalent to FILTER(ALL(Sales[Currency Code]), Sales[Currency Code] = "USD"), which iterates only the distinct values of the Currency Code column (the VertiPaq dictionary). If Currency Code has 5 distinct values and the Sales table has 100 million rows, the engine iterates 5 values instead of 100 million and does so in a single Storage Engine scan without the extra materialization step.
Sales USD Good =
CALCULATE(
    SUM(Sales[Sales Amount]),
    Sales[Currency Code] = "USD"
)
Measured Performance Impact
| Metric | Preferred (Direct Predicate) | Suboptimal (FILTER on Fact) | Delta |
|---|---|---|---|
| Total Duration | 56,031 ms | 68,753 ms | +23% slower |
| FE Duration | 54,080 ms | 64,929 ms | +10,849 ms |
| SE Duration | 1,951 ms | 3,824 ms | ~2× slower |
| SE Queries | 1 | 2 | Extra scan |
| SE Data (KB) | 77,924 | 77,924 + 38,962 | +50% more data |
These results were captured in DAX Studio using Server Timings against a Sales fact table, with both patterns evaluated across a high-cardinality grouping (SUMMARIZECOLUMNS by CustomerKey × Date).


The suboptimal version issued 2 Storage Engine scans instead of 1. The second scan materialized the FILTER result over the full fact table to determine which rows match the condition, then required additional Formula Engine coordination to join that result back into the aggregation.
Important: On a single scalar evaluation (e.g., ROW("Result", [Measure])), both patterns may produce identical query plans due to VertiPaq predicate folding. The performance gap only becomes measurable when the measure is evaluated thousands of times, which is exactly what happens in a real report visual with multiple dimension groupings.
- When to use FILTER: Reserve FILTER for conditions that cannot be expressed as direct predicates:
  - Comparing two columns from the same row, e.g., FILTER('Sales', Sales[Delivery Date] > Sales[Order Date])
  - Evaluating a measure within the filter condition, e.g., FILTER(VALUES('Product'[ProductKey]), [Total Sales] > 1000)
  - Complex Boolean logic that cannot be decomposed into independent column predicates
For these complex scenarios, see Optimize FILTER for Complex Conditions below.
Optimize FILTER for Complex Conditions
This section of the recommendations is considerably more detailed, so I would advise you to take a little extra time to go through it.
- Recommendation: When FILTER is required for complex logic that cannot be expressed as a direct predicate (e.g., comparing two columns, evaluating measures), the table you pass as FILTER’s first argument has a significant impact on both performance and correctness. Apply two optimization levers:
  - Reduce the rows FILTER must iterate. Use ALL or ALLSELECTED to iterate distinct column combinations instead of the full fact table.
  - Keep simple conditions outside FILTER. If part of your logic can be written as a direct predicate, don’t bury it inside FILTER. Pull it out as a separate CALCULATE argument. Only use FILTER for the part that truly requires it.
- Why complex FILTER is expensive: Simple conditions like Sales[Category] = "Electronics" are fast because the Storage Engine can resolve them internally using its compressed column indexes; no row-by-row work is needed.
Cross-column comparisons like Sales[DeliveryDate] > Sales[OrderDate] are different. The Storage Engine can’t resolve these from its indexes, so it falls back to the slower Formula Engine checking each row individually. On a multi-million row table, that row-by-row evaluation is the main cost.
But row count isn’t the only problem. When you write FILTER(Sales, ...), the result carries a reference to every column in the Sales table. This forces the Formula Engine to do extra work matching the filtered rows back to the full table during aggregation – essentially an expensive join behind the scenes.
Using ALL(Sales[Col1], Sales[Col2]) instead solves both problems: it only references the columns you actually need (eliminating the extra join), and it returns only distinct combinations (drastically reducing row count). If your Sales table has 15 million rows but only 500 unique price/cost combinations, the Formula Engine evaluates 500 rows instead of 15 million.
Below you will find examples, along with the measured timings, which support the above recommendation.
Use ALL or ALLSELECTED to Iterate Distinct Combinations
Example 1: Cross-column comparison only
- Suboptimal (Bad) – FILTER over the full fact table: Iterates every row in Sales to evaluate the price condition, even though Unit Price and Unit Cost may repeat across thousands of rows with the same product.
High Value Products Bad =
CALCULATE (
SUM ( Sales[Sales Amount] ),
FILTER ( 'Sales', Sales[Unit Price] > Sales[Unit Cost] )
)
- Preferred (Good) – FILTER over distinct combinations with ALL: ALL( Sales[Unit Price], Sales[Unit Cost] ) produces the distinct combinations of those two columns – far fewer rows than the full fact table. Because the columns come directly from the Sales table, lineage is preserved and CALCULATE can apply the filter correctly.
High Value Products Good =
CALCULATE (
SUM ( Sales[Sales Amount] ),
FILTER (
ALL ( Sales[Unit Price], Sales[Unit Cost] ),
Sales[Unit Price] > Sales[Unit Cost]
)
)
Measured Performance Impact
| Metric | Preferred (ALL) | Suboptimal (Full Table) | Delta |
|---|---|---|---|
| Total Duration | 95,297 ms | 145,309 ms | ~34% faster |
| FE Duration | 92,207 ms | 133,364 ms | −41,157 ms |
| SE Duration | 3,090 ms | 11,945 ms | ~74% faster |
| SE Queries | 1 | 2 | Extra scan eliminated |
| SE Data (KB) | 131,066 | 371,696 (across 2 scans) | ~65% less data |
| Physical Plan | 4 lines (clean) | 8 lines (CrossApply + Spool) | – |
These results were captured in DAX Studio using Server Timings and Query Plan against the Sales fact table, with both patterns evaluated across SUMMARIZECOLUMNS by CustomerKey × Date.


In testing, the suboptimal version scanned all rows of the fact table to evaluate the cross-column comparison. The Formula Engine had to match every filtered row back to the full table. The optimized version skips that entirely, iterating only over the distinct column combinations instead of every row.
Example 2: Cross-column comparison + simple equality
When you also need a simple equality condition alongside a cross-column comparison, layer both optimization levers: ALL for the complex condition and a direct predicate for the simple one.
- Suboptimal (Bad) – All conditions inside FILTER on the full fact table: Even though the Storage Engine can handle the simple equality efficiently, the cross-column comparison still forces the Formula Engine to check every row in the full fact table one by one.
Late Deliveries USD Bad =
CALCULATE (
SUM ( Sales[Sales Amount] ),
FILTER (
'Sales',
Sales[Delivery Date] > Sales[Order Date] && Sales[Currency Code] = "USD"
)
)
- Preferred (Good) – Direct predicate + ALL for the complex condition: The simple equality Currency Code = "USD" stays as a direct predicate; the Storage Engine resolves it instantly using its compressed indexes. The cross-column comparison uses ALL to iterate only the distinct Delivery Date × Order Date combinations instead of every row. CALCULATE then combines both filters independently.
Late Deliveries USD Good =
CALCULATE (
SUM ( Sales[Sales Amount] ),
Sales[Currency Code] = "USD",
FILTER (
ALL ( Sales[Delivery Date], Sales[Order Date] ),
Sales[Delivery Date] > Sales[Order Date]
)
)
Measured Performance Impact
| Metric | Preferred (Direct Predicate + ALL) | Suboptimal (All in FILTER) | Delta |
|---|---|---|---|
| Total Duration | 20,810 ms | 25,506 ms | ~18% faster |
| FE Duration | 20,090 ms | 24,105 ms | −4,015 ms |
| SE Duration | 720 ms | 1,401 ms | ~49% faster |
| SE Queries | 1 | 2 | Extra scan eliminated |
| SE Data (KB) | 31,751 | 31,788 + 15,894 | ~33% less data |
| Physical Plan | 4 lines (clean) | 8 lines (CrossApply + Spool) | – |
These results were captured in DAX Studio using Server Timings and Query Plan against the Sales fact table, with both patterns evaluated across SUMMARIZECOLUMNS by CustomerKey × Date.


The optimized version runs a single scan through the Storage Engine. The suboptimal version, despite handling the currency filter efficiently, still forces the Formula Engine to process an additional 2 million rows for the cross-column comparison and perform an expensive row-matching join behind the scenes.
ALL vs ALLSELECTED – When to use which?
ALL ignores all external filters like slicers, visual-level filters, everything. ALLSELECTED returns distinct combinations while respecting slicer and visual-level filters. In most report measures, you need to respect user selections, which makes ALLSELECTED the safer default.
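A hedged sketch of the ALLSELECTED variant of the earlier measure – same shape, but it respects the user’s slicer selections:

High Value Products Selected =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER (
        -- distinct combinations within the current selection only
        ALLSELECTED ( Sales[Unit Price], Sales[Unit Cost] ),
        Sales[Unit Price] > Sales[Unit Cost]
    )
)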
A note on ALLSELECTED performance
You may have seen warnings that ALLSELECTED causes performance problems. Those concerns apply to patterns like FILTER(ALLSELECTED('Sales'), ...), which iterates the full table within a shadow filter context – commonly seen in running-total calculations where every row triggers a context transition and re-evaluation. Our pattern is fundamentally different: ALLSELECTED(Column1, Column2) returns only the distinct combinations of those specific columns, not the full table. There is no per-row context transition, no measure evaluation inside the FILTER – just a simple comparison across a small set of distinct values. The performance characteristics are comparable to ALL(Column1, Column2), with the added benefit of respecting external slicer selections.
When ALL Does NOT Work
If your filter condition only compares columns listed in your ALL() call using simple operators, use ALL. If it involves measures, EARLIER, references to other columns, or logic that depends on other rows, iterate the full table instead.
Safe – result depends only on Unit Price and Unit Cost:
FILTER(
ALL ( Sales[Unit Price], Sales[Unit Cost] ),
Sales[Unit Price] > Sales[Unit Cost]
)
Not safe – result depends on other rows (customer’s average):
FILTER(
'Sales',
Sales[Sales Amount] > AVERAGEX( RELATEDTABLE('Sales'), Sales[Sales Amount] )
)
Why Not SELECTCOLUMNS?
You may encounter advice to wrap the iterated table in SELECTCOLUMNS to reduce the number of columns processed. With direct column references, SELECTCOLUMNS does narrow the lineage and eliminate the CrossApply in the query plan, producing the same clean plan and comparable performance as ALL/ALLSELECTED. However, it carries a unique correctness risk that makes it the wrong choice.
CALCULATE (
SUM ( Sales[Sales Amount] ),
FILTER (
SELECTCOLUMNS (
'Sales',
"Price", Sales[Unit Price],
"Cost", Sales[Unit Cost]
),
[Price] > [Cost]
)
)
When SELECTCOLUMNS contains calculated expressions (example below) instead of direct column references, lineage breaks silently. The engine cannot trace a computed column like Sales[Unit Price] - Sales[Unit Cost] back to any model column, so CALCULATE finds no lineage to propagate and discards the filter, producing wrong results with no error or warning.
CALCULATE (
SUM ( Sales[Sales Amount] ),
FILTER (
SELECTCOLUMNS (
'Sales',
"Margin", Sales[Unit Price] - Sales[Unit Cost]
),
[Margin] > 100
)
)
Testing confirmed this: a FILTER using SELECTCOLUMNS(..., "Margin", Sales[Unit Price] - Sales[Unit Cost]) returned the unfiltered total (23.2B) instead of the correct filtered result (21.1B). The filter was completely ignored. Direct column references like Sales[Quantity] preserved lineage correctly and returned matching results.
Since ALLSELECTED achieves the same performance with no lineage risk, simpler syntax, and clearer intent, there is no scenario where SELECTCOLUMNS is the better choice inside FILTER as a CALCULATE argument.
Avoid Forcing Non-Blank Outputs
A common mistake is forcing a measure to return 0 instead of Blank, or unconditionally appending text (e.g., [Sales] & " USD"). This destroys the engine’s ability to skip empty rows. If you have 1 million customers and only 5 bought something, a normal query processes 5 rows. If you add + 0 or & " USD", the engine must produce a result for all 1 million customers, increasing memory and CPU usage.
- Recommendation: Wrap your logic to check for existence first: IF( NOT ISBLANK([Sales Amount]), [Sales Amount] & " USD", BLANK() )
- Why: Developers do this to force visuals (like a Matrix) to display 0 or “USD” instead of empty cells, often for aesthetics to indicate that a region had no sales.
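A hedged before/after sketch of the pattern (the measure names are illustrative):

-- Bad: forces a result for every customer, even with no sales
Sales Padded = [Sales Amount] + 0

-- Good: returns BLANK where there is no data, so empty rows are skipped
Sales Labeled =
IF (
    NOT ISBLANK ( [Sales Amount] ),
    [Sales Amount] & " USD"
)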
Use SUMMARIZECOLUMNS for Grouping and Aggregation
SUMMARIZECOLUMNS is the primary function Power BI uses internally to populate visuals, and it should be your default for any grouping and aggregation in DAX.
- Recommendation: Use SUMMARIZECOLUMNS as your default function for grouping and aggregation in DAX queries, variables, table expressions, and measures. SUMMARIZECOLUMNS fully supports context transitions, meaning it works inside CALCULATE, CALCULATETABLE, iterators, and measures. Avoid SUMMARIZE’s own expression parameters for computing aggregations; this approach is deprecated.
- SUMMARIZECOLUMNS is just faster. It is the same function Power BI generates internally for visuals, so Microsoft has heavily optimized its query plans. It pushes filters into Storage Engine queries, consolidates scans, and automatically removes blank rows – all of which reduce materialization overhead compared to SUMMARIZE + ADDCOLUMNS.
- SUMMARIZE’s expression columns are deprecated and generate inefficient query plans. While modern engines may still return correct results,
SUMMARIZEbuilds worse plans when expressions are used. It pulls in unnecessary columns, creates intermediate groupings, and materializes far more rows than needed.
Preferred – SUMMARIZECOLUMNS
EVALUATE
SUMMARIZECOLUMNS (
'Product'[Category],
'Date'[Calendar Year],
"Total Sales", [Total Sales],
"Total Cost", [Total Cost]
)
Acceptable – SUMMARIZE + ADDCOLUMNS (still works, but no longer required as a fallback)
EVALUATE
ADDCOLUMNS (
SUMMARIZE ( Sales, 'Product'[Category], 'Date'[Calendar Year] ),
"Total Sales", [Total Sales],
"Total Cost", [Total Cost]
)
Avoid – SUMMARIZE with direct expressions
EVALUATE
SUMMARIZE (
Sales,
'Product'[Category],
"Total Sales", [Total Sales]
)
Performance Comparison
Tested on a sample model – grouping by Product Category with Sales Amount and Average Subcategory Sales. All three approaches returned identical results, but the cost of computing them was very different.
| Metric | SUMMARIZECOLUMNS | ADDCOLUMNS + SUMMARIZE | SUMMARIZE (expressions) |
|---|---|---|---|
| Total time | 157 ms | 230 ms | 391 ms |
| SE queries | 2 | 3 | 4 |
| SE CPU | 1,094 ms | 1,845 ms | 3,641 ms |
| Max rows materialized | 32 | 32 | 2,517 |
| Materialization passes | 1 | 1 | 2 |



Push Conditional Logic Outside of Iterators
Using IF or SWITCH inside row-by-row iterator functions forces the engine to evaluate the condition for every single row, adding significant overhead.
- Bad: Premium Orders = SUMX(Orders, IF(Orders[Total] > 500, Orders[Total] * Orders[Discount])) – the IF condition is evaluated millions of times if your Orders table is large.
- Good: Premium Orders = CALCULATE([Total Discounted], Orders[Total] > 500) – the filter is applied once at the storage engine level before aggregation, dramatically reducing the number of operations.
Whenever possible, convert row-level conditional logic into CALCULATE filters. This shifts the work from the formula engine (slow, row-by-row) to the storage engine (fast, optimized for bulk operations). A fuller sketch follows below.
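For completeness, a hedged sketch of the full pattern; the [Total Discounted] definition is an assumption, since the original only references it:

-- Supporting measure: simple arithmetic, cheap inside an iterator
Total Discounted =
SUMX ( Orders, Orders[Total] * Orders[Discount] )

-- Conditional logic expressed as a CALCULATE filter, applied once
Premium Orders =
CALCULATE ( [Total Discounted], Orders[Total] > 500 )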
AI-Assisted Tuning: MCP DAX Performance Tuner
For a systematic, research-driven approach to optimization, use the MCP DAX Performance Tuner. This tool utilizes the Model Context Protocol (MCP) to give Large Language Models (LLMs) direct access to your model to perform deep optimization.
- Deep Performance Analysis: It runs traces to capture detailed server timings, breaking down bottlenecks between the Formula Engine and Storage Engine.
- Semantic Validation: Unlike generic AI suggestions, this tool validates its own work by running the new query and comparing row counts and sample records to ensure the results match your baseline exactly.
- Smart Discovery: It automatically detects running Power BI Desktop instances or Service workspaces to begin tuning immediately.
5. Optimizing the Data Layout
In Microsoft Fabric, Direct Lake reads Parquet (Delta) files from OneLake as they are, unlike Import mode, where the engine reshapes data during refresh. This makes the physical file layout a performance concern we need to tackle.
Enable V-Order on gold layer tables
V-Order is a write-time optimization applied to parquet files that sorts data and enables dictionary encoding. Microsoft reports that V-Order is critical for Direct Lake, providing a 40-60% performance improvement for cold-cache queries.
- Fabric Data Warehouse (T-SQL): Zero maintenance required. If you are writing data using a Fabric Warehouse, system-managed optimization automatically applies V-Order, handles auto-compaction in the background, and manages checkpoints. The output is inherently optimized for Power BI consumption.
- Fabric Lakehouse (Spark/Notebooks): While you want V-Order for your semantic models, applying it everywhere is a mistake. V-Order adds a 15-33% overhead to write times. The recommended strategy is to disable V-Order (if it’s turned on) for your Bronze and Silver medallion layers (where fast ingestion and transformation matter most) and explicitly enable it only when writing the Gold layer tables that Direct Lake will consume.
ALTER TABLE schema_name.gold_table
SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')
Right-Size your Parquet files
Direct Lake performance is sensitive to file layout. Too many small files means excessive overhead opening and transcoding each one. Too few oversized files limits the engine’s ability to parallelize reads. The target for Direct Lake consumption is files between 400 MB and 1 GB, with 8 million or more rows per row group.
There are two complementary strategies to achieve this — prevent fragmentation at write time, and fix it after the fact:
At write time, prevent fragmentation:
spark.conf.set('spark.databricks.delta.optimizeWrite.enabled', 'true')
spark.conf.set('spark.databricks.delta.autoCompact.enabled', 'true')
After writes, compact and apply V-Order in one pass:
OPTIMIZE schema_name.gold_table VORDER
Remember the cleanup: run VACUUM periodically to remove old, unreferenced files:
VACUUM schema_name.gold_table RETAIN 168 HOURS
Organizing Data for Filter Performance
Beyond file size, how data is organized within and across files determines whether the engine can skip irrelevant data during queries. If a user filters by Region = 'Europe' and all regions are scattered randomly across every file, Direct Lake must scan everything. If the data is organized so that all European rows live in the same files, the engine skips the rest entirely, saving significant compute.
There are two approaches:
Traditional partitioning
(partitionBy) physically separates data into subdirectories by a chosen column. This works well when you have a clear, stable, low-cardinality filter column (e.g., Year or Region). The benefit is twofold: the engine skips non-matching partitions entirely, and the partition column within each file compresses perfectly since it contains only one distinct value.
df.write.format("delta") \
.partitionBy("Region") \
.mode("overwrite") \
.save("Tables/gold_sales")The risk with traditional partitioning is getting it wrong. Too many partitions (high-cardinality columns like CustomerID) produce thousands of tiny files — exactly the problem you’re trying to avoid. And once you partition by a column, changing it later requires rewriting the entire table.
Liquid Clustering
is the newer, more flexible alternative that Microsoft now recommends. Instead of rigid directory-based partitions, it organizes data within files using clustering keys that can be changed without rewriting history. Liquid Clustering requires Fabric Spark Runtime 1.3 or later. It adapts to evolving query patterns and works together with OPTIMIZE — each time you run OPTIMIZE, it re-clusters the data incrementally.
CREATE TABLE schema_name.gold_sales (
id INT, region STRING, sale_date DATE, amount DECIMAL
) CLUSTER BY (region)
If your table already exists and is unpartitioned, you can enable clustering after the fact (in PySpark this time):
spark.sql("ALTER TABLE schema_name.gold_sales CLUSTER BY (region)")
Note that Liquid Clustering is not compatible with partitioned tables — if you’re migrating from partitionBy, you’ll need to recreate the table without partitions first.
The trade-off compared to traditional partitioning is that Liquid Clustering doesn’t provide the absolute partition-elimination guarantees of physical subdirectories. For most new implementations, however, the flexibility and lower maintenance overhead make it the better default.
6. Automated Validation and Tools
Tabular Editor & BPA
The fastest way to find bottlenecks is the Best Practice Analyzer (BPA) within Tabular Editor. Think of BPA as a lint checker for your data model. It scans your model against a set of proven rules, catching unused objects consuming memory, data type mismatches hurting compression, and inefficient DAX patterns. Run BPA before any manual tuning to get a prioritized list of fixes instead of guessing. Usefully, you can also run BPA as part of a CI/CD process within an Azure DevOps pipeline.
- You can find Tabular Editor and BPA here
DAX Studio & VertiPaq Analyzer
DAX Studio is the tool every DAX developer should have installed. The Server Timings feature, used throughout this article to capture all benchmarks, shows exactly how much time is spent in the Formula Engine versus the Storage Engine. The Physical Query Plan reveals whether a measure triggers a clean plan with a single scan or the expensive CrossApply pattern discussed in the FILTER optimization sections.
- You can download DAX Studio from here
VertiPaq Analyzer, integrated into both DAX Studio and Tabular Editor, visualizes memory consumption at the column level. It is the fastest way to identify which columns consume the most RAM and are candidates for removal or cardinality reduction.
- Grab VertiPaq Analyzer here
Measure Killer
Measure Killer scans your semantic model and identifies measures not referenced by any report visual, other measure, or calculated column. Unused measures add complexity and make maintenance harder. Measure Killer provides a clear list of candidates for removal.
- For the Measure Killer page, click here
Summary
Unoptimized semantic models in Microsoft Fabric are not just slow; they actively burn through Capacity Units and drive up costs. To prevent this, I recommend shifting heavy data transformations upstream to the Lakehouse or Warehouse, keeping your model lean with a strict star schema, and tuning your Parquet files for Direct Lake. I also emphasize writing efficient DAX and using diagnostic tools like DAX Studio and Tabular Editor to locate real bottlenecks.
I know firsthand that this is all much easier said than done. In the real world, as projects scale and data gets messy, maintaining a perfect model is a massive ongoing battle. My hope is that this guide serves as your practical diagnostic toolkit, giving you exactly the right places to start digging when your reports inevitably begin to lag.
