Slow Reports? How to improve Semantic Model Performance in Microsoft Fabric


In the Microsoft Fabric ecosystem, a well-optimized semantic model is no longer just a ‘nice to have’; it is a necessity for staying within the limits imposed by your selected Fabric Capacity (you can find these limits here). If your model is poorly designed, bloated with unnecessary data, or contains inefficient DAX measures written by inexperienced analysts, it will directly increase the consumption of Capacity Units. As we know, CUs are our currency, and we must conserve them in every way possible.

Improving your model influences three critical areas:

  • Power BI report speed: Instant report interactions for end users.
  • Data loading time: Faster refresh cycles in Import mode, faster responses in Direct Lake.
  • Fabric Capacity consumption: Efficient semantic models reduce your overall compute usage, freeing resources for other workloads.

In the article below, I have compiled a list of both well-known and lesser-known ways to optimise the performance of Semantic Models, taking into account the latest approach using Direct Lake. I hope you find some of this advice useful.

 


 

1. Data Modeling and Access

Model using Star Schema approach

Every Fabric/Data Analyst should know this recommendation, but it cannot be omitted here. A star schema remains the gold standard. Of course, it is not always possible to maintain a perfect star schema, so any deviations should be made deliberately, with a clear understanding of the trade-offs.

  • Recommendation: Center your model around Fact tables (transactions) surrounded by Dimension tables (context).
  • Why:
    • Dimensions compress well.
    • Fact tables remain lean.
    • Filtering is optimized for speed.
  • Recommended Reading: While the technology evolves, the foundational concepts remain valid. To expand your knowledge and dive deeper into the topic, I recommend “The Data Warehouse Toolkit” by Ralph Kimball or “Star Schema: The Complete Reference” by Christopher Adamson. Although these books are not the most recent, they still provide a solid foundation for building efficient warehouses and semantic models.

 

Keep relationships simple

Relationships dictate how filters propagate from one table to another. While the engine can handle bi-directional or many-to-many relationships, doing so forces it to perform expensive operations and path resolutions on the fly. Complex paths create ambiguity, often resulting in slower queries and unpredictable results.

  • Recommendation: Use one-to-many (1:*) relationships with single direction filters.
  • Example: Switch a bi-directional filter between Customer and Sales to a single direction to prevent high CPU usage during filter propagation.
  • Example: Change a many-to-many (*:*) relationship between Store and Sales to one-to-many. If that is not possible, try a “bridge table” – here you can find more details about this technique.

 

Simple Row-Level Security (RLS)

RLS is a feature that restricts data access to filtered rows based on a user’s specific role. This allows a single report to dynamically display only the data the logged-in user is authorized to see.

  • Recommendation: Keep RLS logic as simple as Salesperson[Email] = USERPRINCIPALNAME(). Avoid complex DAX functions like LOOKUPVALUE or nested security bridges inside your role definitions.

RLS can cause a “fallback” from Direct Lake on SQL endpoint mode to DirectQuery, significantly slowing down performance. It is better to use Direct Lake on OneLake in this case.
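As a sketch (the SecurityTable and its columns are illustrative, not from any specific model), the entire role definition can be a single predicate, while the LOOKUPVALUE variant below it is exactly the kind of logic to avoid:

DAX
-- Simple: a single-column predicate the engine resolves cheaply
[Email] = USERPRINCIPALNAME()

-- Avoid: a lookup evaluated inside the role definition
[Region] = LOOKUPVALUE (
    SecurityTable[Region],
    SecurityTable[Email], USERPRINCIPALNAME()
)

If you need the mapping in the second pattern, consider materializing it upstream as a column on the dimension instead.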

 

Query Caching

For reports with high traffic, enable Query Caching in the semantic model settings. This instructs the capacity to store the results of frequent queries so they don’t have to be re-calculated every time a user opens the dashboard.

  • Where to find: the ellipsis menu on the semantic model ⮕ Settings ⮕ Query Caching
  • More in the documentation here

 

Enable Query Scale-Out

This creates read-only replicas of your model. Instead of all 1,000 users hitting one Primary model, the load is distributed across replicas, ensuring that a refresh on the Primary doesn’t slow down the users’ reports.

  • Where to find: the ellipsis menu on the semantic model ⮕ Settings ⮕ Query Scale Out
  • More in the documentation here

 


 

2. Reducing Memory Footprint

Remove unused columns and tables

Every extra column comes with a memory price tag. If nobody looks at it, drop it.

  • Example: A CustomerMiddleName column or an old Sales_v2_Test table should be deleted if no report or measure uses it.

 

Drop rows without data

Do not store empty space in your Semantic Model.

  • Recommendation: Drop rows where the key metrics are 0 or null (e.g., SalesAmount = 0, unless explicitly needed for business logic) in your fact table. Loading millions of rows of “zero sales” wastes memory and processing power during aggregations.
  • Why: Rows with value = 0 do not affect results but they:
    • Increase row count
    • Hurt compression

 

Filter out needless history

Loading 15 years of data “just in case” bloats partitions, causes the semantic model to grow unnecessarily and drags down refreshes. This is a key business discussion: How much history do users actually need?

  • Recommendation: Keep only the last 3-5 years of Sales. Skip orders with status = “Cancelled” if they never make it to dashboards anyway.
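Both filters are cheap to apply upstream, for example in a T-SQL view feeding the model (object and column names are illustrative):

SQL
CREATE VIEW gold.vw_Sales AS
SELECT SalesKey, OrderDate, CustomerKey, SalesAmount
FROM dbo.Sales
WHERE OrderDate >= DATEADD(YEAR, -5, GETDATE())  -- keep only the last 5 years
  AND OrderStatus <> 'Cancelled';                -- cancelled orders never reach dashboards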

 

Reduce high cardinality columns pt.1 – Split or remove

Cardinality refers to the number of unique values in a column. Columns with millions of unique values (like Fact table primary keys, IDs, GUIDs, Timestamps) destroy the VertiPaq engine’s ability to compress data.

  • Recommendation: 
    • Split DateTime columns into two separate columns: Date (low cardinality) and Time (higher cardinality, but better than combined).
    • Remove columns like SalesTransactionKey or TransactionID if they are not used for relationships or in your visualizations.

The best tool for checking high cardinality columns is DAX Studio.
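The DateTime split can also be pushed upstream, e.g. in a T-SQL view (column names are illustrative):

SQL
SELECT
    CAST(OrderTimestamp AS DATE) AS OrderDate,      -- low cardinality: one value per day
    CAST(OrderTimestamp AS TIME(0)) AS OrderTime    -- second precision caps cardinality at 86,400
FROM dbo.Sales;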

 

Reduce high cardinality columns pt.2 – Replace part of the TransactionKey

I’ve seen stakeholders push for keeping 5+ years of history in the Semantic Model. The problem? Columns like SalesTransactionKey have high cardinality, and they will consume your memory and degrade performance very fast.

  • Recommendation: Identify how far back business users analyze data in Power BI (e.g., the last 24 months). For records older than that, replace the unique key with a single value like -1 or "Historical".
  • Why: The column compresses far better, which reduces the engine’s memory usage while still keeping the historical rows available for later usage or aggregations.
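A hedged T-SQL sketch of this replacement (the 24-month cutoff and all names are illustrative):

SQL
SELECT
    CASE
        WHEN OrderDate >= DATEADD(MONTH, -24, GETDATE())
            THEN SalesTransactionKey
        ELSE -1        -- collapse all historical keys into one value
    END AS SalesTransactionKey,
    OrderDate,
    SalesAmount
FROM dbo.Sales;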

  

Optimize data types pt. 1 – Correct data types

Depending on the type of data you select, the engine will allocate the appropriate amount of memory for its later computation.

  • Recommendation: 
    • Change numeric keys (like ProductID) from Text to Whole Number if possible. Integers compress much better than strings.
    • Use Fixed Decimal (Currency) rather than floating-point Decimal for financial values. It uses less memory and guarantees 4-decimal precision.

 

Optimize data types pt.2 – Split large decimals

Large decimal numbers or high-precision currency values don’t compress well.

  • Recommendation: If you have a column with a massive number of distinct values (like a raw scientific measurement or a high-precision crypto value), split it into three separate Whole Number columns: Millions, Thousands, and Ones.
  • Why: Three integer columns often compress significantly better than one high-cardinality decimal column. The slight CPU overhead of summing them back together at query time is outweighed by the scan speed gained from compression. Not universal, but worth testing.
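If you adopt the split, the original value can be reassembled at query time; a sketch assuming illustrative column names Millions, Thousands, and Ones on a Fact table:

DAX
Total Value =
SUM ( Fact[Millions] ) * 1000000
    + SUM ( Fact[Thousands] ) * 1000
    + SUM ( Fact[Ones] )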

 

Disable Auto Date/Time

This feature creates a hidden internal calendar table for every date column in every table of your model.

  • Recommendation: Turn this off in Power BI ⮕ Options and settings ⮕ Options ⮕ Data Load and use one central DimDate table.
  • Why: This needlessly bloats your memory footprint, as a single fact table with five date columns will force the engine to materialize five separate, redundant calendar tables behind the scenes.
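A central DimDate can be a simple calculated table; a minimal sketch with an illustrative date range and columns:

DAX
DimDate =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2026, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month", FORMAT ( [Date], "MMM" )
)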

 


 

3. Power Query and Refresh Strategy

Push transformations upstream (Roche’s Maxim)

Heavy transformations inside Power Query consume significant Fabric Capacity during refresh and can slow down development.

  • Recommendation: Avoid creating complex transformations in Power Query. Instead, perform data cleansing and shaping as close to the data source as possible (e.g., in the SQL View, Stored Procedure, or Data Warehouse pipeline) before the data ever reaches Power BI.

In this context “upstream” means closer to where the data is originally produced, and “downstream” means closer to where the data is consumed.

 

Preserve Query Folding

Query Folding is when Power BI pushes the work back to the source database (like SQL Server or a Fabric Warehouse), translating your applied steps into a native query of the source, such as T-SQL. If folding breaks, Power BI has to pull all the raw data into its own memory to transform it, which spikes CU usage and slows down refreshes.

  • Recommendation:
    • Always verify Query Folding by checking that View Native Query is enabled (not greyed out) in the Power Query Editor.
    • If a step breaks folding, move it later in the Applied Steps sequence to keep more operations pushed to the source database.
  • You can find more in the documentation here
  • In Dataflow Gen2 you can find more Query Folding indicators; check them here

  

Switch to Direct Lake on OneLake or Import

…instead of DirectQuery

  • Recommendation: DirectQuery sends every query back to the source database, creating high latency and spiking CU usage with each user interaction. Direct Lake reads data directly from OneLake parquet files (no import or refresh needed), delivering almost Import-mode speed with minimal overhead. If Direct Lake is not available, use Import mode.
  • More information about the differences between Direct Lake on OneLake and SQL Endpoint here

When using the Direct Lake method, connect to OneLake (not the SQL endpoint). Direct Lake on SQL endpoint is prone to fallback scenarios (especially with RLS or complex queries) that silently degrade performance to DirectQuery speeds. Direct Lake on OneLake is the most stable and performant option.

  

Implement Incremental Refresh

If you are using Import mode, you don’t need to reload all the data from the last four years every morning.

  • Recommendation: Configure the Incremental Refresh feature to keep history (e.g., the last 2 years) but only refresh the data that has actually changed (e.g., the last day). This approach reduces both Fabric Capacity usage and the time needed to load the Semantic Model.

  

Utilize partition refresh

In addition to incremental refresh, it is possible to selectively refresh the Semantic Model to avoid reloading all data.

  • Recommendation: Use the XMLA endpoint or the Power BI REST API to refresh specific tables or specific partitions. This is particularly useful for large models where full refreshes consume excessive Capacity Units or time. Remember to enable the Read/Write option for the XMLA endpoint in Admin Portal ⮕ Capacity Settings ⮕ Power BI Workloads.
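As an illustration, the Power BI enhanced refresh REST API (a POST to the dataset’s refreshes endpoint) accepts a request body that targets specific tables or partitions; a hedged sketch with illustrative table and partition names:

JSON
{
    "type": "full",
    "objects": [
        { "table": "Sales", "partition": "Sales-2024" },
        { "table": "Customer" }
    ]
}

Only the listed objects are processed; the rest of the model is left untouched.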

 


 

4. DAX & Query Optimisation

A bit of theory to start with.
Semantic model performance is not just about data storage and star schema. DAX measure design is often the real bottleneck. Every query is executed by two cooperating engines:

The Formula Engine (FE) is flexible but expensive. It supports iterators (SUMX, AVERAGEX, COUNTX, RANKX), row-by-row calculations, and operations on decompressed data.

  • Characteristics:
    • Single-threaded (can only use one CPU core).
    • CPU-heavy.
    • Slower at scale.

The Storage Engine (SE) is fast and parallel. It handles column scans, simple aggregations (SUM, COUNT), and filtering on compressed data.

  • Characteristics:
    • Multi-threaded (uses all available cores).
    • Highly optimized for compression.
    • Extremely fast for large datasets.

Push Work Into the Storage Engine – the fastest measures are those where the Storage Engine does the heavy lifting (scans + aggregation) and the Formula Engine only combines the results. Throughout the following chapter, you will find references to these two engines, so it’s worth bearing this in mind.

Maximize: Storage Engine usage.

Minimize: Formula Engine time and the number of “callbacks” (FE ↔ SE back-and-forth).

 

Performance Analyzer

Before changing any DAX measure, start with the Performance Analyzer in Power BI Desktop. It shows you exactly which visual is slow and whether the bottleneck is the DAX query itself.

  

Use Variables (VAR) Extensively

Variables are evaluated once. Using them prevents the engine from calculating the same expression multiple times.

Example:

DAX
Profit Margin Prc = 
VAR _TotalRevenue = SUM(Sales[TotalSales]) 
VAR _TotalCost = SUM(Sales[Cost]) 
VAR _Profit = _TotalRevenue - _TotalCost 

RETURN 
    IF( 
        _TotalRevenue = 0,
        BLANK(),
        DIVIDE(_Profit, _TotalRevenue)
    )

  

Context Transition and Iterators

Iterators (SUMX, AVERAGEX, COUNTX, FILTER, ADDCOLUMNS, etc.) calculate a formula by stepping through a table row-by-row. Their performance depends entirely on how complex that row-by-row formula is.

  • Recommendation:
    • Replace SUMX(T, T[Col]) with SUM(T[Col]) – they produce identical results, but SUM has lower overhead.
    • If your iterator contains CALCULATE or a measure reference, try to restructure. Apply the filter once before the iteration starts, then let the iterator do only simple math. Filtering once is far cheaper than re-evaluating the filter on every row.
    • Reduce the cardinality of the iterated table. Use SUMMARIZE, GROUPBY, or VALUES on a dimension column to iterate over groups rather than individual fact rows.
    • In DirectQuery mode, treat every iterator with context transition as a red flag. Each Storage Engine request becomes a SQL query over the wire.

When iterators are cheap: Simple arithmetic like SUMX(Sales, Sales[Quantity] * Sales[Price]) is fast even on large tables. The engine pulls the needed columns into memory in a single pass and runs through the math efficiently. That said, native aggregators like SUM, AVERAGE, and COUNT are still faster because the engine can resolve them without any row-by-row evaluation at all. So if you’re just aggregating a single column, always prefer SUM(Sales[Amount]) over SUMX(Sales, Sales[Amount]). But don’t fear iterators when you genuinely need to combine multiple columns in an expression – that’s what they’re built for and they handle it quite well.

DAX
Total Revenue =
SUMX(
    Sales,
    Sales[Quantity] * Sales[UnitPrice] * (1 - Sales[DiscountPct])
)

 
When iterators are expensive: Iterators become dangerously slow if the formula inside them uses CALCULATE or references an existing measure. Doing this causes a Context Transition. This means the system has to stop, convert the current row into a filter, and run a fresh, standalone calculation for every single row. If you do this on a table with millions of rows, the system gets bottlenecked trying to process millions of separate mini-calculations one at a time.

DAX
Sales With Discount =
SUMX(
    Sales,
    -- CALCULATE triggers a context transition for every Sales row
    VAR _Category = Sales[Category]
    RETURN
        CALCULATE(
            SUM(Sales[Amount]),
            Discounts[Category] = _Category
        )
)

 

Prefer Direct Predicates in CALCULATE

When filtering by a simple column-value comparison in CALCULATE, use a direct predicate rather than FILTER over the entire fact table.

  • Suboptimal (Bad): FILTER('Sales', ...) scans all rows of the fact table to produce a filtered result. On large tables (millions of rows), this forces the Storage Engine to do an extra scan to materialize the full filtered table before the aggregation can even start. The Formula Engine then has to perform an expensive row-matching join to connect the filtered rows back to the aggregation, adding unnecessary overhead that is easily avoidable.
DAX
Sales USD Bad =
CALCULATE(
    SUM ( Sales[Sales Amount] ),
    FILTER(
        'Sales',
        Sales[Currency Code] = "USD"
    )
)

  

  • Preferred (Good): Internally, this is equivalent to FILTER(ALL(Sales[Currency Code]), Sales[Currency Code] = "USD"), which iterates only the distinct values of the Currency Code column (the VertiPaq dictionary). If Currency Code has 5 distinct values and the Sales table has 100 million rows, the engine iterates 5 values instead of 100 million and does so in a single Storage Engine scan without the extra materialization step.
DAX
Sales USD Good =
CALCULATE(
    SUM ( Sales[Sales Amount] ),
    Sales[Currency Code] = "USD"
)

  

Measured Performance Impact

| Metric | Preferred (Direct Predicate) | Suboptimal (FILTER on Fact) | Delta |
| --- | --- | --- | --- |
| Total Duration | 56,031 ms | 68,753 ms | +23% slower |
| FE Duration | 54,080 ms | 64,929 ms | +10,849 ms |
| SE Duration | 1,951 ms | 3,824 ms | ~2× slower |
| SE Queries | 1 | 2 | Extra scan |
| SE Data (KB) | 77,924 | 77,924 + 38,962 | +50% more data |

These results were captured in DAX Studio using Server Timings against a Sales fact table, with both patterns evaluated across a high-cardinality grouping (SUMMARIZECOLUMNS by CustomerKey × Date).

The suboptimal version issued 2 Storage Engine scans instead of 1. The second scan materialized the FILTER result over the full fact table to determine which rows match the condition, then required additional Formula Engine coordination to join that result back into the aggregation.

Important: On a single scalar evaluation (e.g., ROW("Result", [Measure])), both patterns may produce identical query plans due to VertiPaq predicate folding. The performance gap only becomes measurable when the measure is evaluated thousands of times which is exactly what happens in a real report visual with multiple dimension groupings.

  • When to use FILTER: Reserve FILTER for conditions that cannot be expressed as direct predicates:
    • Comparing two columns from the same row, e.g., FILTER('Sales', Sales[Delivery Date] > Sales[Order Date])
    • Evaluating a measure within the filter condition, e.g., FILTER(VALUES('Product'[ProductKey]), [Total Sales] > 1000)
    • Complex Boolean logic that cannot be decomposed into independent column predicates

For these complex scenarios, see Optimize FILTER for Complex Conditions below.

  

Optimize FILTER for Complex Conditions

This section of the recommendations is considerably more detailed, so I would advise you to take a little extra time to go through it.

  • Recommendation: When FILTER is required for complex logic that cannot be expressed as a direct predicate (e.g., comparing two columns, evaluating measures), the table you pass as FILTER’s first argument has a significant impact on both performance and correctness. Apply two optimization levers:
    • Reduce the rows FILTER must iterate. Use ALL or ALLSELECTED to iterate distinct column combinations instead of the full fact table.
    • Keep simple conditions outside FILTER. If part of your logic can be written as a direct predicate, don’t bury it inside FILTER. Pull it out as a separate CALCULATE argument. Only use FILTER for the part that truly requires it.
  • Why complex FILTER is expensive: Simple conditions like Sales[Category] = "Electronics" are fast because the Storage Engine can resolve them internally using its compressed column indexes and no row-by-row work is needed.
    Cross-column comparisons like Sales[DeliveryDate] > Sales[OrderDate] are different. The Storage Engine can’t resolve these from its indexes, so it falls back to the slower Formula Engine checking each row individually. On a multi-million row table, that row-by-row evaluation is the main cost.
    But row count isn’t the only problem. When you write FILTER(Sales, ...), the result carries a reference to every column in the Sales table. This forces the Formula Engine to do extra work matching the filtered rows back to the full table during aggregation. Essentially an expensive join behind the scenes.
    Using ALL(Sales[Col1], Sales[Col2]) instead solves both problems: it only references the columns you actually need (eliminating the extra join), and it returns only distinct combinations (drastically reducing row count). If your Sales table has 15 million rows but only 500 unique price/cost combinations, the Formula Engine evaluates 500 rows instead of 15 million.

Below you will find examples, along with the measured timings behind them, which support the recommendation above.

Use ALL or ALLSELECTED to Iterate Distinct Combinations

Example 1: Cross-column comparison only
  • Suboptimal (Bad) – FILTER over the full fact table: Iterates every row in Sales to evaluate the price condition, even though Unit Price and Unit Cost may repeat across thousands of rows with the same product.
DAX
High Value Products Bad =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER ( 'Sales', Sales[Unit Price] > Sales[Unit Cost] )
)

 

  • Preferred (Good) – FILTER over distinct combinations with ALL: ALL( Sales[Unit Price], Sales[Unit Cost] ) produces the distinct combinations of those two columns – far fewer rows than the full fact table. Because the columns come directly from the Sales table, lineage is preserved and CALCULATE can apply the filter correctly.
DAX
High Value Products Good =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER (
        ALL ( Sales[Unit Price], Sales[Unit Cost] ),
        Sales[Unit Price] > Sales[Unit Cost]
    )
)

 

Measured Performance Impact

| Metric | Preferred (ALL) | Suboptimal (Full Table) | Delta |
| --- | --- | --- | --- |
| Total Duration | 95,297 ms | 145,309 ms | ~34% faster |
| FE Duration | 92,207 ms | 133,364 ms | −41,157 ms |
| SE Duration | 3,090 ms | 11,945 ms | ~74% faster |
| SE Queries | 1 | 2 | Extra scan eliminated |
| SE Data (KB) | 131,066 | 371,696 (across 2 scans) | ~65% less data |
| Physical Plan | 4 lines (clean) | 8 lines (CrossApply + Spool) | |

These results were captured in DAX Studio using Server Timings and Query Plan against the Sales fact table, with both patterns evaluated across SUMMARIZECOLUMNS by CustomerKey × Date.

In testing, the suboptimal version scanned all rows of the fact table to evaluate the cross-column comparison. The Formula Engine had to match every filtered row back to the full table. The optimized version skips that entirely, iterating only over the distinct column combinations instead of every row.

Example 2: Cross-column comparison + simple equality

When you also need a simple equality condition alongside a cross-column comparison, layer both optimization levers: ALL for the complex condition and a direct predicate for the simple one.

  • Suboptimal (Bad) – All conditions inside FILTER on the full fact table: Even though the Storage Engine can handle the simple equality efficiently, the cross-column comparison still forces the Formula Engine to check every row in the full fact table one by one.
DAX
Late Deliveries USD Bad =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER (
        'Sales',
        Sales[Delivery Date] > Sales[Order Date] && Sales[Currency Code] = "USD"
    )
)

 

  • Preferred (Good) – Direct predicate + ALL for the complex condition: The simple equality Currency Code = "USD" stays as a direct predicate. The Storage Engine resolves it instantly using its compressed indexes. The cross-column comparison uses ALL to iterate only the distinct Delivery Date × Order Date combinations instead of every row. CALCULATE then combines both filters independently.
DAX
Late Deliveries USD Good =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    Sales[Currency Code] = "USD",
    FILTER (
        ALL ( Sales[Delivery Date], Sales[Order Date] ),
        Sales[Delivery Date] > Sales[Order Date]
    )
)

  

Measured Performance Impact

| Metric | Preferred (Direct Predicate + ALL) | Suboptimal (All in FILTER) | Delta |
| --- | --- | --- | --- |
| Total Duration | 20,810 ms | 25,506 ms | ~18% faster |
| FE Duration | 20,090 ms | 24,105 ms | −4,015 ms |
| SE Duration | 720 ms | 1,401 ms | ~49% faster |
| SE Queries | 1 | 2 | Extra scan eliminated |
| SE Data (KB) | 31,751 | 31,788 + 15,894 | ~33% less data |
| Physical Plan | 4 lines (clean) | 8 lines (CrossApply + Spool) | |

These results were captured in DAX Studio using Server Timings and Query Plan against the Sales fact table, with both patterns evaluated across SUMMARIZECOLUMNS by CustomerKey × Date.

  
The optimized version runs a single scan through the Storage Engine. The suboptimal version, despite handling the currency filter efficiently, still forces the Formula Engine to process an additional 2 million rows for the cross-column comparison and perform an expensive row-matching join behind the scenes.

ALL vs ALLSELECTED – When to use which?

ALL ignores all external filters: slicers, visual-level filters, everything. ALLSELECTED returns distinct combinations while respecting slicer and visual-level filters. In most report measures, you need to respect user selections, which makes ALLSELECTED the safer default.

A note on ALLSELECTED performance
You may have seen warnings that ALLSELECTED causes performance problems. Those concerns apply to patterns like FILTER(ALLSELECTED('Sales'), ...), which iterates the full table within a shadow filter context – commonly seen in running-total calculations where every row triggers a context transition and re-evaluation. Our pattern is fundamentally different: ALLSELECTED(Column1, Column2) returns only the distinct combinations of those specific columns, not the full table. There is no per-row context transition, no measure evaluation inside the FILTER – just a simple comparison across a small set of distinct values. The performance characteristics are comparable to ALL(Column1, Column2), with the added benefit of respecting external slicer selections.

When ALL Does NOT Work

If your filter condition only compares columns listed in your ALL() call using simple operators, use ALL. If it involves measures, EARLIER, references to other columns, or logic that depends on other rows, iterate the full table instead.

Safe – result depends only on Unit Price and Unit Cost:

DAX
FILTER(
    ALL ( Sales[Unit Price], Sales[Unit Cost] ),
    Sales[Unit Price] > Sales[Unit Cost]
)

  

Not safe – result depends on other rows (customer’s average):

DAX
FILTER(
    'Sales',
    Sales[Sales Amount] >
        CALCULATE(
            AVERAGE( Sales[Sales Amount] ),
            ALLEXCEPT( Sales, Sales[CustomerKey] )
        )
)

  

Why Not SELECTCOLUMNS?

You may encounter advice to wrap the iterated table in SELECTCOLUMNS to reduce the number of columns processed. With direct column references, SELECTCOLUMNS does narrow the lineage and eliminates the CrossApply (in query plan) producing the same clean query plan and comparable performance as ALL/ALLSELECTED. However, it carries a unique correctness risk that makes it the wrong choice.

DAX
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER (
        SELECTCOLUMNS (
            'Sales',
            "Price", Sales[Unit Price],
            "Cost", Sales[Unit Cost]
        ),
        [Price] > [Cost]
    )
)

When SELECTCOLUMNS contains calculated expressions (example below) instead of direct column references, lineage breaks silently. The engine cannot trace a computed column like Sales[Unit Price] - Sales[Unit Cost] back to any model column, so CALCULATE finds no lineage to propagate and discards the filter producing wrong results with no error or warning.

DAX
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    FILTER (
        SELECTCOLUMNS (
            'Sales',
            "Margin", Sales[Unit Price] - Sales[Unit Cost]
        ),
        [Margin] > 100
    )
)

Testing confirmed this: a FILTER using SELECTCOLUMNS(..., "Margin", Sales[Unit Price] - Sales[Unit Cost]) returned the unfiltered total (23.2B) instead of the correct filtered result (21.1B). The filter was completely ignored. Direct column references like Sales[Quantity] preserved lineage correctly and returned matching results.

Since ALLSELECTED achieves the same performance with no lineage risk, simpler syntax, and clearer intent, there is no scenario where SELECTCOLUMNS is the better choice inside FILTER as a CALCULATE argument.

  

Avoid Forcing Non-Blank Outputs

A common mistake is forcing a measure to return 0 instead of Blank, or unconditionally appending text (e.g., [Sales] & " USD"). This destroys the engine’s ability to skip empty rows. If you have 1 million customers and only 5 bought something, a normal query processes 5 rows. If you add + 0 or append " USD", the engine forces a result for all 1 million customers, increasing memory and CPU usage.

  • Recommendation: Wrap your logic to check for existence first: IF( NOT ISBLANK([Sales Amount]), [Sales Amount] & " USD", BLANK() )
  • Why: Developers do this to force visuals (like a Matrix) to display 0 or “USD” instead of empty cells, often for aesthetics to indicate that a region had no sales.
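Written out as a full measure, the recommended pattern might look like this (the measure name is illustrative):

DAX
Sales Amount Label =
VAR _Sales = [Sales Amount]
RETURN
    IF ( NOT ISBLANK ( _Sales ), _Sales & " USD" )

Because the IF has no else branch, it returns BLANK() for empty rows, so the engine can continue to skip them.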

  

Use SUMMARIZECOLUMNS for Grouping and Aggregation

SUMMARIZECOLUMNS is the primary function Power BI uses internally to populate visuals, and it should be your default for any grouping and aggregation in DAX.

  • Recommendation: Use SUMMARIZECOLUMNS as your default function for grouping and aggregation in DAX queries, variables, table expressions, and measures. SUMMARIZECOLUMNS fully supports context transitions, meaning it works inside CALCULATE, CALCULATETABLE, iterators, and measures. Avoid SUMMARIZE’s own expression parameters for computing aggregations; that approach is deprecated.
  • Why:
    • SUMMARIZECOLUMNS is just faster. It is the same function Power BI generates internally for visuals, so Microsoft has heavily optimized its query plans. It pushes filters into Storage Engine queries, consolidates scans, and automatically removes blank rows – all of which reduce materialization overhead compared to SUMMARIZE + ADDCOLUMNS.
    • SUMMARIZE’s expression columns are deprecated and generate inefficient query plans. While modern engines may still return correct results, SUMMARIZE builds worse plans when expressions are used. It pulls in unnecessary columns, creates intermediate groupings, and materializes far more rows than needed.

Preferred – SUMMARIZECOLUMNS

DAX
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Category],
    'Date'[Calendar Year],
    "Total Sales", [Total Sales],
    "Total Cost", [Total Cost]
)

  

Acceptable – SUMMARIZE + ADDCOLUMNS (still works, but no longer required as a fallback)

DAX
EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, 'Product'[Category], 'Date'[Calendar Year] ),
    "Total Sales", [Total Sales],
    "Total Cost", [Total Cost]
)

  

Avoid – SUMMARIZE with direct expressions

DAX
EVALUATE
SUMMARIZE (
    Sales,
    'Product'[Category],
    "Total Sales", [Total Sales]
)

Performance Comparison

Tested on a sample model – grouping by Product Category with Sales Amount and Average Subcategory Sales. All three approaches returned identical results, but the cost of computing them was very different.

| Metric | SUMMARIZECOLUMNS | ADDCOLUMNS + SUMMARIZE | SUMMARIZE (expressions) |
| --- | --- | --- | --- |
| Total time | 157 ms | 230 ms | 391 ms |
| SE queries | 2 | 3 | 4 |
| SE CPU | 1,094 ms | 1,845 ms | 3,641 ms |
| Max rows materialized | 32 | 32 | 2,517 |
| Materialization passes | 1 | 1 | 2 |

  

Push Conditional Logic Outside of Iterators

Using IF or SWITCH inside row-by-row iterator functions forces the engine to evaluate the condition for every single row, adding significant overhead.

  • Bad: Premium Orders = SUMX(Orders, IF(Orders[Total] > 500, Orders[Total] * Orders[Discount])) — The IF condition is evaluated millions of times if your Orders table is large.
  • Good: Premium Orders = CALCULATE([Total Discounted], Orders[Total] > 500) — The filter is applied once at the storage engine level before aggregation, dramatically reducing the number of operations.

Whenever possible, convert row-level conditional logic into CALCULATE filters. This shifts the work from the formula engine (slow, row-by-row) to the storage engine (fast, optimized for bulk operations).

 

AI-Assisted Tuning: MCP DAX Performance Tuner

For a systematic, research-driven approach to optimization, use the MCP DAX Performance Tuner. This tool utilizes the Model Context Protocol (MCP) to give Large Language Models (LLMs) direct access to your model to perform deep optimization.

  • Deep Performance Analysis: It runs traces to capture detailed server timings, breaking down bottlenecks between the Formula Engine and Storage Engine.
  • Semantic Validation: Unlike generic AI suggestions, this tool validates its own work by running the new query and comparing row counts and sample records to ensure the results match your baseline exactly.
  • Smart Discovery: It automatically detects running Power BI Desktop instances or Service workspaces to begin tuning immediately.

 


 

5. Optimizing the Data Layout

In Microsoft Fabric, Direct Lake reads Delta Parquet files as they are from OneLake, unlike Import mode, where the engine reshapes data during refresh. This makes the physical file layout a performance concern we need to tackle.

Enable V-Order on gold layer tables

V-Order is a write-time optimization applied to parquet files that sorts data and enables dictionary encoding. Microsoft reports that V-Order is critical for Direct Lake, providing a 40-60% performance improvement for cold-cache queries.

  • Fabric Data Warehouse (T-SQL): Zero maintenance required. If you are writing data using a Fabric Warehouse, system-managed optimization automatically applies V-Order, handles auto-compaction in the background, and manages checkpoints. The output is inherently optimized for Power BI consumption.
  • Fabric Lakehouse (Spark/Notebooks): While you want V-Order for your semantic models, applying it everywhere is a mistake. V-Order adds a 15-33% overhead to write times. The recommended strategy is to disable V-Order (if it's turned on) for your Bronze and Silver medallion layers (where fast ingestion and transformation matter most) and explicitly enable it only when writing the Gold layer tables that Direct Lake will consume.
SQL
ALTER TABLE schema_name.gold_table
SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')
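For notebook workloads, the same behavior can also be steered at the session and writer level. The sketch below assumes the V-Order property names documented for Fabric's Spark runtime (`spark.sql.parquet.vorder.enabled` and the `parquet.vorder.enabled` writer option); verify them against your runtime version. `spark` and `df` stand for the usual notebook session and a DataFrame you have prepared:

```python
# Notebook config sketch (assumed property names -- check your Fabric runtime).

# Bronze/Silver sessions: keep V-Order off so ingestion writes stay fast.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

# Gold writes: enable V-Order explicitly for tables Direct Lake will read.
(
    df.write
      .format("delta")
      .option("parquet.vorder.enabled", "true")
      .mode("overwrite")
      .save("Tables/gold_table")
)
```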

 

Right-Size your Parquet files

Direct Lake performance is sensitive to file layout. Too many small files means excessive overhead opening and transcoding each one. Too few oversized files limits the engine’s ability to parallelize reads. The target for Direct Lake consumption is files between 400 MB and 1 GB, with 8 million or more rows per row group.
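To make the target concrete, here is a small, hypothetical helper (my own names and logic, not a Fabric or Delta API) that checks whether a table's current layout falls inside the 400 MB-1 GB window:

```python
TARGET_MIN_BYTES = 400 * 1024**2   # 400 MB lower bound from the guideline above
TARGET_MAX_BYTES = 1024**3         # 1 GB upper bound

def suggested_file_count(total_bytes: int) -> int:
    """Return a file count that keeps each file inside the 400 MB-1 GB window."""
    # Aim near the upper bound so files stay large but never exceed 1 GB.
    return max(1, -(-total_bytes // TARGET_MAX_BYTES))  # ceiling division

def is_fragmented(total_bytes: int, actual_files: int) -> bool:
    """Flag tables whose average file size falls below the 400 MB target."""
    return actual_files > 0 and total_bytes / actual_files < TARGET_MIN_BYTES

# A 10 GB table split into 500 files averages ~20 MB per file: fragmented.
assert suggested_file_count(10 * 1024**3) == 10
assert is_fragmented(10 * 1024**3, 500) is True
assert is_fragmented(10 * 1024**3, 12) is False
```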

There are two complementary strategies to achieve this — prevent fragmentation at write time, and fix it after the fact:

At write time, prevent fragmentation:

Python
spark.conf.set('spark.databricks.delta.optimizeWrite.enabled', 'true')
spark.conf.set('spark.databricks.delta.autoCompact.enabled', 'true')

 

After writes, compact and apply V-Order in one pass:

SQL
OPTIMIZE schema_name.gold_table VORDER

 

Remember the cleanup: run VACUUM periodically to remove old, unreferenced files:

SQL
VACUUM schema_name.gold_table RETAIN 168 HOURS

 

Organizing Data for Filter Performance

Beyond file size, how data is organized within and across files determines whether the engine can skip irrelevant data during queries. If a user filters by Region = 'Europe' and all regions are scattered randomly across every file, Direct Lake must scan everything. If the data is organized so that all European rows live in the same files, the engine skips the rest entirely, saving significant compute.

There are two approaches:

Traditional partitioning (partitionBy)

Physical partitioning separates data into subdirectories by the values of a chosen column. It works well when you have a clear, stable, low-cardinality filter column (e.g., Year or Region). The benefit is twofold: the engine skips non-matching partitions entirely, and the partition column within each file compresses perfectly, since it contains only one distinct value.

Python
df.write.format("delta") \
    .partitionBy("Region") \
    .mode("overwrite") \
    .save("Tables/gold_sales")

The risk with traditional partitioning is getting it wrong. Too many partitions (high-cardinality columns like CustomerID) produce thousands of tiny files — exactly the problem you’re trying to avoid. And once you partition by a column, changing it later requires rewriting the entire table.
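A quick back-of-the-envelope check makes the risk concrete. The helper below is hypothetical (my own names and worst-case assumption, not part of Spark): the number of physical partitions equals the column's distinct-value count, and each incremental write can add at least one small file per partition:

```python
def estimated_files(distinct_values: int, writes_per_day: int, days: int) -> int:
    """Worst-case file count: every write adds >= 1 small file per partition."""
    return distinct_values * writes_per_day * days

# Partitioning by a 4-value Region column with hourly loads stays manageable...
assert estimated_files(4, 24, 30) == 2_880
# ...while a 100k-value CustomerID column explodes into millions of tiny files.
assert estimated_files(100_000, 24, 30) == 72_000_000
```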

Liquid Clustering

Liquid Clustering is the newer, more flexible alternative that Microsoft now recommends. Instead of rigid directory-based partitions, it organizes data within files using clustering keys that can be changed without rewriting history. It requires Fabric Spark Runtime 1.3 or later, adapts to evolving query patterns, and works together with OPTIMIZE: each time you run OPTIMIZE, the data is re-clustered incrementally.

SQL
CREATE TABLE schema_name.gold_sales (
    id INT, region STRING, sale_date DATE, amount DECIMAL
) CLUSTER BY (region)

 

If your table already exists and is unpartitioned, you can enable clustering after the fact (this time in PySpark):

Python
spark.sql("ALTER TABLE schema_name.gold_sales CLUSTER BY (region)")

 

Note that Liquid Clustering is not compatible with partitioned tables — if you’re migrating from partitionBy, you’ll need to recreate the table without partitions first.

The trade-off compared to traditional partitioning is that Liquid Clustering doesn’t provide the absolute partition-elimination guarantees of physical subdirectories. For most new implementations, however, the flexibility and lower maintenance overhead make it the better default.
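If it helps to make the decision explicit, the trade-offs above can be condensed into a rough rule of thumb (my own heuristic, not official Microsoft guidance; the cardinality cutoff is illustrative):

```python
def layout_strategy(distinct_values: int, filter_column_is_stable: bool) -> str:
    """Rule of thumb distilled from the trade-offs above (illustrative only)."""
    if filter_column_is_stable and distinct_values <= 50:
        # Low-cardinality, stable filter columns get hard partition elimination.
        return "partitionBy"
    # Everything else benefits from keys that can change without a full rewrite.
    return "liquid clustering"

assert layout_strategy(4, True) == "partitionBy"
assert layout_strategy(100_000, True) == "liquid clustering"
assert layout_strategy(4, False) == "liquid clustering"
```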

 


 

6. Automated Validation and Tools

Tabular Editor & BPA

The fastest way to find bottlenecks is the Best Practice Analyzer (BPA) within Tabular Editor. Think of BPA as a lint checker for your data model. It scans the model against a set of proven rules, catching unused objects that consume memory, data type mismatches that hurt compression, and inefficient DAX patterns. Run BPA before any manual tuning to get a prioritized list of fixes instead of guessing. Usefully, you can also run BPA as part of a CI/CD process within an Azure DevOps pipeline.

  • You can find Tabular Editor and BPA here

 

DAX Studio & VertiPaq Analyzer

DAX Studio is the tool every DAX developer should have installed. The Server Timings feature, used throughout this article to capture all benchmarks, shows exactly how much time is spent in the Formula Engine versus the Storage Engine. The Physical Query Plan reveals whether a measure triggers a clean plan with a single scan or the expensive CrossApply pattern discussed in the FILTER optimization sections.

  • You can download DAX Studio from here

VertiPaq Analyzer, integrated into both DAX Studio and Tabular Editor, visualizes memory consumption at the column level. It is the fastest way to identify which columns consume the most RAM and are candidates for removal or cardinality reduction.

  • Grab VertiPaq Analyzer here

 

Measure Killer

Measure Killer scans your semantic model and identifies measures not referenced by any report visual, other measure, or calculated column. Unused measures add complexity and make maintenance harder. Measure Killer provides a clear list of candidates for removal.

  • MeasureKiller page click here

 

Summary

Unoptimized semantic models in Microsoft Fabric are not just slow, they actively burn through Capacity Units and drive up costs. To prevent this, I recommend shifting heavy data transformations upstream to the Lakehouse or Warehouse, keeping your model lean with a strict star schema, and tuning the Parquet file layout for Direct Lake. I also emphasize writing efficient DAX and using diagnostic tools like DAX Studio and Tabular Editor to locate real bottlenecks.

I know firsthand that this is all much easier said than done. In the real world, as projects scale and data gets messy, maintaining a perfect model is a massive ongoing battle. My hope is that this guide serves as your practical diagnostic toolkit, giving you exactly the right places to start digging when your reports inevitably begin to lag.

I am Andrzej Strzała (aka Andrew Arrow), data technology enthusiast passionate about crafting innovative data solutions that drive business transformation. With over a decade of experience in designing and implementing robust data architectures, I specialize in leveraging modern tools like Microsoft Fabric, Azure, and Power BI to deliver insights that empower decision-making.
