<h2 id="weaving-dataops-into-microsoft-fabric---automating-dax-query-view-testing-pattern-with-azure-devops">Weaving DataOps into Microsoft Fabric - Automating DAX Query View Testing Pattern with Azure DevOps</h2>
<p>In my last article, I covered implementing the <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part36/" target="_blank">DAX Query View Testing Pattern</a> to establish a standardized schema and testing approaches for semantic models in Power BI Desktop. This pattern facilitates sharing tests directly within the application. If you’re interested in a demonstration, check out my recent <a href="https://youtu.be/WyMQSyf3NvM?si=-W3TxyyJQXE0m-et" target="_blank">YouTube video</a> on the subject.</p>
<p>Now, as promised, let me discuss automating testing (i.e., Continuous Integration) using <a href="https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-overview" target="_blank">PBIP</a> and <a href="https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-git" target="_blank">Git Integration</a>.</p>
<p><em>DataOps Principle Orchestrate: The beginning-to-end orchestration of data, tools, code, environments, and the analytic team’s work is a key driver of analytic success.</em></p>
<p>Orchestration through Continuous Integration and Continuous Deployment (CI/CD) is essential for delivering analytics to customers swiftly while mitigating the risks of errors. <strong>Figure 1</strong> illustrates the orchestration of automated testing.</p>
<p><img src="/assets/img/posts/part37/Figure1.png" alt="Figure 1" class="center-image" />
<em class="center-text-figure">Figure 1 – High-level diagram of automated testing with PBIP, Git
Integration, and DAX Query View Testing Pattern</em></p>
<h3 id="high-level-process">High-Level Process</h3>
<p>In the process depicted in <strong>Figure 1</strong>, your team <strong><u>saves</u></strong> their Power BI work in the PBIP extension format and <strong><u>commits</u></strong> those changes to Azure DevOps.</p>
<p>Then, you or your team <strong><u>sync</u></strong> with the workspace and <strong><u>refresh</u></strong> the semantic models. For this article, I am assuming either manual integration or the use of <a href="https://github.com/microsoft/Analysis-Services/tree/master/pbidevmode/fabricps-pbip" target="_blank">Rui Romano’s code</a> to deploy a PBIP file to a workspace, with semantic models refreshed appropriately. With these criteria met, you can execute the tests.</p>
<h4 id="automated-testing">Automated Testing</h4>
<p>With the PBIP format, each tab in your DAX Query View exists as a separate DAX file in the “.Dataset/DAXQueries” folder (as demonstrated in <strong>Figure 2</strong>).</p>
<p><img src="/assets/img/posts/part37/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Example of DAX Query Tests in DAXQueries folder for the PBIP format</em></p>
<p>You can then leverage the Fabric application programming interfaces (APIs) and XMLA to execute each test query against the semantic model in the service. These tests can be executed through a pipeline, or they can run on a schedule to verify several times a day that all tests pass. But how?</p>
<h3 id="template">Template</h3>
<p>Well, I have a template for that on <a href="https://github.com/kerski/fabric-dataops-patterns/blob/main/DAX%20Query%20View%20Testing%20Pattern/automated-testing-example.md" target="_blank">fabric-dataops-patterns</a>. To get started, you need:</p>
<ol>
<li>
<p>An Azure DevOps project with at least Project or Build Administrator rights</p>
</li>
<li>
<p>A premium-backed capacity workspace connected to the repository in your Azure DevOps project. Instructions are provided <a href="https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-git" target="_blank">here</a>. I have tested this template in a Premium Per User and Fabric capacity.</p>
</li>
<li>
<p>A Power BI tenant with <a href="https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools#enable-xmla-read-write" target="_blank">XMLA Read/Write Enabled</a>.</p>
</li>
<li>
<p>A service principal or account (i.e., a username and password) with a Premium Per User license. If you are using a service principal, you will need to make sure the Power BI tenant allows <a href="https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-service-principal#enable-service-principals" target="_blank">service principals to use the Fabric APIs</a>. The service principal or account will need at least the Member role in the workspace.</p>
</li>
</ol>
<p>With these requirements met, you can follow <a href="https://github.com/kerski/fabric-dataops-patterns/blob/main/DAX%20Query%20View%20Testing%20Pattern/automated-testing-example.md#instructions" target="_blank">these instructions</a> to create the variable group, set up the pipeline, and copy the sample YAML file to get started.</p>
<p>If you follow the steps correctly, any semantic models in the workspace that also exist in the repository and have test files will be queried to determine pass or fail statuses (<strong>Figure 3</strong>).</p>
<p><img src="/assets/img/posts/part37/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - Example of DAX tests conducted through the build agent in Azure DevOps</em></p>
<p>Any failed tests will be logged as errors, and the pipeline will fail (see <strong>Figure 4</strong>).</p>
<p><img src="/assets/img/posts/part37/Figure4.png" alt="Figure 4" class="center-image" /></p>
<p><em class="center-text-figure">Figure 4 - Example of failed test identified with automated testing</em></p>
<p>To run tests for select semantic models, pass the semantic model IDs as a comma-delimited string into the pipeline. The pipeline will only conduct tests for those semantic models (see <strong>Figure 5</strong>).</p>
<p><img src="/assets/img/posts/part37/Figure5.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 5 - Example of automating tests for a select number of semantic models</em></p>
<p>This is especially helpful if you are looking to take this pattern and apply this testing pipeline <a href="https://learn.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops&pivots=templates-includes" target="_blank">as a template</a>.</p>
<h3 id="monitoring">Monitoring</h3>
<p>It’s essential to monitor the Azure DevOps pipeline for any failures. I’ve also written about some best practices for setting that up <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part31/" target="_blank">in this article</a>.</p>
<h3 id="next-steps">Next Steps</h3>
<p>I hope you find this helpful in establishing a <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part36/" target="_blank">consistent pattern for testing</a> your semantic models and instituting <a href="https://github.com/kerski/fabric-dataops-patterns/blob/main/DAX%20Query%20View%20Testing%20Pattern/automated-testing-example.md" target="_blank">a repeatable process for automating testing</a>. I’d like to thank <a href="https://pt.linkedin.com/in/ruiromano?trk=author_mini-profile_title" target="_blank">Rui Romano</a> for <a href="https://github.com/microsoft/Analysis-Services/tree/master/pbidevmode" target="_blank">the code</a> provided on the Analysis Services Git repository, as it helped accelerate my team’s work to automate testing with the Fabric APIs.</p>
<p>In future articles, I will cover various aspects of automated testing
and how to proactively react to failures.</p>
<p>As always, let me know what you think on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter/X</a>.</p>
<p><em>This article was edited by my colleague and senior technical writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams Garrett</a>.</em></p>
<p><em>Git Logo provided by <a href="https://git-scm.com/downloads/logos">Git - Logo Downloads
(git-scm.com)</a></em></p><h2 id="weaving-dataops-into-microsoft-fabric---dax-query-view-testing-pattern">Weaving DataOps into Microsoft Fabric - DAX Query View Testing Pattern</h2>
<p>The last three months of Power BI Desktop releases have made this DataOps fanatic giddy. The introduction of <a href="https://learn.microsoft.com/en-us/power-bi/transform-model/dax-query-view" target="_blank">DAX Query View</a> has laid the foundation to easily couple your tests with your semantic model.</p>
<p>Since <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part4/" target="_blank">Part 4</a> of my series, I have demonstrated ways to use DAX to test your semantic model, but this required meticulously organizing your DAX files in a folder structure alongside your Power BI files. In addition, running these tests required 1) opening your tool of choice (e.g., DAX Studio, SSMS), 2) connecting to your local model, 3) opening the DAX file, and 4) running the test (example in Figure 1).</p>
<p><img src="/assets/img/posts/part36/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - An example of test cases and their output.</em></p>
<p>DataOps stresses reducing cycle times and manual tasks… DAX Query View cuts those steps in half. You simply open DAX Query View within Power BI Desktop and run the test (Figure 2).</p>
<p><img src="/assets/img/posts/part36/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Example of running tests in DAX Query View within Power BI Desktop</em></p>
<h3 id="introducing-a-pattern">Introducing a Pattern</h3>
<p>In the world of actual, tangible fabrics, a pattern is the template from which the parts of a garment are traced onto woven or knitted fabrics before being cut out and assembled. I would like to take that concept
and introduce a pattern for Microsoft Fabric, the <strong><em>DAX Query View Testing Pattern</em></strong>.</p>
<p>My hope is that with this pattern you have a template to weave DataOps into Microsoft Fabric and have a quality solution for your customers.</p>
<h3 id="why-test">Why Test?</h3>
<p>The hope-and-pray approach to publishing Power BI artifacts is counter to the DataOps mindset. Testing serves as the safety net that prevents your team from introducing errors in production. Testing also serves to identify issues in production proactively.</p>
<p>A customer is more likely to trust you if you come to them with a statement like: “We found an issue in production and we are working on a fix. It impacts this group of people, and I will give you an update in 30 minutes,” as opposed to getting a phone call from a customer stating: “There is an issue in production, are you aware of it?” Testing makes the former scenario more likely than the latter.</p>
<p>Now I say this knowing that testing only shows the presence of flaws, not the absence. However, if you can empirically show that what your team builds is founded on good testing practices, you have more legitimacy when defending your work. Testing defends against errors and against the scrutiny you will eventually receive.</p>
<h3 id="how-to-test">How to Test?</h3>
<p>To follow the DAX Query View Testing Pattern you must follow these steps:</p>
<p>1) Setup Workspace Governance</p>
<p>2) Standardize Schema and Naming Conventions</p>
<p>3) Build Tests</p>
<h4 id="setup-workspace-governance">Setup Workspace Governance</h4>
<p>To get started, we need to distinguish tests by their intended Power BI or Fabric workspace. This requires instituting workspace governance. You should have, at a minimum, two workspaces: one for development (DEV) and one for production (PROD). For larger projects, you should have a workspace for clients/customers to test (TEST) before moving to production. If you are unfamiliar with the concept, please <a href="https://en.wikipedia.org/wiki/Deployment_environment" target="_blank">read this wiki article</a>.</p>
<p>Your DEV workspace should have a static set of data (preferably using parameters) to provide a stable state with which you can build your tests. To test effectively, you need a known underlying set of data to validate your semantic model. For example, if your upstream data is Fiscal Year-based, you could parameterize your tests to look at a prior Fiscal Year where the data should be stable. The goal is to have a static set of data to work with, so the only variable that changes during a test is the code you or your team has changed in Power BI.</p>
<p>Your TEST/PROD workspaces are not static and are considered live. Tests in these workspaces conduct health checks (is there data in the table?) and identify <a href="https://kerski.tech/bringing-dataops-to-power-bi-part15/" target="_blank">data drift</a>.</p>
<h4 id="standardize-schema-and-naming-conventions">Standardize Schema and Naming Conventions</h4>
<p>With workspace governance in place, you then need to institute two standards when building tests:</p>
<p>1) Standard Output Schema - In this pattern all tests should be based on a standard tabular schema as shown in Table 1.</p>
<p><em class="center-text-figure">Table 1 – Schema for test outputs</em></p>
<table>
<thead>
<tr>
<th style="text-align: left">Column Name</th>
<th style="text-align: left">Type</th>
<th style="text-align: left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">TestName</td>
<td style="text-align: left">String</td>
<td style="text-align: left">Description of the test being conducted.</td>
</tr>
<tr>
<td style="text-align: left">ExpectedValue</td>
<td style="text-align: left">Any</td>
<td style="text-align: left">What the test should result in. This should be a hardcoded value or function evaluated to a Boolean.</td>
</tr>
<tr>
<td style="text-align: left">ActualValue</td>
<td style="text-align: left">Any</td>
<td style="text-align: left">The result of the test under the current dataset.</td>
</tr>
<tr>
<td style="text-align: left">Passed</td>
<td style="text-align: left">Boolean</td>
<td style="text-align: left">True if the expected value matches the actual value. Otherwise, the result is false.</td>
</tr>
</tbody>
</table>
<p>2) Tab Naming Conventions - Not only do we have a standard schema for the output of our tests, but we also make sure the names of the tabs in the DAX Query View have some organization. Here is the naming format I have started to use:</p>
<p><em class="center-text-figure">[name].[environment].test(s)</em></p>
<ul>
<li>
<p><em>[name]</em> is no more than 15-20 characters long. DAX Query View currently expands the tab name to fit the text, but we want to be able to tab between tests quickly.</p>
</li>
<li>
<p><em>[environment]</em> is DEV, TEST, or PROD and represents the different workspaces to run the test against. ALL is used where the same test should be conducted in all workspaces.</p>
</li>
<li>
<p>Finally, the suffix of “.tests” or “.test” helps us distinguish test files from working files.</p>
</li>
</ul>
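<p>Putting the standard schema and the naming convention together, a tab named something like <em>sales.DEV.tests</em> could contain a query along the following lines. This is a hedged sketch; the table <em>FactSales</em> and the measure <em>[Total Sales]</em> are hypothetical placeholders:</p>
<pre><code>// Hypothetical contents of a tab named sales.DEV.tests
EVALUATE
VAR _Tests =
    UNION (
        ROW (
            "TestName", "FactSales should not be empty",
            "ExpectedValue", TRUE (),
            "ActualValue", COUNTROWS ( 'FactSales' ) > 0
        ),
        ROW (
            "TestName", "Total Sales should not be blank",
            "ExpectedValue", FALSE (),
            "ActualValue", ISBLANK ( [Total Sales] )
        )
    )
RETURN
    // Passed is computed so every row conforms to the standard output schema
    ADDCOLUMNS ( _Tests, "Passed", [ExpectedValue] = [ActualValue] )
</code></pre>
<p>Computing the Passed column once, rather than hardcoding it per test, keeps each test definition short and makes the pass/fail logic consistent across tabs.</p>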
<h4 id="build-tests">Build Tests</h4>
<p>With this standard schema and naming convention in place, you can build tests covering three fundamental areas:</p>
<h5 id="testing-calculations">Testing Calculations</h5>
<p>Calculated Columns and Measures should be tested to make sure they behave as intended and handle edge cases. For example, let us say you have a DAX measure:</p>
<p><em>IF(SUM('TableX'[ColumnY]) < 0, "Condition 1", "Condition 2")</em></p>
<p>To test properly, you should create conditions to
test when:</p>
<p>a. The summation is > 0</p>
<p>b. The summation is = 0</p>
<p>c. The summation is < 0</p>
<p>d. The summation is blank</p>
<p><img src="/assets/img/posts/part36/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - Example of tests for calculations like DAX measures and calculated columns.</em></p>
<h5 id="testing-content">Testing Content</h5>
<p>Knowing that your tables and columns have the appropriate content is imperative. If you have ever <a href="https://youtube.com/shorts/uTqHvxE6208?feature=share" target="_blank">accidentally kept a filter in Power Query</a> that was intended only for debugging/developing, you know testing content is important. Here are some tests you could run with this pattern:</p>
<ul>
<li>The number of rows in a fact table is greater than or equal to a number.</li>
<li>The number of rows in a dimension is not zero.</li>
<li>The presence of a value in a column that shouldn’t be there.</li>
<li>The existence of blank columns.</li>
<li>The values in a custom column are correct.</li>
</ul>
<p><img src="/assets/img/posts/part36/Figure4.png" alt="Figure 4" class="center-image" /></p>
<p><em class="center-text-figure">Figure 4 - Example of testing content of your tables and columns.</em></p>
<p>Note: Regular expressions still cannot be run against column content within DAX syntax. I have an alternative approach to that <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part23/" target="_blank">in this article</a>.</p>
<h5 id="testing-schema">Testing Schema</h5>
<p>With the introduction of <a href="https://powerbi.microsoft.com/en-us/blog/dax-query-view-introduces-new-info-dax-functions/" target="_blank">INFO functions in DAX</a>, testing the schemas of your semantic model is finally that much easier. Schema testing is important because it helps you avoid two common problems: (1) broken visuals and (2) misaligned relationships.</p>
<p>Renaming columns and DAX measures can break visuals that expect those names to be spelled a certain way. This is especially troublesome if you have one dataset and multiple reports or report authors.</p>
<p>In addition, with a click of a button you can change a column from numeric to text. That may seem benign, but what if that column had a relationship with another table’s numeric column? You will have issues, and they are not easy to figure out (trust me, I wasted hours trying to resolve an issue only to realize this was the root problem).</p>
<p>So, to test schemas, you need to establish a baseline schema for each table. Luckily, I have a <a href="https://github.com/kerski/fabric-dataops-patterns/blob/main/Semantic%20Model/SampleModel.Dataset/DAXQueries/Schema%20Query%20Example.dax" target="_blank">template for that</a>. This DAX code will generate the schema for you once you enter the table name. Then you build the test (see Figure 5).</p>
<p>Subsequently, if this test fails, you know either you intended to change the schema and need to update the test, OR you did not intend the change to happen and need to fix your model.</p>
<p><img src="/assets/img/posts/part36/Figure5.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 5 - Example of running tests against your semantic model’s schema.</em></p>
<p>Now you may be asking which tests are intended for DEV and which are intended for TEST/PROD. The easy answer is that it depends on your data, but Table 2 is my rule of thumb.</p>
<p><em class="center-text-figure">Table 2 - Rule of Thumb of Types of Tests for each Workspace.</em></p>
<table>
<thead>
<tr>
<th style="text-align: left">Workspace</th>
<th style="text-align: left">DEV</th>
<th style="text-align: left">TEST</th>
<th style="text-align: left">PROD</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Testing Calculations</td>
<td style="text-align: left">X</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">Testing Content</td>
<td style="text-align: left">X</td>
<td style="text-align: left">X</td>
<td style="text-align: left">X</td>
</tr>
<tr>
<td style="text-align: left">Testing Schema</td>
<td style="text-align: left">X</td>
<td style="text-align: left">X</td>
<td style="text-align: left">X</td>
</tr>
</tbody>
</table>
<h3 id="examples">Examples</h3>
<p>Looking for a template of tests? Check out my <a href="https://github.com/kerski/fabric-dataops-patterns/blob/main/documentation/dax-query-view-testing-pattern.md" target="_blank">Fabric DataOps pattern repository on GitHub</a> for a sample model and sets of tests you can leverage in building your own. Also, don’t forget to leverage the <a href="https://youtu.be/YCs2_NLYlOc?si=fwvWQkui8veGzs5L&t=116" target="_blank">Power BI Performance Analyzer to copy DAX queries from visuals</a>. It helps you build test cases more quickly, avoid syntax errors, and understand DAX a little better (a win-win all around).</p>
<h3 id="whats-next">What’s Next?</h3>
<p>With this pattern in place, you can build tests right in your semantic model, have your teams run them within the comfort of Power BI Desktop, and avoid introducing errors for your customers. But John, you may ask, “doesn’t that mean I have to make sure my team and I run each test manually?” The answer is yes and no. Yes, when you want to validate changes before publishing, it is good practice to run the tests. But also no, because with the DAX Query View Testing Pattern we can now leverage another feature, PBIP, to embrace orchestration.</p>
<p>I’ll cover that in my next article.</p>
<p>As always, let me know what you think
on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter/X</a>.</p><h2 id="part-35--commenting-power-query-with-azure-openai">Part 35 – Commenting Power Query with Azure OpenAI</h2>
<p><strong><em>Leveraging Artificial Intelligence (AI) to reduce cycle times.</em></strong></p>
<p>Over the past several years, I've seen a lot of Power Query code. Whether it comes from a dataset a teammate built or one I inherited and now support, understanding Power Query code built by others can be challenging.</p>
<p>When reviewing unfamiliar Power Query code, you may encounter a list of applied steps that looks similar to <strong>Figure 1</strong>:</p>
<p><img src="/assets/img/posts/part35/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - Example of Power Query code with default names that provide
little context.</em></p>
<p>What do you do next? Spend the time to click on each step to understand the code, right? However, this approach is time-consuming. For example, if you needed to update a custom column, you would need to slog through a series of manual steps: hunting and clicking through the applied steps to find the appropriate column, deciphering the code in the Advanced Editor, copying the code to a text editor, and, finally, searching it.</p>
<p>These process inefficiencies add up, leading to slower delivery to customers and poor maintenance. They also represent the antithesis of DataOps, which states, “<em>We should strive to minimize the time and effort to turn a customer need into an analytic idea, create it in development, release it as a repeatable production process, and finally
refactor and reuse that product.”</em></p>
<p>Therefore, to save time and incorporate DataOps principles, I typically ask my teams to emphasize two integral practices:</p>
<p>1) <strong>Add Comments</strong> - For steps involving merges, custom functions, custom columns, or significant complexity, add an explanatory comment before each step via the Advanced Editor. These comments provide context and details to reduce the need for interpretation. <strong>Figure 2</strong> provides an example of a comment in the Advanced Editor, and <strong>Figure 3</strong> demonstrates how that comment appears in the Applied Steps window.</p>
<p><img src="/assets/img/posts/part35/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Example of a comment added in the Advanced Editor.</em></p>
<p><img src="/assets/img/posts/part35/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - Example of the comment in the Applied Steps panel.</em></p>
<p>2) <strong>Use Descriptive Step Names</strong> - Instead of labeling steps with ambiguous names such as “Added Custom Column” or “Renamed Columns8,” give the step a more descriptive name of roughly 25 to 50 characters. For example, if you created a custom column named “Fiscal Year,” you could rename the step to “Added Fiscal Year.”</p>
<h3 id="if-you-cant-enforce-it-have-artificial-intelligence-do-it">If You Can’t Enforce It, Have Artificial Intelligence Do It.</h3>
<p>As you may already know, enforcing these practices is difficult. Low-code practitioners and even pro-code developers have been encouraged to comment their work appropriately since the dawn of programming languages, but few routinely do. This is where AI comes in. Large Language Models can help apply these practices. With <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Cpython&pivots=programming-language-studio" target="_blank">ChatGPT available through Azure OpenAI</a>, I built a Power BI template that could point to a dataset in the Power BI service, parse the Power Query code, and offer a transformed version of each table. <strong>Figure 4</strong> provides a high-level overview of the data pipeline.</p>
<p><img src="/assets/img/posts/part35/Figure4.png" alt="Figure 4" class="center-image" /></p>
<p><em class="center-text-figure">Figure 4 - High-level overview of the data pipeline using Azure OpenAI.</em></p>
<p><strong>Figure 5</strong> provides an example. I also have a <a href="https://app.powerbi.com/view?r=eyJrIjoiYTEwOGZiODQtNTAwNC00YTRjLTg2YTMtNDRmNWNhOWY3YzNiIiwidCI6ImU3MDRkMjE0LWI1YjUtNDc5OS1hZjk2LTYxZmEyNzMwYzI4OSIsImMiOjF9" target="_blank">public version of this report</a> that analyzed a version of a sample dataset (errr…semantic model) I’ve
used for demonstrations.</p>
<p><img src="/assets/img/posts/part35/Figure5.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 5 - Screenshot of Power Query Code transformed by ChatGPT 4.0 in Azure OpenAI’s Service.</em></p>
<p>With this transformed code available, you then could copy and replace the existing code AND <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part4/" target="_blank">test</a> accordingly.</p>
<h3 id="design">Design</h3>
<p>If you’re interested in how this works, the template depends on the following two components:</p>
<p>1) <strong>Dynamic Management View Queries</strong> - To pull semantic model information, I needed to run <a href="https://learn.microsoft.com/en-us/analysis-services/instances/use-dynamic-management-views-dmvs-to-monitor-analysis-services?view=asallproducts-allversions" target="_blank">dynamic management view</a> queries via <a href="https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools" target="_blank">XMLA</a> to extract the Power Query code for the dataset. This required a Premium Per User workspace to house the semantic model.</p>
<p>2) <a href="https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy" target="_blank"><strong>Azure OpenAI</strong></a> - This template depends on you having a subscription to Azure OpenAI available. I chose Azure OpenAI for the <a href="https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy" target="_blank">terms of service</a> and the easier path to an authority to operate within my working environments (see <strong>Figure 6</strong>). To set up an Azure OpenAI endpoint, please <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal" target="_blank">see this article</a>.</p>
<p><img src="/assets/img/posts/part35/Figure6.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 6 - Microsoft’s Azure Open AI statement of <a href="https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy" target="_blank">data, privacy, and
security</a> as of January 6^th^, 2024.</em></p>
<h3 id="execution">Execution</h3>
<p>With those two components in place, the template performs the following:</p>
<p>1) Asks you to identify the Azure OpenAI endpoint and key.</p>
<p>2) Asks you to identify the XMLA Endpoint and Dataset Name.</p>
<p>3) Uses the information provided to extract Power Query code from the
dataset for analysis.</p>
<p>4) Prompts ChatGPT 4.0 to analyze each step in the Power Query code and apply the two best practices.</p>
<p>5) Transforms the Power Query code in each table in the dataset based on ChatGPT responses.</p>
<h3 id="try-it-yourself">Try It Yourself</h3>
<p>If you can use Azure OpenAI, <a href="https://github.com/kerski/pbi-pq-commenter-with-azure-openai" target="_blank">give the template a try</a> and let me know what you think. As always, I look forward to hearing from you on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter</a>.</p>
<p><em>This article was edited by my colleague and senior technical
writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams
Garrett</a>.</em></p><h2 id="revisiting-fabric-dataops-principles-in-the-newest-microsoft-analytics-solution">Revisiting Fabric: DataOps Principles in the Newest Microsoft Analytics Solution</h2>
<p>When Microsoft unveiled the Public Preview of Fabric in May, I explored <a href="https://www.kerski.tech/how-dataops-is-woven-into-the-microsoft-fabric/" target="_blank">how DataOps was woven into the all-encompassing analytics solution</a>. In that article, I emphasized the <strong><em>enduring applicability of DataOps principles in the evolving landscape of Microsoft tools, including Fabric and Power BI.</em></strong> These principles are essential for anticipating and adapting to inevitable tooling changes.</p>
<p>Since then, Fabric has undergone significant changes and shifted to
<a href="https://blog.fabric.microsoft.com/en-us/blog/announcing-general-availability-explore-the-capabilities-of-real-time-analytics-in-microsoft-fabric?ft=All" target="_blank">General Availability</a> in the commercial sector. To assess the progress Microsoft has made in aligning Fabric with DataOps principles, I thought I would revisit the product through the lens of several DataOps principles: <strong><em>Make it Reproducible, Quality is Paramount, Monitor for Quality and Performance, Orchestrate,</em></strong> and <strong><em>Reduce Heroism.</em></strong> Please note that my thoughts on Fabric are only intended as constructive criticism. Additionally,
some features are still in Preview and are subject to change.</p>
<h3 id="make-it-reproducible">Make It Reproducible</h3>
<p><em>Make it reproducible: Reproducible results are required and
therefore <strong>we version</strong> everything: data, low-level hardware and
software configurations, and the code and configuration specific to each tool in the toolchain.</em></p>
<p>This principle emphasizes version control. As I explained in my <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part30/" target="_blank">DataOps 101 sessions</a>, version control is the first step to making analytic projects more successful. Fabric offers two features that promote reproducibility:</p>
<p><strong>1) <a href="https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-overview" target="_blank">The Power BI Desktop Project</a> (PBIP) File</strong></p>
<p>This <a href="https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-overview" target="_blank">newly introduced format</a> moves from the traditional binary file (PBIX) to a text-based format. This transition reduces the barrier to Git adoption, eliminating concerns about binaries containing confidential data in repositories. It also reduces reliance on the cumbersome <a href="https://git-lfs.com/" target="_blank">Git Large File Storage</a> feature to prevent repository bloat.</p>
<p>The PBIP file format boosts transparency, particularly in addressing the question, “What did you change in the Power BI report or dataset (now semantic model)?” However, there is still room for improvement. PBIP cannot currently save models in the Tabular Model Definition Language (TMDL) format, a feature that would streamline the process of comparing changes. Additionally, as illustrated in <strong>Figure 1</strong>, understanding the JSON files for the report side of PBIP is like deciphering code from the Matrix (I may be dating myself here). Despite these considerations, PBIP is a fantastic feature within the Fabric and Power BI ecosystem, promoting an accessible and collaborative analytics workflow.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - Comparing changes in reports is still a challenging task.</em></p>
<p><strong>2) Git Integration</strong></p>
<p>Introducing <a href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration#considerations-and-limitations" target="_blank">Git Integration</a> to Fabric workspaces lessens the learning curve for those unfamiliar with Git intricacies. <strong>Figure 2</strong> illustrates its user-friendly interface. Git Integration has expanded to support <a href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration#supported-items" target="_blank">several artifacts</a>, including datasets, reports, and Notebooks. Hopefully, Microsoft will work toward addressing Dataflows as well. Until then, <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part33/" target="_blank">Part
33</a> of my series outlines the options for managing Gen1 dataflows.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Git Integration provides an easier interface for saving
changes to version control.</em></p>
<p><strong>3) OneDrive Integration with Power BI Desktop</strong></p>
<p>I did say there were <em>two</em> profound features for managing version
control, but it’s also worth acknowledging that Fabric integrates <a href="https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-sharepoint-save-share" target="_blank">Power BI Desktop files with OneDrive</a>. This integration is especially beneficial for smaller teams or those without access to Azure DevOps. OneDrive’s version control feature enables rollbacks to previous timestamps, providing a “better than nothing” option and layer of quality assurance.</p>
<p>The recent changes to the Power BI Desktop, illustrated in <strong>Figure 3</strong>, and the syncing options for Power BI reports and <a href="https://learn.microsoft.com/en-us/power-bi/connect-data/service-datasets-rename" target="_blank">semantic models</a> indicate a step in the right direction. However, I will say that the syncing option still occasionally poses challenges for my team. Issues typically arise when the dataset sync doesn’t occur automatically, or when <a href="https://community.fabric.microsoft.com/t5/Service/OneDrive-sync-not-working-with-quot-thin-quot-report/m-p/3383465" target="_blank">a report suddenly stops
syncing</a> and there is no option to resync without creating a new file. This error
typically results in broken URLs. I hope Microsoft eliminates both
inconveniences soon.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - The latest version of Power BI Desktop offers an upgraded
OneDrive integration.</em></p>
<h3 id="quality-is-paramount">Quality is Paramount</h3>
<p><em>Analytic pipelines should be built with a foundation capable
of <strong>automated detection</strong> of abnormalities and security issues in code,
configuration, and data, and should provide continuous feedback to
operators for error avoidance.</em></p>
<p>Test, test, and test. This principle emphasizes the importance of
implementing testing regimes for semantic models, reports, dataflows, and other artifacts in Fabric. Questions regarding the correctness of columns’ formats (e.g., string, integer), their alignment with Regex expressions (e.g., email addresses, phone numbers), and the expected number of rows (e.g., a date dimension containing today’s date) should all undergo automated checks. This robust testing serves as a safety net, mitigating the risk of introducing errors into the production environment.</p>
<p>At the time of writing, there isn’t a <a href="https://www.kerski.tech/how-dataops-is-woven-into-the-microsoft-fabric/" target="_blank">native</a> feature in Power BI or Fabric designed for instilling testing. As a result, many users resort to developing bespoke solutions, including <a href="https://www.linkedin.com/feed/update/urn:li:activity:7064849591459835904/" target="_blank">Flávio Meneses’</a> and <a href="https://github.com/kerski/pbi-dataops-template/blob/part25/documentation/run-tests.md" target="_blank">my own</a> attempts. That said, Microsoft does provide the foundational elements for building repeatable testing frameworks, including:</p>
<p><strong>1) The Notebook Testing Framework</strong></p>
<p>Introducing <a href="https://learn.microsoft.com/en-us/fabric/data-engineering/how-to-use-notebook" target="_blank">Notebooks</a> to the Fabric toolbox brings Python and its extensive library support into the testing realm for data pipelines. For example, Great Expectations (one of my favorites) has incorporated new functions tailored to <a href="https://blog.fabric.microsoft.com/en-us/blog/semantic-link-data-validation-using-great-expectations?ft=All" target="_blank">support Fabric integration</a>. The new <a href="https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview#power-bi-connectivity" target="_blank">Sempy Python library</a> also adds flexibility to test datasets and semantic models. Leveraging these capabilities, I’ve been exploring a concept, illustrated in <strong>Figure 4</strong>, to support a native testing framework within Fabric.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure4.png" alt="Figure 4" class="center-image" /></p>
<p><em class="center-text-figure">Figure 4 - Using Notebooks can establish a testing framework for data pipelines.</em></p>
<p>Notebooks function as the conduit for testing the outputs of each stage of a data pipeline, and they offer the capability to save the test results in storage. <strong>Figure 5</strong> provides an example of what I currently use to test <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part27/" target="_blank">my custom Power Query functions designed for retrieving
SharePoint data</a>.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure5.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 5 - With Notebooks, we can test the results of data pipeline execution.</em></p>
<p><strong>2) Data Quality Framework</strong></p>
<p>One significant risk to any analytics project is compromised data
quality. While we don’t always have control over the quality of upstream data in source systems, we do bear the brunt of blame when it doesn’t look right in the reports. Consequently, it’s imperative to not only test the quality of the code we develop but to scrutinize the quality of the data itself and promptly alert stakeholders of any issues.</p>
<p><strong>Figure 6</strong> illustrates a concept I’ve been implementing in some of my projects. The outputs of each transformation (in this example, a dataflow) generate data and Data Quality Checks. For example, I may want to output rows from an upstream source with invalid column values crucial to downstream reports. These quality checks are then combined to create the foundation for a Data Quality dataset, which can produce a Data Quality dashboard. This approach maintains data quality while facilitating transparency and awareness among stakeholders.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure6.png" alt="Figure 6" class="center-image" /></p>
<p><em class="center-text-figure">Figure 6 - Data Quality Framework with Data Activator concept.</em></p>
<p><strong>3) DAX Query View Testing</strong></p>
<p>The November release of Power BI Desktop introduced the <a href="https://powerbi.microsoft.com/en-us/blog/power-bi-november-2023-feature-summary/#post-25061-_Toc150157949" target="_blank">DAX Query View</a>, which enables the pairing of DAX queries with datasets and semantic models. Saving DAX queries with the PBIP format stores them under a subfolder labeled DAXQueries (illustrated in <strong>Figure 7</strong>).</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure7.png" alt="Figure 7" class="center-image" /></p>
<p><em class="center-text-figure">Figure 7 - Save DAX-based tests with semantic models using DAX Query View.</em></p>
<p>This capability allows us to build DAX-based tests (illustrated in
<strong>Figure 8</strong>), a practice I have advocated for since <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part4/" target="_blank">Part 4</a> of my series. It also supports version control of our DAX-based tests.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure8.png" alt="Figure 8" class="center-image" /></p>
<p><em class="center-text-figure">Figure 8 - An example of a DAX-based test and the output.</em></p>
<p>With so many options for testing, the question is no longer whether it’s possible to test data pipelines but <em>how</em> to test them. The consultant’s answer will be, “<strong>It depends</strong>,” but deciding not to test is no longer a viable choice (well done, Microsoft). I’ll expand on testing in Fabric in future blog articles.</p>
<h3 id="monitor-for-quality-and-performance">Monitor for Quality and Performance</h3>
<p><em>Our goal is to have performance, security, and quality measures that are <strong>monitored continuously</strong> to detect unexpected variations and generate operational statistics.</em></p>
<p>This principle underscores the importance of treating data pipelines as a manufacturing line, emphasizing the need to check the status at every step in the process <strong><em>to catch errors before they reach the customer.</em></strong> While we’ve historically relied on closely monitoring emails for failures or using an
external <a href="https://github.com/kerski/pbi-dataops-monitoring" target="_blank">solution</a>,
Microsoft is moving closer to providing out-of-the-box monitoring with the following features.</p>
<p><strong>1) Monitoring Hub</strong></p>
<p>A centralized station, the Monitoring Hub enables viewing and tracking activities across various products. For new Fabric artifacts such as Notebooks, it provides granular information on the health and operation specifics. However, the information is still quite high-level for datasets and dataflows. I’m hopeful for future enhancements, including the ability to create custom issues based on conditions.</p>
<p><img src="/assets/img/posts/fabric-nov-2023/Figure9.png" alt="Figure 9" class="center-image" /></p>
<p><em class="center-text-figure">Figure 9 - The Monitoring Hub is a good start for reviewing the health of data pipelines.</em></p>
<p>Additionally, I think the Monitoring Hub should allow workspace
administrators to see common issues that arise, such as:</p>
<ul>
<li>
<p>Datasets and Dataflows with disabled schedules</p>
</li>
<li>
<p>Datasets and Dataflows with conflicting schedules</p>
</li>
<li>
<p>Apps with pending access requests</p>
</li>
<li>
<p>Datasets and Dataflows with invalidated credentials</p>
</li>
<li>
<p>Inactive accounts for users who have not visited an app or workspace
in 30 days</p>
</li>
</ul>
<p><strong>2) <a href="https://learn.microsoft.com/en-us/fabric/data-activator/data-activator-introduction" target="_blank">Data Activator</a></strong></p>
<p>Currently in Public Preview, this feature enables the creation of
triggers based on data from a Power BI visual or Event Stream. To
effectively monitor quality and performance from a Power BI perspective, it’s essential to build a Data Quality Dashboard (similar to the one I offered in the previous section) and incorporate visuals that depict that quality. This approach adopts a low-code methodology, and I’m optimistic that it will empower individuals to conduct data quality and performance testing. However, it’s crucial to plan for Data Activator and architect a custom dashboard to support comprehensive monitoring.</p>
<h3 id="orchestrate">Orchestrate</h3>
<p><em>The beginning-to-end orchestration of data, tools, code, environments, and the analytic team’s work is a key driver of analytic success.</em></p>
<p>Automating orchestration through Continuous Integration and Continuous Deployment (CI/CD) is essential for delivering analytics to customers swiftly while mitigating the risks of errors. Microsoft lays the foundation for orchestration with the following features:</p>
<p><strong>1) Git Integration to Azure DevOps</strong></p>
<p>Git integration with Azure DevOps enables <a href="https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-build-pipelines?WT.mc_id=DP-MVP-5004032" target="_blank">seamless pipeline kickoffs</a>, inspiring innovative ideas and the development of third-party tools to further enhance this experience.</p>
<p><strong>2) Deployment Pipelines</strong></p>
<p>Deployment Pipelines enable teams to reduce the overhead of the promotion process, decreasing cycle times. <a href="https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/understand-the-deployment-process#auto-binding" target="_blank">Auto-binding</a> and <a href="https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/create-rules" target="_blank">deployment rules</a> mitigate the risk of missing promotion steps that cause errors and needless support calls (e.g., data is pointing to test, and your customer notices).</p>
<p>As most Fabric artifacts (with a few <a href="https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/understand-the-deployment-process#supported-items" target="_blank">exceptions</a>) can now be promoted, there is a growing need to address the <a href="https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=c348e92e-1eda-ed11-9139-281878ded556" target="_blank">lack of visibility into deployment rules for non-artifact owners</a>. I hope to see Microsoft offer an API endpoint to view all the deployment rules and help manage the <a href="https://docs.datakitchen.io/articles/#!dataops-concepts/parameterize-your-processing" target="_blank">parameterization of our orchestration efforts</a>, a critical concept in DataOps.</p>
<p>I would also like to see <a href="https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=311d9061-bf8a-ee11-a81c-6045bdba0bac" target="_blank">Microsoft transform deployment rules into a standalone artifact</a> integrated with Git to support the concept of Infrastructure as Code (IaC), which <a href="https://learn.microsoft.com/en-us/devops/deliver/what-is-infrastructure-as-code" target="_blank">avoids manual configuration to enforce consistency</a>. For medium-to-large Fabric implementations, the number of rules to manage and review quickly becomes unwieldy. Furthermore, a lack of awareness of differences between workspaces, such as a rule that is not defined in production, poses a significant risk. Introducing deployment rules as a standalone artifact integrated with Git would go a long way toward mitigating these risks and providing a more streamlined process for rule management.</p>
<p><strong>3) Fabric API</strong></p>
<p>Similar to what the Power BI REST APIs offered for orchestration, the pending <a href="https://learn.microsoft.com/en-us/rest/api/fabric/" target="_blank">Fabric API endpoints</a> should provide extended orchestration capabilities in Azure pipelines. As Microsoft has been continuously releasing new endpoints, I’m waiting (impatiently) to see what Microsoft will offer in the coming months. One area of improvement is the lack of API capabilities for Apps. These capabilities, including <a href="https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=f0062c0a-de8f-40d6-b12e-560ea7cce009" target="_blank">the ability to publish an App via the API</a>, are crucial to fully orchestrate the promotion process. Microsoft should consider providing API endpoints for any feature in General Availability, embracing the DataOps concept of Orchestration.</p>
<h3 id="reduce-heroism">Reduce Heroism</h3>
<p><em>As the pace and breadth of the need for analytic insights ever
increase, we believe analytic teams should strive to reduce heroism and create sustainable and scalable data analytic teams and processes.</em></p>
<p>In my own words, this means avoiding burnout for you and your team. The litany of changes coming with Microsoft Fabric and Power BI can be overwhelming, but <strong>you don’t need to know everything all at once—don’t be the hero</strong>. Fabric just went Generally Available for commercial customers; for customers in the Government Community Cloud (GCC), it is unavailable at the time of writing. If you haven’t taken advantage of it in the commercial sector, Fabric <a href="https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial#start-the-fabric-preview-trial" target="_blank">has a free trial feature</a> for prototyping and experimentation.</p>
<p>If you lead a data analytics team, my advice is to invest time in
learning the fundamentals of Microsoft Fabric through <a href="https://learn.microsoft.com/en-us/training/paths/get-started-fabric/" target="_blank">free Microsoft
training</a>. Also, make sure to bookmark <a href="https://fabric.guru/" target="_blank">Sandeep Pawar’s blog</a> and <a href="https://www.kevinrchant.com/" target="_blank">Kevin Chant’s blog</a>, as both describe the Fabric ecosystem in-depth. Be sure to empower team members to experiment with certain features, share findings, and discuss ways to enhance productivity. For guidance on determining which features to investigate, review <a href="https://www.linkedin.com/in/kurtbuhler/" target="_blank">Kurt
Buhler</a>’s <a href="https://data-goblins.com/power-bi/fabric-announcements" target="_blank">beautiful
infographic</a>.</p>
<p>As Fabric evolves, these are exciting times for embracing DataOps and implementing its proven principles. I’d like to hear your thoughts, so please let me know what you think on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter</a>.</p>
<p><em>This article was edited by my colleague and senior technical
writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams
Garrett</a>.</em></p><h2 id="bringing-quality-is-paramount-for-gen1-dataflows">Bringing “Quality is Paramount” for Gen1 Dataflows</h2>
<p>Continuing from <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part33/" target="_blank">Part 33</a>, where I introduced the polling method for implementing version control in Power BI dataflows, let’s talk about testing.</p>
<p>Now, if you are following a Medallion Architecture approach, you SHOULD be able to test both the inputs and outputs of your dataflows. This testing is critical to validating each step of the data journey. Testing serves as a safety net that keeps your teams from making regression errors, and it helps you identify issues before your customers do, both risks that teams should vehemently avoid. However, as of October 2023, Microsoft still doesn’t provide native data testing tools. As I described in <a href="https://www.kerski.tech/how-dataops-is-woven-into-the-microsoft-fabric/" target="_blank">How DataOps is woven into Microsoft Fabric</a>, these capabilities are maturing but not ready for production use while in preview.</p>
<p>Therefore, when <a href="https://www.linkedin.com/in/calvin-barker-61b079270/" target="_blank">Luke Barker</a> and I had only Gen 1 Dataflows available to our projects, we had to get creative. As seasoned data practitioners, we knew we needed to be able to:</p>
<p>1) <strong>Build tests that evaluate the schema of the dataflow outputs</strong> - It is very easy to remove a step in the Power Query editor that casts a column to a specific type (e.g., whole number, date), which can inadvertently introduce problems downstream. In addition, upon saving a dataflow, if Power BI encounters an ambiguous column (e.g., ABC123 columns), it casts that column to a string, removing any errors. This fun “feature” may magically remove rows of data. We needed to be able to inspect the schema of a dataflow table and make sure it matched our expectations.</p>
<p>2) <strong>Build tests that evaluate the content of the dataflow outputs</strong> - If you’re not using Bring Your Own Storage (BYOS) or can’t (looking at you, GCC), obtaining the contents of a dataflow table and inspecting for anomalies is not easy. The data is stored in CSV files behind some gated Microsoft service, so the only way to access that data is via a Power BI dataset.</p>
<p>3) <strong>Build and run tests locally</strong> - Good testing systems allow staff to run tests independently from one another and store the tests in version control. That way, it’s easy to share tests with teams, and tests can be developed and tested (i.e., testing the tests) locally. Ultimately, this capability allows teams to create a suite of tests and set up the safety net needed.</p>
<p>4) <strong>Keep it Simple</strong> – Microsoft doesn’t always make this easy, but we didn’t want to have to maintain lots of components or scripts just to run some tests. The overhead of testing should not impede the delivery of new features on our projects.</p>
<h3 id="the-unpivoted-testing-approach">The UnPivoted Testing Approach</h3>
<p>After some research, we settled on a Dataflow Testing Dataset concept that would sit in the same workspace as the dataflow. This dataset would exist for each team member who needed to write and execute tests. Most importantly, this dataset could connect to any dataflow and pull the contents for testing.</p>
<p>How do you do that? Well, I’m glad you asked. It works this way:</p>
<p><strong>The Parameters</strong> - The dataset has a parameter for the Workspace (Workspace_ID) and Dataflow (Dataflow_ID) so it knows which dataflow to import. We also needed a randomly generated GUID to represent each instance of test execution (Run_ID). This allows us to track when a suite of tests is executed and record the results.</p>
<p><strong>The Tables</strong> - The dataset has two tables. The first and largest is called <em>DFTest</em>. It functions as an unpivoted representation of each table in the dataflow. <strong>Figure 1</strong> illustrates the approach. This technique avoids running into schema refresh errors in a dataset when switching between dataflows and standardizes the format. Yes, it does drastically increase the number of rows, but if I’m testing more than 10 million rows of data in development, I should be parameterizing my dataflows to keep the size more manageable.</p>
<p><img src="/assets/img/posts/part34/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - Switching between dataflow sources dynamically allows us to have one testing dataset.</em></p>
<p>The second table is <em>RowCount</em>, which contains the total row count for each table in the dataflow. If you have ever saved a dataflow and forgotten to refresh it, then you know you could easily have zero rows of data and not know why. This table makes it easy to test for that oversight.</p>
<h3 id="local-testing">Local Testing</h3>
<p>With the dataset ready to serve our testing needs, we bring in our <a href="https://powershellexplained.com/2017-03-17-Powershell-Gherkin-specification-validation/" target="_blank">good ol’ friend Pester 4 and build Gherkin tests</a>. These tests are housed in the same repository that tracks dataflow changes via the <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part33/" target="_blank">polling method</a>. Gherkin is an implementation of the <a href="https://www.agilealliance.org/glossary/bdd/" target="_blank">Behavior Driven Development</a> approach that emphasizes building tests in plain language and executing those tests with scripts. That way, you have a shared syntax and semantics for describing both the test setup and the tests themselves. Each test suite is described in a feature file with a Background section that validates the environment and a Scenario section that executes the tests.</p>
<p><img src="/assets/img/posts/part34/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Example Background Section of a Gherkin test.</em></p>
<p>In more detail, the Background section executes a shared script (Test-Support.ps1), which interfaces with the Dataflow Testing Dataset as illustrated in <strong>Figure 3</strong>. The script works through the test line by line, one step for each sentence:</p>
<h4 id="given-that-we-have-access-to-the-dftest-file-in-the-workspace-workspace"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#given-that-we-have-access-to-the-dftest-file-in-the-workspace-workspace">Given “that we have access to the DFTest file in the Workspace: ‘{Workspace}’”</a></h4>
<p>This test verifies that the workspace is accessible, and the appropriate testing dataset exists in the workspace.</p>
<h4 id="and-we-have-access-to-the-dataflow-dataflowname"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-we-have-access-to-the-dataflow-dataflowname">And “we have access to the Dataflow: ‘DataflowName’”</a></h4>
<p>This test verifies that the dataflow exists and extracts the contents of the dataflow (as JSON).</p>
<h4 id="and-we-have-the-table-called-table"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-we-have-the-table-called-table">And “we have the table called ‘Table’”</a></h4>
<p>This test verifies that the table exists in the dataflow (and wasn’t renamed for some reason).</p>
<h4 id="and-we-can-setup-the-table-for-testing"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-we-can-setup-the-table-for-testing">And “we can setup the table for testing”</a></h4>
<p>This is the most critical step in testing because this test:</p>
<ul>
<li>Updates the parameters of the testing dataset to point to the appropriate workspace and dataflow.</li>
<li>Issues a synchronous dataset refresh and verifies that the refresh succeeds. This refresh request forces the dataset to pick up the new parameters (a sketch of this step appears after Figure 3).</li>
</ul>
<p><img src="/assets/img/posts/part34/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - Testing process illustrated.</em></p>
<p>With the background validated for testing, we can then execute simple schema and content tests. The following are the ones I use most often:</p>
<h4 id="schema-tests"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#schema-tests">Schema Tests</a></h4>
<h5 id="then-it-should-contain-or-match-the-schema-defined-as-follows"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#then-it-should-contain-or-match-the-schema-defined-as-follows">Then “it should {Contain or Match} the schema defined as follows:”</a></h5>
<p>This test accepts a table of information with the columns Name, Type, and Format, such as:</p>
<table>
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th>Format</th>
</tr>
</thead>
<tbody>
<tr>
<td>Alignment ID</td>
<td>int64</td>
<td>0</td>
</tr>
</tbody>
</table>
<ul>
<li>
<p>Name: This is the column name.</p>
</li>
<li>
<p>Type: This is the column type.</p>
</li>
<li>
<p>Format: This is the column format. You can leave this blank if the format does not need to be tested.</p>
</li>
</ul>
<p>This test accepts a parameter {Contain or Match}. If the parameter is “Contain,” the test makes sure each listed column exists and matches the type and format. If the parameter is “Match,” the test ensures the table has exactly the columns defined in the test, each matching the type and format. The “Match” value is strict: no columns may exist in the dataset beyond those defined in the feature file.</p>
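<p>In Pester terms, the heart of that assertion might look like the sketch below. The $ExpectedSchema collection comes from the Gherkin table, while $ActualSchema is assumed to be derived from the testing dataset; the property and variable names are illustrative rather than the repo’s actual step code.</p>

<pre><code># Hedged sketch of the {Contain or Match} schema check (Pester 4 syntax).
foreach ($column in $ExpectedSchema) {
    $actual = $ActualSchema | Where-Object { $_.Name -eq $column.Name }
    $actual | Should -Not -BeNullOrEmpty       # the column exists
    $actual.Type | Should -Be $column.Type     # the type matches
    if ($column.Format) {                      # Format is optional in the table
        $actual.Format | Should -Be $column.Format
    }
}
if ($ComparisonType -eq "Match") {
    # Strict mode: fail if the dataflow has columns the feature file doesn't.
    $extra = $ActualSchema | Where-Object { $_.Name -notin $ExpectedSchema.Name }
    $extra | Should -BeNullOrEmpty
}
</code></pre>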
<h4 id="content-tests"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#content-tests">Content Tests</a></h4>
<h5 id="and-the-values-of-columnname-matches-this-regex-regex"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-the-values-of-columnname-matches-this-regex-regex">And ‘the values of “{ColumnName}” matches this regex: “{Regex}”’</a></h5>
<p>This test accepts the {ColumnName} parameter and {Regex} parameter. It verifies that the values in the column match the regular expression, which follows the <a href="https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions">.NET Regular Expressions format</a>.</p>
<h5 id="and-the-values-in-columnname-are-unique"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-the-values-in-columnname-are-unique">And “the values in ‘{ColumnName}’ are unique’”</a></h5>
<p>This test accepts the {ColumnName} parameter and validates that are values in that column are unique.</p>
<h5 id="and-there-should-be-comparison-than-count-records-returned"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-there-should-be-comparison-than-count-records-returned">And “there should be {Comparison} than {Count} records returned”</a></h5>
<p>This test accepts the {Comparison} parameter and {Count} parameter. The {Comparison} parameter can be the following values:</p>
<ul>
<li>exactly</li>
<li>less than</li>
<li>less than or equal to</li>
<li>greater than</li>
<li>greater than or equal to</li>
</ul>
<p>The {Count} parameter should be a number.</p>
<p>This test makes sure the number of records in the table meets expectations. This is a good test to monitor for empty tables and residual test filters.</p>
<h5 id="and-all-tests-should-pass-for-the-dax-query-test-file"><a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/Documentation/run-tests.md#and-all-tests-should-pass-for-the-dax-query-test-file">And “all tests should pass for the DAX query: {Test File}”</a></h5>
<p>This test accepts the {Test File} parameter. The {Test File} parameter is the name of the Data Analysis Expressions (DAX) file in the dataflows
folder.</p>
<p>This test executes the DAX query against the test dataset and inspects the test results returned from the DAX Query.</p>
<p>The DAX query needs to output the following schema:</p>
<table>
<thead>
<tr>
<th><strong>Test</strong></th>
<th><strong>Expected Value</strong></th>
<th><strong>Actual Value</strong></th>
<th><strong>Passed</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Text describing the test.</td>
<td>The expected value in the appropriate format (e.g., number, boolean)</td>
<td>The actual value of the DAX calculation.</td>
<td>A boolean that is true if the test passed. Otherwise, the value is false.</td>
</tr>
</tbody>
</table>
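<p>For reference, a DAX query that satisfies this schema could look like the example below. The <em>DFTest</em> table and its Value column are assumptions based on the testing dataset described earlier; substitute your own tables and assertions.</p>

<pre><code>// Illustrative DAX test: two assertions, each emitting the
// Test / Expected Value / Actual Value / Passed columns.
EVALUATE
VAR _rowCount = COUNTROWS ( 'DFTest' )
VAR _blankValues = COUNTROWS ( FILTER ( 'DFTest', ISBLANK ( 'DFTest'[Value] ) ) ) + 0
RETURN
    UNION (
        ROW (
            "Test", "DFTest contains rows",
            "Expected Value", TRUE (),
            "Actual Value", _rowCount > 0,
            "Passed", ( _rowCount > 0 ) = TRUE ()
        ),
        ROW (
            "Test", "No blank values in the Value column",
            "Expected Value", TRUE (),
            "Actual Value", _blankValues = 0,
            "Passed", ( _blankValues = 0 ) = TRUE ()
        )
    )
</code></pre>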
<h3 id="generating-tests">Generating Tests</h3>
<p>Now, to keep it simple, two PowerShell scripts aid in testing. The first script, Generate-DFTests.ps1, builds a feature file based on a template for testing a dataflow (see <strong>Figure 4</strong>).</p>
<p><img src="/assets/img/posts/part34/Figure4.png" alt="Figure 4" class="center-image" /></p>
<p><em class="center-text-figure">Figure 4 - Example of generating a test.</em></p>
<p>The second script, Run-DFtests.ps1, runs the tests based on which dataflow you wish to test and displays the results (see <strong>Figure 5</strong>).</p>
<p><img src="/assets/img/posts/part34/Figure5.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 5 - Example of running the tests for dataflows.</em></p>
<p>In the end, the folder structure for our dataflow versions and tests follows the pattern in <strong>Figure 6</strong>. Each dataflow gets its own folder, and tests are kept in a sub-folder called CI/{Workspace ID}. {Workspace ID} is the Globally Unique Identifier (GUID) representing the Power BI workspace that stores the dataflow.</p>
<p><img src="/assets/img/posts/part34/Figure6.png" alt="Figure 6" class="center-image" /></p>
<p><em class="center-text-figure">Figure 6 - Pattern for storing versions of Power BI dataflows and tests.</em></p>
<p>All right, that seems like a lot to go over, but once you get past the initial setup, maintaining and executing the tests is fairly quick. In fact, thanks to ChatGPT, generating the regular expressions that test the contents of a column is much easier than it was two years ago.</p>
<p>This method has extended my teams’ safety net way past <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part23/" target="_blank">testing datasets</a>. Now, we can test dataflows when Bring Your Own Storage and/or Fabric is not an option.</p>
<h3 id="try-it-for-yourself">Try It For Yourself</h3>
<p>To try this approach, please visit the <a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method" target="_blank">template GitHub repository</a> for an installation script that creates a Power BI workspace and an Azure DevOps project for you. Then, all you need to do is follow the instructions for uploading the Dataflow Testing Dataset to the workspace and get to testing!</p>
<p>If you’ll be at <a href="https://365educon.com/Chicago/">EduCon 365 Chicago</a>, please stop by my session on November 2nd, where I go in-depth on dataflow version control and testing.</p>
<p>Can you guess what my next article will be on? As always, let me know what you think on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter (having a hard time calling it X)</a></p>
<p><i>This article was edited by my colleague and senior technical writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams Garrett</a>.</i></p>

<p><i><a href="https://www.kerski.tech//bringing-dataops-to-power-bi-part33" target="_blank">Part 33: Bringing DataOps to Power BI</a>, published 2023-10-10, by John Kerski</i></p>
<h2 id="make-it-reproducible-for-gen1-dataflows">Make It Reproducible… for Gen1 Dataflows</h2>
<p>While I look forward to all the new capabilities <a href="https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/05-dataflows-gen2.html" target="_blank">Gen2 dataflows</a> offer, it may be a while before they are available to the wider audiences I work with. It took nearly two years for Gen1 dataflows to come to the Government Community Cloud (GCC) if my aging memory serves me right. Furthermore, even if the pricing becomes cost-competitive, not every client will be prepared to factor the cost of purchasing Fabric into their current solutions.</p>
<p>Fortunately, Gen 1 dataflows still represent a viable production option, particularly for those interested in implementing a <a href="https://www.databricks.com/glossary/medallion-architecture" target="_blank">Medallion architecture</a> within the Power BI Service. Roughly a year ago, I offered a Power BI-based solution for the <a href="https://www.kerski.tech/bringing-dataops-to-power-bi-part22/" target="_blank">version control, testing, and orchestration of Gen1 dataflows</a> using the <a href="https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-azure-data-lake-storage-integration" target="_blank">Bring Your Own Storage</a> (BYOS) feature. However, BYOS is not always an option (GCC users definitely don’t have this option as of Fall 2023).</p>
<p>To improve cycle times and reduce errors in environments where the BYOS solution was not feasible, I explored some options. Thankfully, I had a version control tool in Azure DevOps, so I got to work developing a strategy for backing up the dataflows my teams were working on.</p>
<h2 id="the-polling-method">The Polling Method</h2>
<p>Without BYOS, I knew that capturing every single change to the dataflow was impossible. However, I could poll for changes at regular intervals and commit those changes to an <a href="https://learn.microsoft.com/en-us/azure/devops/repos/get-started/what-is-repos?view=azure-devops" target="_blank">Azure Repo</a> (see Figure 1). That way, if I needed to recover a past version or a deleted dataflow, my teams would not experience significant setbacks.</p>
<p><img src="/assets/img/posts/part33/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - Illustrates the high-level polling process and components used to backup dataflows to Git.</em>
<br />
This polling process is driven by two YAML files:</p>
<p>1) <strong>Dataflow-Polling-and-Backup-Schedule-Dev</strong> - This file runs on a scheduled interval and kicks off the second YAML file. The setup will prevent an infinite loop from occurring in future iterations of this solution when we want a branch update to trigger a continuous integration (CI) process (hint: it involves my favorite topic).</p>
<p>2) <strong>Dataflow-Polling-and-Backup</strong> - This file runs the Start-DataflowBackup.ps1 process, which includes:</p>
<ul>
<li>
<p>Loading and installing the appropriate PowerShell modules.</p>
</li>
<li>
<p>Logging into the Power BI Service using an account (see security notes).</p>
</li>
<li>
<p>Retrieving the list of Power BI dataflows.</p>
</li>
<li>
<p>Checking if the Azure Repo (I use the terms repo and repository synonymously with Azure Repo) has the dataflow. If not, it adds the dataflow to the repository. If the dataflow does exist, the script compares the “modifiedTime” properties. If they do not match, we commit a new version to the repository (a sketch of this loop follows Figure 2).</p>
</li>
</ul>
<p><img src="/assets/img/posts/part33/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Illustrates how two YAML files implement the Polling Method.</em></p>
<h3 id="security-notes">Security Notes</h3>
<p>The polling method is dependent on two major security components:</p>
<p>1) A Premium Per User (PPU) account that can log in to the Power BI service and access the workspace housing the dataflows.</p>
<p>2) A Personal Access Token (PAT) that can access the repository storing the dataflows.</p>
<h3 id="restoring-a-dataflow">Restoring a dataflow</h3>
<p>To revert to a prior dataflow, download the JSON file from the repository and import it into your Power BI workspace. If the dataflow already exists, Power BI appends a number to the name of the imported copy. You’ll have to delete the old one and relink any dependencies…a pain, but at least you have a version to restore. :)</p>
<h2 id="try-it-for-yourself">Try It for Yourself</h2>
<p>To illustrate the polling method in action, I’ve provided <a href="https://github.com/kerski/pbi-dataops-dataflows-polling-method/blob/main/README.md" target="_blank">an installation script on GitHub</a>. This script simplifies the setup by creating a Power BI workspace and an Azure DevOps project while configuring the appropriate settings to facilitate the process.</p>
<p>If you are going to <a href="https://www.summitna.com/" target="_blank">Community Summit North America</a>, stop by my session on October 18th, where I will go in-depth on dataflow version control and testing. For even more dataflow content, stay tuned for my next article, where I will cover testing dataflows using a technique <a href="https://www.linkedin.com/in/calvin-barker-61b079270/" target="_blank">Luke Barker</a> and I came up with.</p>
<p><i>This article was edited by my colleague and senior technical writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams Garrett</a>.</i></p>

<p><i><a href="https://www.kerski.tech//bringing-dataops-to-power-bi-part32" target="_blank">Part 32: Bringing DataOps to Power BI</a>, published 2023-09-12, by John Kerski</i></p>
<h2 id="the-data-journey-dont-assume-what-worked-last-week-will-work-today">The Data Journey: Don’t Assume What Worked Last Week Will Work Today</h2>
<p>Microsoft technologies and application programming interfaces (APIs) are rapidly evolving, introducing new beneficial features and improvements. However, changes to Power BI or Graph APIs can also affect backend data operations, including the operation of custom connectors like the ones I’ve authored for the <a href="https://github.com/kerski/powerquery-connector-pbi-rest-api-commercial" target="_blank">Power BI REST API</a> and <a href="https://github.com/kerski/powerquery-connector-ms-planner-rest-api" target="_blank">Planner API</a>. Over the past year, I’ve seen production systems rely on custom connectors, so we can’t assume what worked before a Microsoft update will work after.</p>
<p>The principles of DataOps emphasize the importance of evaluating the
data’s journey through the assembly line to identify potential points of
failure. A proactive approach ensures that you can respond quickly to
changes in the broader IT ecosystem without disrupting operations for
users and customers. With many of my production environments using
custom connectors, I wanted to set up continuous testing to identify
potential failures. But there were a few challenges:</p>
<ul>
<li>
<p><strong>Decoupling test environment from testing</strong> - Many connectors require specific details about my tenant and workspaces that I would not want to publish in a public GitHub repository. To ensure the custom connector code and testing code were accessible to external users while safeguarding sensitive data, I needed a way to easily separate the test configurations and remove the globally unique identifiers (GUIDs) only available through my tenant.</p>
</li>
<li>
<p><strong>Automating Authentication</strong> - Several of the custom connectors I wrote, both independently and in collaboration with <a href="https://www.linkedin.com/in/calvin-barker-61b079270/" target="_blank">Luke Barker</a>, use Azure Active Directory (AD) authorization. To facilitate testing, I needed to use a service account. However, the <a href="https://marketplace.visualstudio.com/items?itemName=PowerQuery.vscode-powerquery-sdk" target="_blank">Power Query Software Development Kit (SDK) for Visual Studio (VS) Code</a> semi-obfuscated this process. The SDK stored credentials in a way that didn’t expose each step in the terminal (probably for good security reasons).</p>
</li>
<li>
<p><strong>Deploying</strong> – In cases where the custom connector passed
automated tests, I needed to automatically remove any testing
variables associated with my tenant. This measure would safeguard
variables (e.g., passwords, secrets, tenant IDs) from being posted
to GitHub.</p>
</li>
<li>
<p><strong>Standardizing Testing</strong> - I also wanted to ensure I could run the same tests locally with a simple command that incorporated my test configuration, and have the build pipeline run the automated tests with that same command. That way, whether I ran a test locally, the build pipeline ran it, or another contributor ran it in their tenant, the execution steps would be the same.</p>
</li>
</ul>
<h3 id="solution">Solution</h3>
<p>After reading the Microsoft documentation and <a href="https://bengribaudo.com/blog/2022/10/24/7012/highlights-from-the-new-power-query-sdk" target="_blank">Ben Gribaudo’s article on the Power Query SDK</a>, I started experimenting with each one of these challenges. As shown in <strong>Figure 1</strong>, I ended up having two repositories:</p>
<p>1) An “internal” repo on Azure DevOps (it could have been a private
repo on GitHub, but it’s hard to break habits) to store the code along
with my test configuration file.</p>
<p>2) A “public” repo to keep the code open-sourced and releases
publicly available. This allowed me to maintain my test configuration in
source control while excluding this component during the deployment
pipeline in Azure DevOps. With the help of <a href="https://learn.microsoft.com/en-us/azure/devops/repos/security/github-advanced-security-secret-scanning?view=azure-devops" target="_blank">Secret
Scanning</a> in Azure DevOps, it also provided an extra layer of protection against
accidentally committing sensitive secrets or passwords to the public
source control.</p>
<p><img src="/assets/img/posts/part32/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - Design for testing, deploying, and sharing custom power query
connectors.</em></p>
<p>Within this high-level design, here is how I tackled each challenge:</p>
<ul>
<li><strong>Decoupling the test environment from testing</strong> – Leveraging the
<i>PQTest.exe test command</i>, you can provide an environment
configuration file as a JSON file with properties and values. I just
needed to tweak the testing file (thanks to <a href="https://learn.microsoft.com/en-us/power-query/handling-unit-testing" target="_blank">the template provided
by
Microsoft</a>)
to reference these properties instead of a hard-coded value
(<strong>Figure 2</strong>; an illustrative configuration file follows the figure). This setup enables others to build their test
configuration and run the tests in their own environment.</li>
</ul>
<p><img src="/assets/img/posts/part32/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - Separate JSON file stores test variables and the feed the
testing scripts.</em></p>
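<p>As an entirely illustrative example, a test configuration like variables.test.json might hold values such as the following; the property names are placeholders rather than the repository’s actual schema:</p>

<pre><code>{
  "TenantId": "00000000-0000-0000-0000-000000000000",
  "WorkspaceId": "00000000-0000-0000-0000-000000000000",
  "TestWorkspaceName": "CI - Connector Tests",
  "ServiceAccountUPN": "svc-pq-tests@contoso.onmicrosoft.com"
}
</code></pre>

<p>Each test file then reads these properties at run time, so contributors only need to supply their own file to run the suite against their own tenant.</p>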
<ul>
<li><strong>Automating Authentication</strong> - This was, by far, the most
challenging aspect and the least documented. It turns out that by
using the <i>generate-credential command</i> and providing your compiled
connector and test file, you can obtain the appropriate JSON file.
Then you need to update the JSON file and add it as an argument when
executing the test command. However, when you attempt this
process with a test file that uses the environment configuration, the
generate-credential command does not like it and reports an error. As a
workaround, I provided it with a basic test file that simply
ensures the custom connector loads successfully. This approach
yielded a template file. After some trial and error, I figured out
how to replace the template values with the credentials in the format
required to make the incantation work (<strong>Figure 3</strong>).</li>
</ul>
<p><img src="/assets/img/posts/part32/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - Example of how to generate, change, and set the credentials
with PQTest.exe.</em></p>
<ul>
<li><strong>Deployments</strong> – Thanks to Azure DevOps release pipelines and a
few Stack Overflow searches (ChatGPT was not too helpful this time),
I created a two-step release process (sketched after Figure 4). As shown in <strong>Figure 4</strong>,
it first removes the variables.test.json file so my personal test
settings for my tenant do not go public. Second, it pushes the code
to the main branch of the public repository using git command lines.
Note that the git push contains a <em>%gt%</em> reference. That is
the <a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens" target="_blank">Personal Access
Token</a> I had to set up in GitHub to authenticate with the public
repository.</li>
</ul>
<p><img src="/assets/img/posts/part32/Figure4.png" alt="Figure 4" class="center-image" /></p>
<p><em class="center-text-figure">Figure 4 - Steps to release from the internal repo to the public repo.</em></p>
<ul>
<li><strong>Standardize Testing</strong> - To make sure the tests I run locally would
operate similarly, I built a PowerShell script
<a href="https://github.com/kerski/powerquery-connector-pbi-rest-api-commercial/blob/main/CI/Scripts/Run-PQTests.ps1" target="_blank">Run-PQTests.ps1</a>
that performs the following:</li>
</ul>
<ol style="margin-left:20px">
<li>Identify (through an environment variable) whether the script is running locally or running in a build agent</li>
<li>Connect to the Power BI Service</li>
<li>Compile the custom connector with MakePQX.exe</li>
<li>Generate the Credential template with PQTest.exe</li>
<li>Set the Credential with PQTest.exe</li>
<li>Run the Tests with PQTest.exe</li>
<li>Check the results and fail the build if the tests fail</li>
</ol>
<p><strong>Figure 5</strong> demonstrates running the script locally.</p>
<p><img src="/assets/img/posts/part32/Figure5.png" alt="Figure 5" class="center-image" /></p>
<p><em class="center-text-figure">Figure 5 - Example of running tests locally.</em></p>
<p>If you’re interested in diving further into the code, please check out
my repository for the <a href="https://github.com/kerski/powerquery-connector-pbi-rest-api-commercial/tree/main" target="_blank">Power Query Custom Data Connector for Power BI
REST APIs
(Commercial)</a>
and explore the “CI” subfolder to see the scripts, folder structure, and
YAML files. Over the next year, I’ll incorporate more continuous
integration and continuous deployment (CI/CD) into custom connectors I
build, and I hope this helps you on that journey too. <strong>Don’t assume the
custom connectors will continue to work, especially if they depend on
APIs—automate testing.</strong></p>
<p>I’d like to hear your thoughts, so please let me know what you think
on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter</a>.</p>
<p>Finally, if you’re going to be at the <a href="https://www.eventbrite.com/e/sqlsaturday-denver-2023-tickets-566987283227" target="_blank">SQL Saturday
Denver</a>,
September 23rd, 2023, I’ll be presenting on custom connectors. I hope
to see you there!</p>
<p><i>This article was edited by my colleague and senior technical writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams Garrett</a>.</i></p>

<p><i><a href="https://www.kerski.tech//bringing-dataops-to-power-bi-part31" target="_blank">Part 31: Bringing DataOps to Power BI</a>, published 2023-08-01, by John Kerski</i></p>
<h2 id="dataops-principle-14---analytics-is-manufacturing">DataOps Principle #14 - Analytics is manufacturing</h2>
<p>Exciting developments are underway for Microsoft users, with the recent
announcements of <a href="https://powerbi.microsoft.com/en-us/blog/deep-dive-into-power-bi-desktop-developer-mode-preview/" target="_blank">Power BI Desktop Developer
Mode</a> in public preview and Fabric’s
<a href="https://www.youtube.com/watch?t=2453&v=wdDx0-jvl7w&feature=youtu.be" target="_blank">roadmap</a>
for Git repository integration within Azure DevOps. These advancements
empower developers with increased flexibility in managing their
dataflows, datasets, notebooks, and other workspace artifacts directly
within the Fabric workspace.</p>
<p>As I’ve discussed in past articles (<a href="https://blog.kerski.tech/bringing-dataops-to-power-bi-part22/" target="_blank">Part
22</a> and
<a href="https://blog.kerski.tech/bringing-dataops-to-power-bi-part25/" target="_blank">Part
25</a>),
integrating source code with Azure DevOps yields significant benefits,
including the ability to trigger pipelines for evaluating, testing, and
deploying code to Power BI workspaces. This integration streamlines
development and deployment processes, enhancing collaboration and
version control for Power BI projects.</p>
<p>However, as Git integration moves into general availability and more
developers begin using <a href="https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/what-is-azure-pipelines?view=azure-devops" target="_blank">Azure DevOps
pipelines</a>,
monitoring becomes a greater priority.</p>
<p>In DataOps, Analytics <strong><em>is</em></strong> Manufacturing:</p>
<p><em>Analytic pipelines are analogous to lean manufacturing lines. We
believe a fundamental concept of DataOps is a focus on process-thinking
aimed at achieving continuous efficiencies in the manufacture of
analytic insight.</em></p>
<p>When applied to Power BI projects, this means that whether you are
building dataflows to transform data and load it into a data lake or
automating the deployment of a new dataset to a Power BI workspace with
Azure DevOps, each pipeline is analogous to an assembly line that is
creating a product for a customer. Therefore, if we are not actively
monitoring our Azure DevOps pipelines, we introduce a weakness in our
approach for delivering analytic products.</p>
<p>Many things can cause an Azure DevOps pipeline to fail. Here are just a
few issues my teams watch for:</p>
<p>1) <strong>Build Agent Failures</strong>. A build agent is an operating system
specifically configured to run your pipelines. If the configuration
is broken, the build agent can’t run or execute the necessary
pipeline steps.</p>
<p>2) <strong>Step Failures.</strong> Pipelines allow you to run custom PowerShell
code, use secret variables (e.g., passwords), add
<a href="https://learn.microsoft.com/en-us/azure/devops/extend/overview?view=azure-devops" target="_blank">extensions</a>,
and more. A bug in code, an expired password, or a misconfigured
extension could lead to the failure of the pipeline build.</p>
<p>3) <strong>Network Failures.</strong> Pipelines need to communicate with services
like Azure Active Directory (is that called Entra now?) and Power
BI. If the network connection fails or becomes degraded, it may
cause your pipelines to fail.</p>
<p>4) <strong>Schedule Failures</strong>. You can schedule builds to run regularly, as
<a href="https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/schedules?view=azure-pipelines" target="_blank">defined in
YAML</a>.
However, it has been my experience that Azure DevOps conveniently forgets
that schedule (it’s not a human, but I will anthropomorphize as I
wish). When pipelines don’t run as scheduled, you don’t get a
notification about the failure. A missed schedule failure can create
confusion and frustration for your teams and customers.</p>
<p>These issues can disrupt the analytics manufacturing line and impede the
testing and deployment of Power BI artifacts. So, how do we monitor
them?</p>
<h2 id="the-problem-with-monitoring-azure-devops-pipelines">The Problem with Monitoring Azure DevOps Pipelines</h2>
<p><strong><em>Alright, well, Azure DevOps is a Microsoft product. Surely, they have
a native connector to get pipeline data.</em></strong></p>
<p>If you search for Azure DevOps in the <strong>Get</strong> <strong>Data</strong> Power BI
navigator pane, you’ll see you can only import <a href="https://learn.microsoft.com/en-us/azure/devops/boards/get-started/what-is-azure-boards?view=azure-devops" target="_blank">Azure Board
data</a>.
Only… board… data… oh man.</p>
<p><img src="/assets/img/posts/part31/Figure1.png" alt="Figure 1" class="center-image" />
<i class="center-text-figure">Figure 1 - The native options for importing Azure DevOps data… Boards
only</i></p>
<h2 id="a-potential-solution">A Potential Solution</h2>
<p>Faced with the lack of a connector, I broke out the trusty Web.Contents function and began building Power Query code to query the <a href="https://learn.microsoft.com/en-us/rest/api/azure/devops/pipelines/" target="_blank">pipeline application programming interfaces (APIs)</a> in a Gen1 dataflow and integrate it with the existing <a href="https://github.com/kerski/pbi-dataops-monitoring" target="_blank">pbi-dataops-monitoring template</a>. Why a Gen1
dataflow? Well, I want to be prepared to monitor these issues when
Fabric becomes generally available. When it does, our projects with
Fabric can easily upgrade to Gen2 Dataflows and export the results to a
data lake. Leveraging <a href="https://fabric.guru/power-bi-direct-lake-mode-frequently-asked-questions" target="_blank">Direct Lake
datasets</a>
will enable us to quickly identify and address issues. At the same time,
using a Gen1 Dataflow ensures I can continue monitoring in Pro or
Premium Per User (PPU) environments, providing the best of both worlds.</p>
<h2 id="implementation">Implementation</h2>
<p>Figure 2 provides a screenshot of the queries found within the Gen1
Dataflow, demonstrating the components of its structure:</p>
<div class="container part31">
<div class="column part31 first-column">
<h3>Parameter</h3>
<ul>
<li><strong>PipelineIssueID & FailedScheduledPipelineIssueID</strong>: The unique
identifier used in the "Issues Table" in the
<a href="https://github.com/kerski/pbi-dataops-monitoring" target="_blank">pbi-dataops-monitoring
template.</a>
</li>
<li><strong>AzureDevOpsBaseURL</strong>: The URL to the project hosting your Azure
pipelines.
</li>
</ul>
<h3>Functions</h3>
<ul>
<li><strong>fnGetPipelinesInProject</strong>: This custom function serves as a
wrapper for the <a href="https://learn.microsoft.com/en-us/rest/api/azure/devops/pipelines/pipelines/list" target="_blank">Pipelines --
List</a>
endpoint (a sketch of this wrapper follows the table descriptions below).
</li>
<li><strong>fnGetPipelineRunsInProject</strong>: This custom function serves as a
wrapper for the <a href="https://learn.microsoft.com/en-us/rest/api/azure/devops/pipelines/runs/get" target="_blank">Pipelines --
Runs</a>
endpoint.
</li>
</ul>
</div>
<div class="column part31 second-column">
<img src="/assets/img/posts/part31/Figure2.png" alt="Figure 2" class="center-image" />
<i class="center-text-figure">Figure 2 - Queries found in the Gen1 Dataflow</i>
</div>
</div>
<h3>Bronze</h3>
<ul><li><strong>Pipelines In Projects -- Intermediate</strong>: This table calls the
"fnGetPipelinesInProject" function used in other bronze group
tables.
</li>
<li><strong>Pipelines In Project</strong>: This table contains all pipelines associated with the project.
</li>
<li><strong>Pipelines Runs</strong>: This table calls the "fnGetPipelineRunsInProject" function and returns the history of available pipeline runs.
</li>
<li><strong>Latest Pipeline Runs</strong>: This table identifies the latest run for each pipeline in the project.
</li>
</ul>
<h3>Silver</h3>
<ul><li><strong>Schedule Pipeline Expectations</strong>: This table enables us to set
expectations for the frequency of pipeline runs. You can simply edit
the source table, add the Pipeline ID (found in
"PipelinesInProject"), and set the threshold for the number of hours
that can pass before an issue is raised.
</li>
<li><strong>Pipeline Runs in Project</strong>: This table combines the results of the
bronze group and represents a curated overview of pipeline runs.
</li>
<li><strong>Issues - Latest Pipeline Run Failures</strong>: This table combines the
results from the bronze group and flags any pipelines where the
latest iteration failed.
</li>
<li><strong>Issues -- Schedule Pipelines That Failed to Run as Scheduled</strong>:
This table combines the results from the bronze group and the
"Schedule Pipeline Expectations" table, flagging any scheduled
pipelines that failed to meet expectations.
</li>
</ul>
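<p>To give a feel for the bronze plumbing, a wrapper like fnGetPipelinesInProject might look something like the sketch below. This is an approximation, not the dataflow’s exact code: the api-version and response handling may differ, and the PAT is supplied through the connection’s credentials rather than in the M itself.</p>

<pre><code>// Hedged sketch of fnGetPipelinesInProject wrapping the Pipelines - List endpoint.
() as table =>
let
    Response = Web.Contents(
        AzureDevOpsBaseURL,
        [
            RelativePath = "_apis/pipelines",
            Query = [#"api-version" = "6.0-preview.1"]
        ]
    ),
    Json = Json.Document(Response),
    Pipelines = Table.FromRecords(Json[value])
in
    Pipelines
</code></pre>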
<p>These tables are already formatted for integration with the
<a href="https://github.com/kerski/pbi-dataops-monitoring" target="_blank">pbi-dataops-monitoring
template.</a> Just add
the tables as <a href="https://github.com/kerski/pbi-dataops-monitoring/blob/main/documentation/Azure-DevOps-Addon.md">instructed here</a>, and you’re all
set.</p>
<p>Do you want to try the dataflow for your Azure DevOps project? I’ve got
a setup script that will install a template flow and create the Personal
Access Token so you can authenticate the connection to Azure DevOps in
your dataflow.</p>
<h2 id="side-note-on-personal-access-tokens">Side Note on Personal Access Tokens</h2>
<p>If you just read “personal access token” and thought, “What happens if
that expires? This could break, and I won’t be able to monitor my
dataflows,” you’re already thinking with DataOps principles in mind! As
the <a href="https://devblogs.microsoft.com/devops/introducing-service-principal-and-managed-identity-support-on-azure-devops/" target="_blank">service principal
authentication</a>
becomes generally available, we may see new and improved authentication
options. In the meantime, proactively schedule reminders for updating
tokens to avoid monitoring disruptions.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The increasing adoption of Azure Pipelines with <a href="https://powerbi.microsoft.com/en-us/blog/deep-dive-into-power-bi-desktop-developer-mode-preview/" target="_blank">Git Integration in Power BI</a> is exciting, but make sure you prioritize monitoring these pipelines. Try the Gen1 dataflow template <a href="https://github.com/kerski/pbi-dataops-monitoring/blob/main/documentation/Azure-DevOps-Addon.md" target="_blank">I have shared on GitHub</a> to get started.</p>
<p>I’d like to hear your thoughts, so please let me know what you think on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or
<a href="https://twitter.com/jkerski" target="_blank">Twitter</a> or <a href="https://www.youtube.com/channel/UC4xZ_vpQaVbrWzYpfvsjoGg/" target="_blank">YouTube</a>.</p>
<p><i>This article was edited by my colleague and senior technical writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams Garrett</a>.</i></p>

<p><i><a href="https://www.kerski.tech//bringing-dataops-to-power-bi-part30" target="_blank">Part 30: Bringing DataOps to Power BI</a>, published 2023-07-05, by John Kerski</i></p>
<h2 id="dataops-101-on-youtube">DataOps 101 on YouTube</h2>
<p><br />
Over the past year during meetups and conferences, I have had the opportunity to introduce the concepts of DataOps to project managers, data engineers, and data analysts. During each session I share the trials and tribulations I have experienced with managing analytics products involving Power BI and how to avoid them by adhering to DataOps principles. Often after these sessions I have attendees commiserate with my stories and provide a few stories of their own. For me (and I hope for the attendees), it is cathartic to know that I’m not alone in the challenges of delivering analytic solutions.</p>
<p>Earlier this year I recorded a version of my DataOps 101 session, and I have posted that <a href="https://youtu.be/K4g7LdEJBSI" target="_blank">session on YouTube</a>. My hope is that you might learn more about DataOps, see how it can help you on your projects, and even laugh a little at a data engineer trying to do his best with video editing (I’m no Guy In A Cube).</p>
<p>After watching it, let me know what you think on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or
<a href="https://twitter.com/jkerski" target="_blank">Twitter</a> or <a href="https://www.youtube.com/channel/UC4xZ_vpQaVbrWzYpfvsjoGg/" target="_blank">YouTube</a>.</p>
<p>I’ll continue to use my blog as my main medium of communication, and based on everyone’s feedback I’ll plan future videos.</p>
<p><i>Note: This was recorded prior to <a href="https://powerbi.microsoft.com/en-us/blog/deep-dive-into-power-bi-desktop-developer-mode-preview/" target="_blank">Power BI Desktop Developer Mode</a> making it to preview, but I still stand by my advice on learning Git and looking at <a href="https://github.com/pbi-tools" target="_blank">pbi-tools</a></i> 😊</p>

<p><i><a href="https://www.kerski.tech//bringing-dataops-to-power-bi-part29" target="_blank">Part 29: Bringing DataOps to Power BI</a>, published 2023-06-13, by John Kerski</i></p>
<h2 id="quality-is-paramount-and-the-cell-level-error">Quality is Paramount and the Cell-Level Error</h2>
<p>Cell-level errors (like the one in <strong>Figure 1</strong>) are the stuff of nightmares to me. You feel confident about the dataset when you move it to production, and, for a while, everything goes according to plan—until it doesn’t. You hear something is wrong, and upon inspecting the data, you realize some of your custom columns have generated cell-level errors.</p>
<p><img src="/assets/img/posts/part29/Figure1.png" alt="Figure 1" class="center-image" /></p>
<p><em class="center-text-figure">Figure 1 - Example of a cell-level error.</em></p>
<p>Microsoft states in <a href="https://learn.microsoft.com/en-us/power-query/dealing-with-errors" target="_blank">their
documentation</a>:
<em>“A cell-level error won't prevent the query from loading, but displays
error values as <strong>Error</strong> in the cell. Selecting the white space in the
cell displays the error pane underneath the data preview.”</em></p>
<p>To extend this definition, “A cell-level error won’t prevent the query from loading,” but a blank value will appear in the dataset. As an example, <strong>Figure 2</strong> illustrates a custom column that applies the <em>Number.FromText</em> function to the value of the second column. When the value of the second column is the word “two,” the <em>Number.FromText</em> function fails because it expects text it can parse as a number (e.g., “2”).</p>
<p><img src="/assets/img/posts/part29/Figure2.png" alt="Figure 2" class="center-image" /></p>
<p><em class="center-text-figure">Figure 2 - A cell with a cell-level error appears blank in the model.</em></p>
<p>A common culprit for this issue is data drift, the unexpected and
undocumented changes to data structure and semantics. While you may not
be responsible for or have authority over the upstream data that feeds
into your dataset, who do your customers blame? <strong>You</strong>.</p>
<p>So how do we avoid the dreaded cell-level error? It starts with
embracing a DataOps principle:
<a href="https://kerski.azureedge.net/bringing-dataops-to-power-bi-part4/" target="_blank"><strong>Quality is Paramount.</strong></a> This principle
states:</p>
<p><em>Analytic pipelines should be built with a foundation capable
of <strong>automated detection</strong> of abnormalities and security issues in code,
configuration, and data, and should provide continuous feedback to
operators for error avoidance.</em></p>
<p>As shown in <strong>Figure 3</strong>, working with Power BI in an analytic pipeline
typically consists of 3 steps:</p>
<p>1) Extracting data from the source using Power Query</p>
<p>2) Transforming that data in Power Query</p>
<p>3) Loading that data into a Power BI dataset</p>
<p><img src="/assets/img/posts/part29/Figure3.png" alt="Figure 3" class="center-image" /></p>
<p><em class="center-text-figure">Figure 3 - Health checks should occur in a typical Power BI analytic
pipeline.</em></p>
<p>According to the Quality is Paramount principle, you should check for
issues throughout all three steps to reduce errors. However,
implementing what I call health checks into pipelines is easier said
than done. Power BI doesn’t offer an out-of-box health check tool yet
(I’m hopeful Microsoft Fabric/Data Activator will close that loop soon).
Until that happens, I typically recommend teams perform the following
health checks for each step:</p>
<ol style="line-height:200%">
<li><b>Extract.</b> Don't trust your upstream sources. You can check the
source data for schema or content issues through several methods:
<ol style="line-height:200%" type="a">
<li>Dataflows 1.0 -- Following the <a href="https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion" target="_blank">Medallion
approach</a>, you can initially ingest the source data in the bronze layer of
the dataflow. Then, via a silver dataflow, you can import the
data and conduct health checks. Finally, a gold dataflow would feed
a health check dataset that a Power BI report monitors for issues.
</li>
<li>Dataflows 1.0 with Bring Your Own Data Storage -- If you have access to <a href="https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-azure-data-lake-storage-integration" target="_blank">Bring Your Own Data
Storage</a> with Dataflows 1.0, you can inspect the schema and content using
automated checks in Azure DevOps pipelines. I describe this
approach in <a href="https://kerski.azureedge.net/bringing-dataops-to-power-bi-part21/" target="_blank">Part
21</a>.
</li>
<li> Dataflows 2.0 -- Now this advice is for a product in preview at
the time of this writing, and there may be changes in the final
product. But as it stands, you can apply the Medallion approach
described above and follow up with Notebooks to inspect the data
and conduct health checks. I'll describe that approach in a
future article (yep, a teaser).
</li>
</ol>
</li>
<li><b>Transform.</b> Automate error detection in the code used to
transform data. To achieve this, teach your teams to perform the
following:
<ol style="line-height:200%" type="a">
<li>If you're allowed to quarantine data that doesn't meet
expectations and would produce cell-level errors, I recommend
reading <a href="https://radacad.com/exception-reporting-in-power-bi-catch-the-error-rows-in-power-query" target="_blank">Radacad's excellent
article</a> to create exception tables.
</li>
<li>If you cannot quarantine data, try implementing the
<em>try/otherwise</em> M code within all your transform columns or
custom column steps (see <strong>Figure 4</strong> and the text version after this list). This step
automatically handles exceptions and transforms the data into a
more stable state.
<img src="/assets/img/posts/part29/Figure4.png" alt="Figure 4" class="center-image" />
<i class="center-text-figure">Figure 4 - Example of try/otherwise in M code.</i>
</li>
</ol>
</li>
<li><b>Load.</b> With any analytics pipeline, you should take a
defense-in-depth approach. Even if you implement checks during the Extract
and Transform steps, mistakes can still happen. Here are ways to
double-check for cell-level errors:
<ol style="line-height:200%" type="a">
<li>ExecuteQuery -- With the custom connector I offered in <a href="https://kerski.azureedge.net/bringing-dataops-to-power-bi-part22/" target="_blank">Part
22</a>, you can build a health check into the <a href="https://github.com/kerski/pbi-dataops-monitoring" target="_blank">monitoring
template</a> and
check for the existence of null values produced by a
cell-level error. Using the ExecuteQuery function, you can
identify null values with a DAX query (shown in <strong>Figure 5</strong>
below, with an illustrative query after this list) and raise those issues through a Power BI report.
<img src="/assets/img/posts/part29/Figure5.png" alt="Figure 5" class="center-image" />
<span class="center-text-figure"><i>Figure 5. Example DAX query.</i></span>
</li>
<li>Check for "Errors in" tables -- If your team encounters cell-level
errors during development and chooses the "View Errors" option, a
table prefixed with "Errors in" is created and hidden in the model.
<img src="/assets/img/posts/part29/Figure6.png" alt="Figure 6" class="center-image" />
<i class="center-text-figure">Figure 6 - Example of an "Errors in" table.*</i>
If this table is present, it might mean that the cell-level error has been resolved and the table can be deleted. Or, it might mean that the cell-level errors, and the issues causing them, still exist. With the latest version of the <a href="https://github.com/kerski/pbi-dataops-monitoring" target="_blank">monitoring template</a>, you can identify occurrences of the "Errors in" tables.
</li>
</ol>
</li>
</ol>
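<p>For the <em>try/otherwise</em> technique from the Transform step, a text version of the Figure 4 idea looks like this; the column names mirror the earlier <em>Number.FromText</em> example:</p>

<pre><code>// Wrap the fragile conversion so a bad cell yields null instead of a
// cell-level error (quarantine or log the nulls separately if you can).
Table.AddColumn(
    Source,
    "Custom",
    each try Number.FromText([Column2]) otherwise null,
    type nullable number
)
</code></pre>

<p>And for the Load step, a null check in the spirit of Figure 5 can be as small as the following DAX; the table and column names are placeholders for your own model:</p>

<pre><code>// Count the blanks that cell-level errors leave behind so the monitoring
// report can raise an issue when the result is greater than zero.
EVALUATE
ROW (
    "BlankCustomValues",
    COUNTROWS ( FILTER ( 'Sales', ISBLANK ( 'Sales'[Custom] ) ) ) + 0
)
</code></pre>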
<p>I hope this helps stop cell-level errors from reaching your production datasets. I’d like to hear your thoughts, so please let me know what you
think on <a href="https://www.linkedin.com/in/john-kerski-41a697100" target="_blank">LinkedIn</a> or <a href="https://twitter.com/jkerski" target="_blank">Twitter</a>.</p>
<p>Finally, if you’re going to be at the <a href="https://365educon.com/DC/index.php" target="_blank">365 EduCon in Washington D.C.</a>, June 12th-16th 2023, I’ll be presenting on Wednesday and Friday. I hope to see you there!</p>
<p><i>This article was edited by my colleague and senior technical writer, <a href="https://www.linkedin.com/in/kiley-williams-garrett-9a16a618a/" target="_blank">Kiley Williams Garrett</a>.</i></p>