Automate moving measures between Power BI files .pbix, renaming and finding and replacing strings in DAX formula\expression

Yesterday I had to accomplish a task that got me thinking: this is going to take a hell of a lot of time to finish.

The user story is this: I have a Power BI .pbix file that is used somewhat as a staging\debugging file to try and test measures. Once the job is done I usually move things over to the final\main Power BI file which will be handed in to the client. Note that this final Power BI file already has report pages implemented. That's way I try things in a different file since the work can be done in a cleaner\faster model.
The problem this time was that I had to create\move lots of measures and so I'd have to do this measure by measure from the debugging file to the main file. It would take me about 1 hour to finish. Unfortunately Power BI has no built in copy\paste of measures between files as of today in its version 2.72.5556.801 64-bit (August 2019). At least we can copy and paste the visuals from one file to the other.

OK. Problem defined. Let's Google for a solution. The solution found was this awesome post titled Bulk measure handling (copy between reports/format changes). In this post the guy mentions the tool Tabular Editor. Well that was what I was looking for, really.

Just downloaded it directly from its GitHub repo here.

Followed the steps on that post and voilá: I could get more than 30 measures copied from one .pbix file to another. Great! This took me about 10 minutes to get done. 1\6 of the time previously estimated for that cumbersome\manual work. Check the following printscreen:

1st printscreen - measures moved from one Power BI file (on the left) to the other Power BI file (on the right)

2nd printscreen - batch renaming selected formulas in blue
There was another task that needed to be accomplished and it was a little bit different: needed to duplicate those copied measures. Tabular Editor makes it easy too. Just select all of them, right click and pick Duplicate # measures. Boom! You have all measures duplicated as seen in the printscreen on the right. Note that it adds a "copy" word to the measure name. So once the measures are duplicated we need to rename them and that's easy too. Select all duplicated measures, right click and pick Batch rename.

Just replaced the uppercase U in each formula name to a uppercase C as seen in the second printscreen above. Repeated the process and replaced " copy" with a blank string. That's it. Measures duplicated and renamed accordingly. Nonetheless there's still a missing step before theses measures are ready to go: change a table column name in the DAX formula\expression in each of these measures that got duplicated and renamed. How to get this done using Tabular Editor? Well, reading through its docs we can find this Advanced Scripting section. If you look at the first printscreen above you'll note that I wrote some C# code there that does exactly what is needed. See the code below:

Selected.Measures.ForEach(m => {
    var e = m.Expression;
   
    if(e.Contains("[Unit]"))
    {       
        e = e.Replace("[Unit]", "[Company]");
       
        // For debugging purposes
        //e.Output();
       
       
m.Expression = e;     
    }
});


What this piece of code is doing is that for each of the selected measures in blue on the left (see printscreens above), the measure DAX formula\expression is stored in a variable called e. I then check if this formula contains the word [Unit] that is the table column name that needs to be replaced. If the expression\formula contains this word it gets replaced with the word [Company] that refers to the other column name that is to be used in these duplicated formulas. After that the modified DAX formula is assigned back to the measure.

When finished with Tabular Editor, click Save. Return to Power BI and hit the Refresh button to reload the model.

This post is just a sample that showcases the power of Tabular Editor. You can manage and play with measures in any way imaginable.

I hope it helps someone that wants to accomplish trivial things in a more actionable way instead of having to copy\paste, rename and replace manually because that sucks.

Note: when trying to save back the modifications done in Tabular Editor an error was being displayed. Something along this line:

Failed to save modifications to the server. Error returned: 'Unexpected column name: Received column 'ObjectID.Expression' in rowset 'ObjectTranslations'. Expected column 'ObjectID.Set'.'.

I did the following to get rid of the nasty error message: opened Tabular Editor File menu and selected Preferences... checked "Allow unsupported Power BI features (experimental)". When we do this a new Translations folder appear in the Model tree. I just deleted the en-US that was under this folder (see printscreen below). Then tried saving again and it worked.

3rd printscreen - enabling Allow unsupported Power BI features (experimental) in Tabular Editor

Using IOptions from .NET Core in .NET Framework 4.5.1+ - Ninject dependency injection version

This post is based on this awesome article I found @ Medium:

Configuration Management in Legacy .NET Applications (.NET Framework 4.5.1+)

I wanted to use the cool IOptions interface to grab some configs from appsettings.json. The sad thing is that it's intended to be used with .NET Core but the article mentioned above tells us that we can circumvent that with some extension code.

I won't be able to use it anymore because I just found out that I can't in the current project setup I'm working on.

As I had already converted the AutoFac code to Ninject, I'm just sharing it here in case it's useful...

You'll need to test it.

Note: make sure you have installed the following NuGet packages:

- Microsoft.Extensions.Configuration
- Microsoft.Extensions.Options
- Microsoft.Extensions.Options.ConfigurationExtensions

Using Power BI parameter to specify Excel workbook path in Power Query

This is a quick post to share something that I did just now and that worked. I tried to search for some sample but couldn't find. Maybe my search query was not that right...

Let's say you use an Excel workbook as your data source in Power BI. Than you collaborate with a team and as we know, not everyone has the same folder structure setup. This can cause a problem when trying to Refresh data in Power BI. How can we "overcome" this in a simple way?

Just create a Power BI parameter and pass it to the Power Query responsible for opening the Excel workbook.

In the following printscreen you see that I created a parameter called Excel workbook path. I set this path to where the workbook is located in my machine.


Then we can use this parameter in Power Query like this:

= Excel.Workbook(File.Contents(#"Excel workbook path"&"Contingency Drawdown.xlsx"), null, true)

I was having some difficulty trying to make the string concatenation work, that is, joining the parameter with the workbook name. Tried some variants and got it working with the string concatenation token &.

As you see the workbook name doesn't change (it's hardcoded) however the path is customizable by the parameter "Excel workbook path".


We just need to change this parameter value to point to our folder structure and we're done. Now we can make use of this parameter to open all Excel sheets (tables in Power BI) in a single shot.

Hope it helps.

References:

Power BI Introduction: Working with Parameters in Power BI Desktop —Part 4

Impersonation in Windows Service with ASP.NET Core

In a project one of the requirements was to be able to impersonate the current logged on Windows user.

The ASP.NET Core 2.1 project was hosted in a Windows Service. More info about this here:
Host ASP.NET Core in a Windows Service

The problem was that when doing WindowsIdentity.GetCurrent().Name; what was being returned was the account name used as Log On as for the service. That makes sense since it is that account that is executing the service.

This particular project needed to get the current logged on Windows user and check their permissions system wide, that is, check which claims the current user making use of the service has. Claims are the key component by which the outcome of all Web-based authentication and authorization requests are determined. More info on that here: The Role of Claims.

This way we can assure a user belongs to a specific User Group, the user has read permission on some folder, etc.

After trying some options, nothing worked. Then I decided to search for “Windows Authentication in ASP.NET Core” using Google and it lead me to this page: Configure Windows Authentication in ASP.NET Core.
Since we’re not hosting the app in IIS, that was a no go. However there’s another option below that page and it’s called HTTP.sys. HTTP.sys server supports self-hosted scenarios on Windows and that was the way to go.


First thing we need to do is make use of it when creating the WebHost object:

public static IWebHostBuilder CreateWebHostBuilder(string[] args,
    IConfigurationRoot config) =>
        WebHost.CreateDefaultBuilder(args)
               .UseConfiguration(config)
               .UseStartup<Startup>()
               // Using HTTP.sys for Windows Authentication support
               .UseHttpSys(options =>
               {
                   options.Authentication.Schemes =
                   AuthenticationSchemes.NTLM | AuthenticationSchemes.Negotiate;
                   options.Authentication.AllowAnonymous = false;
               });

Then in Startup.cs inside the method ConfigureServices we do:

// This method gets called by the runtime.
// Use this method to add services to the container.
public void ConfigureServices(IServiceCollection services)
{
    services.AddLogging(builder =>
    {
        builder.AddFilter("Microsoft", LogLevel.Warning)
               .AddFilter("System", LogLevel.Warning)
               .AddConsole();
    });
    // HTTP.sys server
    // HttpSysDefaults requires the following import:
    // using Microsoft.AspNetCore.Server.HttpSys;
    services.AddAuthentication(HttpSysDefaults.AuthenticationScheme);
services.AddMvc();
}


That's it.

Now to impersonate the user we do the following:


[HttpGet]
public async Task<IActionResult> MethodWithImpersonatedWindowsUser(
[FromQueryAttribute] string filePath)
{
    // The user used as Log On as for the Windows Service
    var serviceUser = WindowsIdentity.GetCurrent().Name;
   
    // The user to be impersonated
    var userToImpersonate = (WindowsIdentity)HttpContext.User.Identity;
    // Impersonating the current Windows user [HttpContext.User.Identity]...
    var result = await WindowsIdentity.RunImpersonated(
                                     userToImpersonate.AccessToken, async () =>
    {
        // This time WindowsIdentity.GetCurrent() will retrieve the impersonated
        // user with its claims...
        var impersonatedUser = WindowsIdentity.GetCurrent().Name;
        // Your business logic code here...
    });
    return result;
}

I saw lots of people asking the same question I made prior to knowing how to handle this requirement: why are we seeing the service user returned instead of the current logged on user when doing WindowsIdentity.GetCurrent()? The answers were to the point: in Windows you can have many user sessions (Remote Desktop, etc) where all are logged on at the same time making use of the Windows Service. So, how to get the current logged on user while still making use of the service? That must come from the HttpContext, that is, from the request arriving from the browser. WindowsIdentity.RunImpersonated makes it all possible. It runs the specified lambda action as the impersonated Windows identity.

Using the power of Razor Views embedded in external class library with FluentEmail

Just got this working and would like to share how to get it configured since I could not find an explanation anywhere.

FluentEmail - All in one email sender for .NET and .NET Core is a great library to send e-mails.

Currently I have a full .NET 4.7.2 Console Application that makes use of FluentEmail.
My desire with it was to store the e-mail template (
Razor .cshtml view) in an external class library and be able to use this view\template in the Console Application.

FluentEmail GitHub page has a sample code that shows how to consume that:


var email = new Email(bob@hotmail.com)
    .Subject("Hey cool name!")
    .UsingTemplateFromEmbedded("Example.Project.Namespace.template-name.cshtml",
        new { Name = "Bob" },
        TypeFromYourEmbeddedAssembly.GetType().GetTypeInfo().Assembly);

Steps:

1 - Create a Razor Class Library project. More on that can be found here:
Create reusable UI using the Razor Class Library project in ASP.NET Core
It uses .NET Standard 2.0 as its Target framework.

2 - Add Microsoft.AspNetCore.Mvc.Core NuGet package to the class library.
This is to have Intellisense, etc in Visual Studio.

Install-Package Microsoft.AspNetCore.Mvc.Core -Version 2.1.3

3 - Add a Razor view called EmailTemplate.cshtml to your class library.

4 - Right click it in Solution Explorer and select Properties. In Build Action, select Embedded resource.

5 - Add the following code to your Razor View:


@model dynamic
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title>Some Title</title>
    <base href="/" />
@*<link href="css/bootstrap/bootstrap.min.css" rel="stylesheet" />
      <link href="css/site.css" rel="stylesheet" />
      <script src="_framework/blazor.webassembly.js"></script>*@
</head>
<body>
    E-mail template
    Hey @Model.Name,<br>
    <font size="14">You are so @Model.Compliment.</font>
</body>
</html>
Note that I’m using dynamic in the view code. This is just for testing. You can have full blown Models and use them in the templates as well.

6 - In your Console Application add a reference to the Class Library created in step 1.

7 - In your FluentEmail code, do the following in UsingTemplateFromEmbedded method:

email.To(configuration["emailTo"])
     .SetFrom(configuration["emailFrom"])
     .Subject(configuration["emailSubject"])
     .UsingTemplateFromEmbedded(
      "YourCompany.YourRazorClassLibrary.EmailTemplate.cshtml",
    new { Name = "Leniel Maccaferri", Compliment = "Cool" },
    Assembly.Load("YourCompany.YourRazorClassLibrary"));


The first parameter is the full class library (DLL) name + dot + Razor view name with .cshtml extension. The second parameter is the anonymous type that will be passed as the Model to the view and the third parameter is the class library assembly itself.


You're good to go!

This gives you a lot of flexibility.

You can add images, customize the HTML, CSS, have
Layout pages, etc.


Hope it helps.

Mixing C# source code with PowerShell script code to speed your development

I had some C# code ready but needed to use it inside a PowerShell script.

At StackOverflow I found some code that put me in the right path. So here's a practical example when mixing codes inside a PowerShell script file:

. ".\Invoke-Parallel.ps1" # Importing another script into this PowerShell script file

$csharpSource = @"

using System;
using System.Collections.Generic;
using System.Linq;

public class DateExtensions
{
     public static List<Tuple<DateTime, DateTime>> GetWeeksBetweenDates(DateTime startDate, DateTime endDate)
     {
        var weeks = new List<Tuple<DateTime, DateTime>>();
        
    
         for (DateTime date = startDate; date <= endDate; date = date.AddDays(8))
         {
            var weekEnd = date.AddDays(7);
                            
            weeks.Add(new Tuple<DateTime, DateTime>(date, weekEnd <= endDate ? weekEnd : endDate));
         }

        return weeks;
    }
}
"@

Add-Type -TypeDefinition $csharpSource # Includes the C# code defined above to be called through the PowerShell script

$appPath = "C:\SomeFolder"
$startDate = [datetime]'6/1/2017'
$endDate = [datetime]'5/5/2018'

$weeks = [DateExtensions]::GetWeeksBetweenDates($startDate, $endDate)

#Calls MyApp in batches, that is, for each week...
$weeks | Invoke-Parallel -ImportVariables -ScriptBlock {   
    Set-Location -Path $appPath
           
    $date = [string]::Format("{0:yyyyMMdd},{1:yyyyMMdd}", $_.Item1, $_.Item2)

    #Write-Host $date

    dotnet MyApp.dll budat=$date cpudt=$date # Calls .NET Core "executable" DLL for each week
}

pause #Keeps PowerShell window open to see the results

As we see in the code above, the variable $csharpSource holds the C# source code that later will called in the PowerShell script.

We then add\mix the C# code to\with PowerShell with the command Add-Type passing to it the $csharpSource variable that holds the source code. Simple as that.

Inside PowerShell script code we call the method defined in C# code with:

$weeks = [DateExtensions]::GetWeeksBetweenDates($startDate, $endDate)

This is pretty useful because we don't need to convert our C# code to PowerShell idiom.

Hope it helps.

Reference:
How to convert C# code to a PowerShell Script?

Processing Stack Overflow data dump with Apache Spark

This post is about the final work I did for one of the disciplines of the Master's degree I'm currently attending at UFRJ - Federal University of Rio de Janeiro in the branch of Data and Knowledge Engineering (Databases) that is under the division of Computer and Systems Engineering department at COPPE\UFRJ.

The discipline is called Special Topics in Databases IV and is taught by professor Alexandre Bento de Assis Lima.

The presentation (PPT slides) is in Brazilian Portuguese. I'll translate the slides to English in this blog post. They give an overall view about the work done.

The final paper is written in English.

Files

Trabalho prático sobre Apache Spark envolvendo um problema típico de Big Data (apresentação\presentation).pdf (in Portuguese)

Processing Stack Overflow data dump with Apache Spark (in English)

Abstract. This paper describes the process involved in building an ETL tool based on Apache Spark. It imports XML data from Stack Overflow data dump.
The XML files are processed using Spark XML library and converted to a DataFrame object. The DataFrame data is then queried with Spark SQL library.
Two applications were developed: spark-backend and spark-frontend. The first one contains the code responsible for dealing with Spark while the later one is user centric allowing the users to consume the data processed by Spark.

All the code developed is in English and should be easy to read.

Presentation
  1. Objective
  2. Problem
  3. Technologies
  4. Strategy used to acquire the data
  5. Development
  6. Conclusion
  7. Links
  1. Objective
    • Put into practice the concepts presented during the classes.
    • Have a closer contact with modern technologies used to process Big Data.
    • Automate the Extraction\Mining of valuable\interesting information hidden in the immensity of data.

  2. Problem
    • Analyse StackOverflow data dump available on the internet on a monthly basis.
    • The data dump is composed of a set of XML files compacted with the .7z extension.
    • Even after compaction the biggest file has 15.3 GB. This size is directly linked to the data volume handled by Big Data.
    • Spark at first will be used as an ETL tool (ETL = Extract > Transform > Load) to prepare the data consumed by a front-end web app.
    • "At first" because there's also the possibility of using Spark as a tool to process the data that'll be shown in the web app.

  3. Technologies
    • Apache Spark 2.0.1 +
    • Spark XML 0.4.1 +
    • Spark SQL 2.0.2
    • Ubuntu 16.04 LTS (Xenial Xerus)
    • Linux VM (virtual machine) running on Parallels Desktop 12 for Mac
    • Scala 2.11.8
    • XML (Extensible Markup Language)
    • XSL (Extensible Stylesheet Language)
    • Play Framework 2.5 (front end)
    • Eclipse Neon 4.6.1 with Scala IDE 4.5.0 plugin as the IDE

  4. Strategy used to acquire the data
    • Got the .torrent file that contains all the data dumps from Stack Exchange family of sites - https://archive.org/details/stackexchange
    • Selected the eight .7z files related to StackOverflow: stackoverflow.com-Badges.7z, stackoverflow.com-Comments.7z, stackoverflow.com-PostHistory.7z, stackoverflow.com-PostLinks.7z, stackoverflow.com-Posts.7z, stackoverflow.com-Tags.7z, stackoverflow.com-Users.7z, stackoverflow.com-Votes.7z

  5. Development
    • To make the work viable (running locally out of a cluster), a single .xml file [Users.xml] was used. A subset of 100.000 lines (32.7 MB) was selected. This file has a total of 5,987.287 lines (1.8 GB).
    • hadoop@ubuntu:/media/psf/FreeAgent GoFlex Drive/Downloads$ head -100000 Users.xml > Users100000.xml
    • The file Users.xsl was used covert Users100000.xml data to the format expected by spark-xml library. The result was saved to Users100000.out.xml.


    • The .xml and .xsl files were placed into the input folder of the Scala project [spark-backend] inside Eclipse.
    • The application spark-backend read the file Users100000.out.xml through Spark XML and transforms it into a DataFrame object.
    • The Spark SQL library is used subsequently to search the data. Some sample queries were created.
    • Each query generates a CSV file (SaveDfToCsv) to be consumed in a later stage by a web application [spark-frontend], that is, Spark is used as an ETL tool.
    • The result of each query is saved in multiple files in the folder output. This happens because Spark was conceived to execute jobs in a cluster (multiple nodes\computers).
    • For testing purposes, a method that renames the CSV file was created. This method copies the generated CSV to a folder called csv. The destiny folder can be configured in the file conf/spark-backend.properties.



    • The application [spark-backend] can be executed inside Eclipse or through the command line in Terminal using the command spark-submit.
    • In the command line we make use of the JAR file produced during the project build in Eclipse. We pass as parameters the necessary packages as below:
    • spark-submit --packages com.databricks:spark-xml_2.11:0.4.1 -- class com.lenielmacaferi.spark.ProcessUsersXml - -master local com.lenielmacaferi.spark- backend-0.0.1-SNAPSHOT.jar
    • The application [spark-frontend] was built with Play Framework (The High Velocity Web Framework For Java and Scala).
    • The user opens spark-frontend main page at localhost:9000 and has access to the list of CSV files generated by [spark-backend] application.
    • When clicking a file name, the CSV file is sent to the user's computer. The user can then use any spreadsheet software to open and post-process\analyse\massage the data.


  6. Conclusion
    • With Spark's help we can develop interesting solutions as for example: a daily job that can download and upload data to a folder "input" processing the data along the way in many different ways.
    • Using custom made code we can work with the data in a cluster (fast processing) using a rich API full of methods and resources. In addition, we have at our disposal inumerous additional libraries\plugins developed by the developer community. Put together all the power of Scala and Java and their accompanying libraries.
    • The application demonstrated can be easily executed in a cluster. We only need to change some parameters in the object SparkConf.

  7. Links