Extract Power BI metadata to manage datasource changes

Power BI metadata contains information on the underlying datasources referenced along with all relevant report design and instructions used to generate report contents.  This metadata also contains source data refresh rates, automated report distribution schedules and much more.  It is the glue that binds the entire Power BI reporting solution together.

Power BI metadata and data flow
Overview of Power BI system and features

For a Power BI developer or administrator, arguably considered the most important metadata is the inter-relationships between individual Power BI reports and the underlying datasources they are built on.  Whenever a change is planned in these datasource (file) structures or database models, this metadata answers the critical question “which of my Power BI reports may be affected by these (datasource) changes?

A SQL Server database contains this metadata in a table that was originally designed for use with SSRS, and extended to support Power BI.  The following SELECT statement can be used to extract this information from the Microsoft Reporting Services SQL database:

SELECT
 [Name]
 ,[Type]
 ,CASE Type
   WHEN 2 THEN 'SSRS Report'
   WHEN 5 THEN 'DataSource'
   WHEN 7 THEN 'Report Component'
   WHEN 8 THEN 'Shared Dataset'
   When 13 Then 'Power BI Report'
   ELSE 'Other - not applicable'
 END AS TypeDescription
 ,CONVERT(varbinary(max),Content) AS Content
 FROM <ReportServerdB>.dbo.Catalog
 WHERE Type in (2,5,7,8,13)

Connect to the SQL Server host and replace <ReportServerDB> with your Report Server database name reserved for your SSRS or Power BI reporting system.  The metadata itself is contained within the Catalog.Content field, and is encoded.   A great article is posted by Bret Stateham describing how to extract and convert the encoded catalog.content XML for SSRS reports, datasources and shared datasets by simply using a set of SQL statements.

However, this same method does not work for Power BI metadata stored in the same catalog.content field.  Reason being, the encoded Power BI metadata is actually the entire .pbit archive comprised of several compressed XML documents and folders.   Essentially it is an encoded binary file.

To extract this encoded data for Power BI content, you must use a programming language such as .net to detect and decode the Bit Order Marker and stream the ( essentially zip file ) contents into memory or write to disk for processing.  Below are C#.net code snippets to accomplish this:

SELECT 
c.[Path]
, c.[Name]
, convert(varbinary(max), c.content) as RDL
, c.Type 
FROM Catalog c 
WHERE c.Content IS NOT NULL 
AND c.Type IN (2,13,5,8,7)

We stream this information into a DataTable object.  Then for each row we write the content of the report out to a file with this C#.net code.

...
string extension = ( see explanation below. use c.type to derive )
string filename = System.IO.Path.GetTempPath() + System.Guid.NewGuid().ToString("N") + extension; 
...
 using (var stream = System.IO.File.Create(filename))
 {
byte[] content = (byte[])row["RDL"];
 stream.Write(content, 0, content.Length);
 }
...

This also works for SSRS types stored in the SQL database Report Server catalog as well (reports, report components, datasource and shared dataset definitions).  When writing the files, simply append the appropriate filename extension for each file based on the catalog.type value.   For example, for “SSRS reports” catalog.type = 2, append extension = “.rdl” to the filename.  For “Power BI reports” catalog.type = 13, append extension = “.pbix” to the filename, and for the remainder of types, use extension = “.xml”.

For the Power BI report, once written to disk as a decoded .pbix file, you can then open the archive file ( if renamed to .zip for example ) to easily extract the Power BI metadata file contents.   This will include:

  • /report/layoutjson Power BI report contents
  • datamashup – the raw data used by the report, still compressed and encoded
  • … plus a number of other files used by Power BI

The embeded layout file contains Power BI metadata in a json text file.  Note that some of the other .pbix file contents are themselves archive binaries and would need to be decoded.  For example the datamashup file is itself an archive with an encoding wrapper that contains some block header & footer information that would need to be stripped off in order to extract the archive file contents.

Members of our R&D team, Michael, Gordon, Matteo, Ibrahim and Madel, have successfully integrated Power BI search, documentation, schedule extract and change impact analysis support for on premise Power BI to our Dev Surge 365 Enterprise and Code Search Pro Server software editions, in our 19.2 or newer releases posted on/after Dec 6, 2019.  Find it EZ – Software change made simple.

About the author

Ken Gnazdowsky holds a Bachelor of (Computing) Science degree from Simon Fraser University. A recipient of the Governor General of Canada bronze medallion, he began his post-secondary education at Brandon University on an academic scholarship, Ken has worked in the software development industry for over 30 years. Ken is co-founder and President of Find it EZ Software Corp., creators of the Microsoft award-winning products, Code Search Pro and Dev Surge 365.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: