Power BI metadata contains information on the underlying datasources referenced along with all relevant report design and instructions used to generate report contents. This metadata also contains source data refresh rates, automated report distribution schedules and much more. It is the glue that binds the entire Power BI reporting solution together.
Overview of Power BI system and features
For a Power BI developer or administrator, arguably considered the most important metadata is the inter-relationships between individual Power BI reports and the underlying datasources they are built on. Whenever a change is planned in these datasource (file) structures or database models, this metadata answers the critical question “which of my Power BI reports may be affected by these (datasource) changes?”
A SQL Server database contains this metadata in a table that was originally designed for use with SSRS, and extended to support Power BI. The following SELECT statement can be used to extract this information from the Microsoft Reporting Services SQL database:
WHEN 2 THEN 'SSRS Report'
WHEN 5 THEN 'DataSource'
WHEN 7 THEN 'Report Component'
WHEN 8 THEN 'Shared Dataset'
When 13 Then 'Power BI Report'
ELSE 'Other - not applicable'
END AS TypeDescription
,CONVERT(varbinary(max),Content) AS Content
WHERE Type in (2,5,7,8,13)
However, this same method does not work for Power BI metadata stored in the same catalog.content field. Reason being, the encoded Power BI metadata is actually the entire .pbit archive comprised of several compressed XML documents and folders. Essentially it is an encoded binary file.
To extract this encoded data for Power BI content, you must use a programming language such as .net to detect and decode the Bit Order Marker and stream the ( essentially zip file ) contents into memory or write to disk for processing. Below are C#.net code snippets to accomplish this:
, convert(varbinary(max), c.content) as RDL
FROM Catalog c
WHERE c.Content IS NOT NULL
AND c.Type IN (2,13,5,8,7)
We stream this information into a DataTable object. Then for each row we write the content of the report out to a file with this C#.net code.
string extension = ( see explanation below. use c.type to derive )
string filename = System.IO.Path.GetTempPath() + System.Guid.NewGuid().ToString("N") + extension;
using (var stream = System.IO.File.Create(filename))
byte content = (byte)row["RDL"];
stream.Write(content, 0, content.Length);
This also works for SSRS types stored in the SQL database Report Server catalog as well (reports, report components, datasource and shared dataset definitions). When writing the files, simply append the appropriate filename extension for each file based on the catalog.type value. For example, for “SSRS reports” catalog.type = 2, append extension = “.rdl” to the filename. For “Power BI reports” catalog.type = 13, append extension = “.pbix” to the filename, and for the remainder of types, use extension = “.xml”.
For the Power BI report, once written to disk as a decoded .pbix file, you can then open the archive file ( if renamed to .zip for example ) to easily extract the Power BI metadata file contents. This will include:
/report/layout – json Power BI report contents
datamashup – the raw data used by the report, still compressed and encoded
… plus a number of other files used by Power BI
The embeded layout file contains Power BI metadata in a json text file. Note that some of the other .pbix file contents are themselves archive binaries and would need to be decoded. For example the datamashup file is itself an archive with an encoding wrapper that contains some block header & footer information that would need to be stripped off in order to extract the archive file contents.
Members of our R&D team, Gordon, Matteo, Ibrahim and Madel, are adding integrated Power BI search, documentation and change impact analysis support to our Dev Surge 365 Enterprise edition software, with an anticipated availability in early 2018. Find it EZ, change made simple.™
Three inter-related layers or processes in a robust BI analytics solution can include:
a Symantic (meta-data) layer;
Extract, Transform, Load (ETL) jobs.
Let’s take a closer look at each of these and how small changes within any one of them can affect your critical business reports at the top of your BI stack.
(1) Semantic layer – an abstracted ( meta-data ) layer that contains simplified business views mapped to underlying relational database models and data warehouses. This layer is a pre-requisite for some WSYWIG, drag and drop design reporting tools.
A semantic layer is a business representation of corporate data that helps non-technical end users access data using common business terms. The semantic layer is configured by a person who has knowledge of both the data store and the reporting needs of the business. The person creating the semantic layer chooses to expose appropriate fields in the data store as “business fields,” and to hide any fields that are not relevant. Each business field is given a friendly, meaningful name, and the business fields are organized in a way that will make sense to business users.
By using common business terms, rather than data language, to access, manipulate, and organize information, a semantic layer simplifies the complexity of business data. This is claimed to be core business intelligence (BI) technology that frees users from IT while ensuring correct results when creating and analysing their own reports & dashboards. In reality, however, IT staff are typically needed to create and maintain the semantic layer ( business view definitions ) which maps tables to classes and columns to objects.
(2) OLAP cubes – the arrangement of data into Cubes overcomes a limitation of relational databases, which are not well suited for near instantaneous analysis and display of large amounts of data. OLAP data is typically stored in a star schema or snowflake schema within a relational data warehouse or in a special-purpose data management system.
An OLAP cube is a term that typically refers to multi-dimensional array of data. OLAP is an acronym for online analytical processing, which is a computer-based technique of analyzing data to look for insights. A cube can be considered a multi-dimensional generalization of a two- or three-dimensional spreadsheet. Each cell of the cube holds a number that represents some measure of the business, such as sales, profits, expenses, budget and forecast. A Slice represents two-dimensional view of an OLAP Cube that arranges data in a grid, similar to a spreadsheet; a Slice functions much like a report or a query in an RDBMS in that it returns data based on a request for what to see
Although many report-writing tools exist for relational databases, these are slow when the whole database must be summarized, and present great difficulties when users wish to re-orient reports or analyses according to different, multidimensional perspectives, aka, Slices. The use of Cubes facilitate this kind of fast end-user interaction with data. Some reporting tools like Crystal Reports include the ability to define simple two-dimentional OLAP cubes for generating graphs for example.
(3) ETL Software – import clean data from virtually any source into permanent, structured relational database tables.
ALSO CALLED: Data Extraction Software, Database Extraction Software
DEFINITION: In managing databases, ETL refers to three separate functions combined into a single programming tool: Extract, Transform, Load
First, the extract function reads data from a specified source database and extracts a desired subset of data.
Next, the transform function works with the acquired data – using rules or lookup tables, or creating combinations with other data – to convert it to the desired state.
Finally, the load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist.
ETL can be used to acquire a temporary subset of data from virtually any file for reports or other purposes, or a more permanent data set may be acquired for other purposes such as: the population of a data mart or data warehouse; conversion from one database type to another; and the migration of data from one database or platform to another.
A change to the underlying source database, or at any one of these BI layers can affect the hundreds or in some organizations, thousands of reports that are dependent on the underlying data model. Something as simple as renaming a field or variable name can essentially break the chain, and stop reports from working or producing accurate results the business depends on.
Nowimagine a change impact analysis tool that can search your entire BI process flow to show you all affected references throughout your Business Intelligence stack.
Find it EZ – change made simple. Download your free trial today @ http://www.finditez.com
Inexperienced and/or over-eager technology workers often unintentionally increase the risk of software project failures due to three common mistakes:
Lack of transparency
One of the biggest challenge with each of these classic mistakes is they often go unnoticed, potentially even rewarded or encouraged by project leaders and project managers … until it is too late and the damage is done. But why and how does this happen so often in our industry?
Lack of transparency. This comes down to one thing. Communication. In over 25 years of experience, I have observed that this is often a skill that many software developers simply have to work on as it does not come naturally. Most software engineers are logical, introspective thinkers and very much focused on getting right to work problem solving. Generally, people of action and few words. The problem is, software developers often have a tendency to jump right in and start “doing” without sharing or vetting their ideas, or taking the time to research all viable options before starting to change code. Communication, and full transparency are critical for cost-effective design selection and productive development phases.
Over engineering. This is a personal pet peeve of mine because I have been burned so many times by this mistake, both as a developer and project manager! WAY TOO MUCH time can be spent on creating an elegant, clever, over-engineered solution, especially when a simple direct approach is available, can be a huge time waster … and is often very costly!
As problem solvers, we enjoy technical challenges and like to avoid having to revisit code by (over) engineering a solution trying to anticipate when end users will change their mind … um I mean requirements. But in doing so, we may fail to realize or consider the unnecessary cost and effort involved. Don’t fix what aint broke, yet!
Often, complex solutions that may be considered by the creator to be “brilliant” and “ultimately flexible” can be poorly documented and extremely difficult to understand or maintain. Worse still, after hours of attempting to make these over engineered solutions work, the effort can prove fruitless and abandoned as unfeasible. Use the KISS principle …
Shiny objects. In a word, distractions. In a software project, this typically manifests itself in scope creep. This, however, is scope creep that originates from within the development team … “hey, let’s do this too while we are in here”, or “wouldn’t this be cool?” These suggestions can often be encouraged and accepted by the Project Lead or Manager as a client “bonus” … as it comes from an enthusiatic developer team member who swears they can do it without affecting the delivery timeline. The PM may also accept these to get buy-in from the developer and to avoid discouraging this type of proactive behaviour for team building. However, these types of side-tracks, often combined with lack of transparency, can be a project killer. Stay on track, stay focused, don’t get tempted by shiny objects.
Change impact analysis tools by FinditEZ provide a full 360 degree view of your entire code stack. Produce more accurate estimates, identify the right resource skillsets required, quickly run “what if” change impact cost scenarios to confidently select the most cost effective option. Transparent, easily distributed change lists for clear and simple communication. Perfectly engineered solutions. Change made simple @ https://www.finditez.com
For a medium to large enterprise that has grown through mergers and acquisitions or simply through years of success, just knowing what custom software source code assets you have can actually be a daunting challenge. However, having this knowledge can lead to better decision making and tremendous cost savings.
Take this first example from a recent customer inquiry. “We know that we need to upgrade some of our middleware and underlying database version due to advances in technology. We need to migrate to for security compliance, lower support costs and to benefit from improved performance. But we don’t even know how many Crystal and SSRS reports we have! We estimate there is over 5000 reports and we would like to extract and move any embedded SQL statements within them into the new target database as either a view, function or procedure. Where do we start?”
Perhaps not surprisingly, this is a very common challenge for a large organization relying on a complex, multi-language and multi-tier software applications to run their business. The software itself is big part of what distinguishes them from their competitors. It is a mission critical proprietary asset that needs it’s inventory ( content ) to be documented, current and relevant. This in turn facilitates efficient and cost effective ongoing enhancements and upgrades to the underlying operating system or middleware as needed from time to time.
IT staff and skills come and go. When some of the senior staff move on, you often lose valuable knowledge and understanding of complex system inter-connections. You need a way to have new staff find those relationships with current and up-to-date documentation to maximize their productivity and minimize introduction of programming errors.
FinditEZ supports technology across the entire analytics stack.