Olap for a small company. Designing Data Cubes

As part of this work, the following issues will be considered:

  • What are OLAP cubes?
  • What are measures, dimensions, hierarchies?
  • What types of operations can be performed on OLAP cubes?
The concept of an OLAP cube

The main postulate of OLAP is multidimensionality in data presentation. In OLAP terminology, the concept of a cube, or hypercube, is used to describe a multidimensional discrete data space.

Cube is a multi-dimensional data structure from which a user-analyst can query information. Cubes are created from facts and dimensions.

Data- this is data about objects and events in the company that will be subject to analysis. Facts of the same type form measures. A measure is the type of value in a cube cell.

Measurements- these are the data elements by which the facts are analyzed. A collection of such elements forms a dimension attribute (for example, days of the week can form a time dimension attribute). In business analysis tasks for commercial enterprises, the dimensions often include categories such as “time”, “sales”, “products”, “customers”, “employees”, “geographic location”. Dimensions are most often hierarchical structures, representing logical categories by which the user can analyze actual data. Each hierarchy can have one or more levels. Thus, the hierarchy of the “geographic location” dimension may include the levels: “country - region - city”. In the time hierarchy, we can distinguish, for example, the following sequence of levels: A dimension can have several hierarchies (each hierarchy of one dimension must have the same key attribute of the dimension table).

A cube can contain actual data from one or more fact tables and most often contains multiple dimensions. Any given cube usually has a specific focus for analysis.

Figure 1 shows an example of a cube designed to analyze sales of petroleum products by a certain company by region. This cube has three dimensions (time, product and region) and one measure (sales volume expressed in monetary terms). Measure values ​​are stored in the corresponding cells of the cube. Each cell is uniquely identified by a set of members of each dimension, called a tuple. For example, the cell located in the lower left corner of the cube (contains the value $98399) is specified by the tuple [July 2005, Far East, Diesel]. Here the value of $98,399 shows the sales volume (in monetary terms) of diesel in the Far East for July 2005.

It is also worth noting that some cells do not contain any values: these cells are empty because the fact table does not contain data for them.

Rice. 1. Cube with information on sales of petroleum products in various regions

The ultimate goal of creating such cubes is to minimize the processing time of queries that extract the required information from the actual data. To accomplish this task, cubes typically contain precomputed totals called aggregations(aggregations). Those. the cube covers a data space larger than the actual one - there are logical, calculated points in it. Aggregation functions allow you to calculate the values ​​of points in logical space based on actual values. The simplest aggregation functions are SUM, MAX, MIN, COUNT. So, for example, using the MAX function, for the cube given in the example, you can identify when the peak in diesel sales occurred in the Far East, etc.

Another specific feature of multidimensional cubes is the difficulty of determining the origin. For example, how do you set point 0 for the Product or Regions dimension? The solution to this problem is to introduce a special attribute that combines all the elements of the dimension. This attribute (created automatically) contains only one element - All. For simple aggregation functions such as sum, the All element is equivalent to the sum of the values ​​of all elements in the actual space of a given dimension.

An important concept in a multidimensional data model is the subspace, or sub cube. A subcube is a part of the full space of a cube in the form of some multidimensional figure inside the cube. Since the multidimensional space of a cube is discrete and limited, the subcube is also discrete and limited.

Operations on OLAP cubes

The following operations can be performed on an OLAP cube:

  • slice;
  • rotation;
  • consolidation;
  • detailing.
Slice(Figure 2) is a special case of a subcube. This is a procedure for forming a subset of a multidimensional data array corresponding to a single value of one or more dimension elements not included in this subset. For example, to find out how sales of petroleum products progressed over time only in a certain region, namely in the Urals, you need to fix the “Products” dimension on the “Ural” element and extract the corresponding subset (subcube) from the cube.
  • Rice. 2. OLAP cube slice

    Rotation(Figure 3) - the operation of changing the location of measurements presented in a report or on the displayed page. For example, a rotation operation may involve rearranging the rows and columns of a table. Additionally, rotating a data cube moves out-of-tabular dimensions into place with dimensions present on the displayed page, and vice versa.

    OLAP is not a separate software product, not a programming language, or even a specific technology. If we try to cover OLAP in all its manifestations, then it is a set of concepts, principles and requirements that underlie software products that make it easier for analysts to access data. Let's find out For what analysts need something special facilitate access to data.

    The fact is that analysts are special consumers of corporate information. The analyst's task is to find patterns in large amounts of data. Therefore, the analyst will not pay attention to the separate fact that on Thursday the fourth a batch of black ink was sold to counterparty Chernov - he needs information about hundreds and thousands similar events. Single facts in the database may be of interest, for example, to an accountant or the head of the sales department, who is responsible for the transaction. For an analyst, one record is not enough - he, for example, may need all transactions of a given branch or representative office for a month or a year. At the same time, analyst discards unnecessary details like the buyer’s TIN, his exact address and telephone number, contract index and the like. At the same time, the data that an analyst requires for his work necessarily contains numerical values ​​- this is due to the very essence of his activity.

    So, the analyst needs a lot of data, this data is selective and also of the nature of " attribute set - number". The latter means that the analyst works with tables of the following type:

    Here " A country", "Product", "Year" are attributes or measurements, A " Volume of sales" - thereby the numerical value or measure. The analyst’s task, we repeat, is to identify strong relationships between attributes and numerical parameters. Looking at the table, you will notice that it can easily be converted into three dimensions: we will put countries on one of the axes, goods on the other, and years on the third. And the values ​​in this three-dimensional array will be the corresponding sales volumes.

    Three-dimensional representation of the table. The gray segment shows that there are no data for Argentina in 1988

    It is precisely this three-dimensional array that is called a cube in OLAP terms. In fact, from the point of view of strict mathematics, such an array will not always be a cube: a real cube must have the same number of elements in all dimensions, but OLAP cubes do not have such a limitation. However, despite these details, the term “OLAP cubes”, due to its brevity and figurativeness, has become generally accepted. An OLAP cube does not have to be three-dimensional. It can be both two- and multidimensional, depending on the problem being solved. Particularly seasoned analysts may need about 20 dimensions - and serious OLAP products are designed for exactly this amount. Simpler desktop applications support around 6 dimensions.

    Measurements OLAP cubes consist of so-called marks or members. For example, the Country dimension consists of the labels Argentina, Brazil, Venezuela, and so on.

    Not all elements of the cube must be filled in: if there is no information on sales of rubber products in Argentina in 1988, the value in the corresponding cell simply will not be determined. It is also not at all necessary that an OLAP application necessarily store data in a multidimensional structure - the main thing is that this data looks exactly like this to the user. By the way, it is precisely the special methods of compact storage of multidimensional data that “vacuum” (unfilled elements) in cubes do not lead to wasted memory.

    However, the cube itself is not suitable for analysis. If it is still possible to adequately imagine or depict a three-dimensional cube, then with a six- or nineteen-dimensional cube the situation is much worse. That's why before use ordinary ones are extracted from a multidimensional cube two-dimensional tables. This operation is called "cutting" the cube. This term, again, is figurative. The analyst, as it were, takes and “cuts” the dimensions of the cube according to the marks of interest to him. In this way, the analyst receives a two-dimensional slice of the cube and works with it. In much the same way, lumberjacks count the annual rings on a cut tree.

    Accordingly, as a rule, only two dimensions remain “uncut” - according to the number of dimensions in the table. It happens that only a dimension remains “uncut” - if the cube contains several types of numeric values, they can be plotted along one of the table dimensions.

    If you look even more closely at the table that we depicted first, you will notice that the data in it is most likely not primary, but obtained as a result summation on smaller elements. For example, a year is divided into quarters, quarters into months, months into weeks, weeks into days. A country is made up of regions, and regions are made up of populated areas. Finally, in the cities themselves, districts and specific retail outlets can be identified. Products can be combined into product groups and so on. In OLAP terms, such multi-level associations are quite logically called hierarchies. OLAP tools make it possible to move to the desired hierarchy level at any time. Moreover, as a rule, several types of hierarchies are supported for the same elements: for example, day-week-month or day-decade-quarter. Source data is taken from lower levels of hierarchies and then summed to obtain values ​​at higher levels. In order to speed up the transition process, the summed values ​​for different levels are stored in a cube. Thus, what looks like one cube from the user's side, roughly speaking, consists of many more primitive cubes.

    Hierarchy example

    This is one of the essential points that led to the emergence of OLAP - productivity and efficiency. Let's imagine what happens when an analyst needs to obtain information, but there are no OLAP tools in the enterprise. The analyst independently (which is unlikely) or with the help of a programmer makes the appropriate SQL query and receives the data of interest in the form of a report or exports it to a spreadsheet. A great many problems arise in this case. Firstly, the analyst is forced to do something other than his job (SQL programming) or wait for programmers to complete the task for him - all this has a negative impact on labor productivity, increasing storming, heart attack and stroke rates, and so on. Secondly, a single report or table, as a rule, does not save the giants of thought and the fathers of Russian analysis - and the whole procedure will have to be repeated again and again. Thirdly, as we have already found out, analysts do not ask about trifles - they need everything at once. This means (although technology is advancing by leaps and bounds) that the corporate relational DBMS server accessed by the analyst can think deeply and for a long time, blocking other transactions.

    The concept of OLAP appeared precisely to solve such problems. OLAP cubes are essentially meta reports. By cutting meta-reports (cubes, that is) along dimensions, the analyst actually receives the “ordinary” two-dimensional reports that interest him (these are not necessarily reports in the usual sense of the term - we are talking about data structures with the same functions). The advantages of cubes are obvious - data needs to be requested from a relational DBMS only once - when building a cube. Since analysts, as a rule, do not work with information that is supplemented and changed on the fly, the generated cube is relevant for quite a long time. Thanks to this, not only are interruptions in the operation of the relational DBMS server eliminated (there are no queries with thousands and millions of response lines), but the speed of access to data for the analyst himself also sharply increases. In addition, as already noted, performance is also improved by calculating subsums of hierarchies and other aggregated values ​​at the time the cube is built. That is, if initially our data contained information about daily revenue for a specific product in a single store, then when forming a cube, the OLAP application calculates the totals for different levels of hierarchies (weeks and months, cities and countries).

    Of course, you have to pay to increase productivity in this way. It is sometimes said that the data structure simply “explodes” - an OLAP cube can take up tens or even hundreds of times more space than the original data.

    Answer the questions:

      What's happened cube OLAP?

      What's happened tags specific measurement? Give examples.

      Can they measures in an OLAP cube, contain non-numeric values.

    Information systems of a serious enterprise, as a rule, contain applications designed for complex analysis of data, their dynamics, trends, etc. Accordingly, top management becomes the main consumers of the analysis results. Such analysis is ultimately intended to support decision making. And in order to make any management decision, it is necessary to have the necessary information, usually quantitative. To do this, it is necessary to collect this data from all information systems enterprises, bring them to a common format and then analyze them. For this purpose, Data Warehouses are created.

    What is a data warehouse?

    Usually - the place where all information of analytical value is collected. The requirements for such stores correspond to the classic definition of OLAP and will be explained below.

    Sometimes the Warehouse has another goal - the integration of all enterprise data, to maintain the integrity and relevance of information within all information systems. That. the repository accumulates not only analytical, but almost all information, and can provide it in the form of directories back to other systems.

    A typical data warehouse is typically different from a typical relational database. First, regular databases are designed to help users perform day-to-day work, while data warehouses are designed for decision making. For example, the sale of goods and the issuance of invoices are carried out using a database designed for transaction processing, and the analysis of sales dynamics over several years, which allows planning work with suppliers, is carried out using a data warehouse.

    Second, while traditional databases are subject to constant change as users work, the data warehouse is relatively stable: the data in it is usually updated according to a schedule (for example, weekly, daily, or hourly, depending on needs). Ideally, the enrichment process is simply adding new data over a period of time without changing previous information already in the store.

    And thirdly, regular databases are most often the source of data that ends up in the warehouse. In addition, the repository can be replenished from external sources, such as statistical reports.

    How is a storage facility built?

    ETL– basic concept: Three stages:
    • Extraction – extracting data from external sources in an understandable format;
    • Transformation – transformation of the structure of the source data into structures convenient for building an analytical system;
    Let's add one more stage - data cleaning ( Cleaning) – the process of filtering out irrelevant or correcting erroneous data based on statistical or expert methods. So as not to generate reports like “Sales for 20011” later.

    Let's return to the analysis.

    What is analysis and why is it needed?

    Analysis is the study of data for the purpose of making decisions. Analytical systems are called decision support systems ( DSS).

    Here it is worth pointing out the difference between working with DSS and a simple set of regulated and unregulated reports. Analysis in DSS is almost always interactive and iterative. Those. the analyst digs into the data, composing and adjusting analytical queries, and receives reports, the structure of which may be unknown in advance. We will return to this in more detail below when we discuss query language. MDX.

    OLAP

    Decision support systems usually have the means to provide the user with aggregate data for various samples from the original set in a form convenient for perception and analysis (tables, charts, etc.). The traditional approach to segmenting source data involves extracting from the source data one or more multidimensional data sets (often called a hypercube or metacube), the axes of which contain attributes, and the cells contain aggregated quantitative data. (Such data can also be stored in relational tables, but in this case we are talking about the logical organization of data, and not about the physical implementation of their storage.) Along each axis, attributes can be organized in the form of hierarchies, representing different levels of their detail. Thanks to this data model, users can formulate complex queries, generate reports, and obtain subsets of data.

    The technology for complex multidimensional data analysis is called OLAP (On-Line Analytical Processing). OLAP is a key component of traditional data warehousing. The concept of OLAP was described in 1993 by Edgar Codd, a renowned database researcher and author of the relational data model. In 1995, based on the requirements set out by Codd, the so-called FASMI test (Fast Analysis of Shared Multidimensional Information) was formulated, including the following requirements for applications for multidimensional analysis:

    • providing the user with analysis results in an acceptable time (usually no more than 5 s), even at the cost of a less detailed analysis;
    • the ability to perform any logical and statistical analysis specific to a given application and save it in a form accessible to the end user;
    • multi-user access to data with support for appropriate locking mechanisms and authorized access means;
    • multidimensional conceptual representation of data, including full support for hierarchies and multiple hierarchies (this is a key requirement of OLAP);
    • the ability to access any necessary information, regardless of its volume and storage location.
    It should be noted that OLAP functionality can be implemented different ways, starting with the simplest data analysis tools in office applications and ending with distributed analytical systems based on server products. Those. OLAP is not a technology, but ideology.

    Before we talk about the various OLAP implementations, let's take a closer look at what cubes are from a logical point of view.

    Multidimensional concepts

    To illustrate the principles of OLAP, we will use the Northwind database, which is included with Microsoft SQL Server and is a typical database that stores trading information for a wholesale food distribution company. Such data includes information about suppliers, clients, a list of supplied goods and their categories, data about orders and ordered goods, a list of company employees.

    Cube

    Let's take for example the Invoices1 table, which contains the company's orders. The fields in this table will be as follows:
    • Order date
    • A country
    • City
    • Customer name
    • Delivery company
    • Product Name
    • Quantity of goods
    • Order price
    What aggregate data can we get from this view? Typically these are answers to questions like:
    • What is the total value of orders placed by customers from a particular country?
    • What is the total value of orders placed by customers in a certain country and delivered by a certain company?
    • What is the total value of orders placed by customers in a particular country in a given year and delivered by a particular company?
    All this data can be obtained from this table using quite obvious SQL queries with grouping.

    The result of this query will always be a column of numbers and a list of attributes describing it (for example, country) - this is a one-dimensional data set or, in mathematical language, a vector.

    Let's imagine that we need to obtain information on the total cost of orders from all countries and their distribution among delivery companies - we will get a table (matrix) of numbers, where delivery companies will be listed in the column headings, countries in the row headings, and in the cells there will be amount of orders. This is a two-dimensional data array. This set of data is called a pivot table ( pivot table) or crosstab.

    If we want to get the same data, but also by year, then another change will appear, i.e. the data set will become three-dimensional (a conditional 3rd order tensor or a 3-dimensional “cube”).

    Obviously, the maximum number of dimensions is the number of all attributes (Date, Country, Customer, etc.) that describe our aggregated data (amount of orders, number of products, etc.).

    This is how we come to the concept of multidimensionality and its embodiment - multidimensional cube. We will call such a table “ fact table" Dimensions or Cube Axes ( dimensions) are attributes whose coordinates are expressed by the individual values ​​of these attributes present in the fact table. Those. for example, if information about orders was maintained in the system from 2003 to 2010, then this year axis will consist of 8 corresponding points. If orders come from three countries, then the country axis will contain 3 points, etc. Regardless of how many countries are included in the Country directory. Points on an axis are called its “members” ( Members).

    In this case, the aggregated data themselves will be called “measures” ( Measure). To avoid confusion with "dimensions", the latter are preferably called "axes". The set of measures forms another "Measures" axis ( Measures). It has as many members (points) as there are measures (aggregated columns) in the fact table.

    Members of dimensions or axes can be combined by one or more hierarchies ( hierarchy). Let us explain what hierarchy is with an example: cities from orders can be united into districts, districts into regions, regions of a country, countries into continents or other entities. Those. there is a hierarchical structure - continent- country-region-district-city– 5 levels ( Level). For a region, data is aggregated for all cities that are included in it. For a region across all districts that contain all cities, etc. Why do we need multiple hierarchies? For example, on the order date axis we may want to group points (i.e. days) into a hierarchy Year-Month-Day or by Year-Week-Day: in both cases there are three levels. Obviously, Week and Month group the days differently. There are also hierarchies, the number of levels in which is not deterministic and depends on the data. For example, folders on a computer disk.

    Data aggregation can occur using several standard functions: sum, minimum, maximum, average, count.

    MDX

    Let's move on to the query language in multidimensional data.
    The SQL language was originally designed not for programmers, but for analysts (and therefore has a syntax that resembles natural language). But over time it became more and more complicated and now few analysts know how to use it well, if at all. It has become a tool for programmers. The MDX query language, rumored to have been developed by our former compatriot Mosha (or Mosha) Posumansky in the wilds of Microsoft, was also initially intended to be aimed at analysts, but its concepts and syntax (which is vaguely reminiscent of SQL, and completely in vain, i.e. because it only confuses), even more complicated than SQL. However, its basics are still easy to understand.

    We will look at it in detail because it is the only language that has received standard status within the framework of the general XMLA protocol standard, and secondly because there is an open-source implementation of it in the form of the Mondrian project from the company Pentaho. Other OLAP analysis systems (for example, Oracle OLAP Option) usually use their own extensions of the SQL syntax, however, they also declare support for MDX.

    Working with analytical data sets only means reading them and does not mean writing them. That. MDX has no clauses for changing data, but only one selection clause - select.

    In OLAP you can make multidimensional cubes slices– i.e. when data is filtered along one or more axes, or projections– when the cube “collapses” along one or more axes, aggregating data. For example, our first example with the amount of orders from countries is a projection of the cube onto the Country axis. The MDX query for this case will look like this:

    Select ...Children on rows from
    What's what here?

    Selectkeyword and is included in the syntax solely for beauty.
    is the name of the axis. All proper names in MDX are written in square brackets.
    is the name of the hierarchy. In our case, this is the Country-City hierarchy
    – this is the name of the axis member at the first level of the hierarchy (i.e. country) All – this is a meta-member that unites all members of the axis. There is such a meta-term in each axis. For example, in the year axis there is “All years”, etc.
    Children is a member function. Each member has several functions available. Such as Parent. Level, Hierarchy, returning respectively the ancestor, the level in the hierarchy and the hierarchy itself to which the member belongs in this case. Children - Returns a set of child members of this member. Those. in our case – countries.
    on rows– Indicates how to arrange this data in the resulting table. In this case - in the header of the lines. Possible values ​​here: on columns, on pages, on paragraphs, etc. It is also possible to simply indicate by index, starting from 0.
    from– this is an indication of the cube from which the selection is made.

    What if we don't need all countries, but only a couple specific ones? To do this, we can explicitly specify in the request the countries that we need, rather than selecting everything using the Children function.

    Select ( ..., ... ) on rows from
    The curly braces in this case are the declaration of the set ( Set). A set is a list, an enumeration of members from one axis.

    Now let’s write a query for our second example – output in the context of a delivery person:

    Select ...Children on rows .Members on columns from
    Added here:
    – axis;
    .Members– an axis function that returns all terms on it. Hierarchy and level have the same function. Because There is only one hierarchy in this axis, then its indication can be omitted, because the level and hierarchy are also the same, then you can display all members in one list.

    I think it’s already obvious how we can continue this with our third example with detail by year. But let’s better not drill down by year, but filter – i.e. build a slice To do this, we will write the following query:

    Select ..Children on rows .Members on columns from where (.)
    Where is the filtration here?

    where- keyword
    is one member of the hierarchy . The full name, including all terms, would be: .. , but because Since the name of this member is unique within the axis, all intermediate clarifications of the name can be omitted.

    Why is the date term in parentheses? The parentheses are a tuple ( tuple). A tuple is one or more coordinates along various axes For example, to filter along two axes at once, in parentheses we list two terms from different measurements separated by commas. That is, the tuple defines a “slice” of the cube (or “filtering”, if such terminology is closer).

    The tuple is used for more than just filtering. Tuples can also be in row/column/page headers, etc.

    This is necessary, for example, in order to display the result of a three-dimensional query in a two-dimensional table.

    Select crossjoin(...Children, ..Children) on rows .Members on columns from where (.)
    Crossjoin is a function. It returns a set of tuples (yes, a set can contain tuples!) resulting from the Cartesian product of two sets. Those. the resulting set will contain all possible combinations of Countries and Years. The row headers will thus contain a pair of values: Country-Year.

    The question is, where is the indication of what numerical characteristics should be displayed? In this case, the default measure defined for this cube is used, i.e. Order price. If we want to derive another measure, then we remember that measures are members of a dimension Measures. And we act in exactly the same way as with the other axes. Those. filtering a query by one of the measures will display exactly this measure in the cells.

    Question: What is the difference between filtering in where and filtering by specifying axis members in on rows. Answer: practically nothing. Simply in where a slice is indicated for those axes that do not participate in the formation of headings. Those. same axis can not be present at the same time on rows, and in where.

    Computed Members

    For more complex queries, you can declare computed members. Members of both the attribute and measure axes. Those. You can declare, for example, a new measure that will display the contribution of each country to the total amount of orders:

    With member. as '.CurrentMember / ..', FORMAT_STRING='0.00%' select ...Children on rows from where .
    The calculation occurs in the context of a cell in which all its coordinate attributes are known. The corresponding coordinates (members) can be obtained by the CurrentMember function for each of the cube axes. Here we must understand that the expression .CurrentMember/..’ does not divide one term by another, but divides relevant aggregated data cube slices! Those. the slice for the current territory will be divided into a slice for all territories, i.e. the total value of all orders. FORMAT_STRING – sets the format for displaying values, i.e. %.

    Another example of a calculated member, but on the years axis:

    With member. as '. - .’
    Obviously, the report will not contain a unit, but the difference of the corresponding sections, i.e. the difference in the amount of orders in these two years.

    Display in ROLAP

    OLAP systems are one way or another based on some kind of data storage and organization system. When we talk about RDBMS, we talk about ROLAP (we’ll leave MOLAP and HOLAP for independent study). ROLAP – OLAP on a relational database, i.e. described in the form of ordinary two-dimensional tables. ROLAP systems convert MDX queries to SQL. The main computing problem for databases is fast aggregation. To aggregate faster, the data in the database is usually highly denormalized, i.e. are not stored very efficiently in terms of disk space taken up and database integrity monitoring. Plus they additionally contain auxiliary tables that store partially aggregated data. Therefore, for OLAP, a separate database schema is usually created, which only partially replicates the structure of the original transactional databases in terms of directories.

    Navigation

    Many OLAP systems offer interactive navigation tools for an already generated query (and accordingly selected data). In this case, the so-called “drilling” or “drilling” is used. A more adequate translation into Russian would be the word “deepening.” But this is a matter of taste, in some environments the word “drilling” has stuck.

    Drill– this is report detailing by reducing the degree of data aggregation, combined with filtering along some other axis (or several axes). There are several types of drilling:

    • drill-down– filtering along one of the source axes of the report with the display of detailed information on descendants within the hierarchy of the selected filtering member. For example, if there is a report on the distribution of orders broken down by Countries and Years, then clicking on the year 2007 will display a report broken down by the same Countries and months of 2007.
    • drill-side– filtering under one or more selected axes and removing aggregation along one or more other axes. For example, if there is a report on the distribution of orders broken down by Countries and Years, then clicking on the year 2007 will display another report broken down, for example, by Countries and Suppliers with filtering by 2007.
    • drill-trough– removing aggregation along all axes and simultaneous filtering along them – allows you to see the source data from the fact table from which the value in the report was obtained. Those. When you click on a cell value, a report is displayed with all orders that gave this amount. A kind of instant drilling into the very “depths” of the cube.
    That's all. Now, if you decide to devote yourself to Business Intelligence and OLAP, it’s time to start reading serious literature.

    Tags: Add tags

    Home Terms Articles Courses Company Experience Blog Tips Downloads Partners Contacts Promotions

    Articles > Automation of budgeting and management accounting >

    Alexander Karpov, project manager bud-tech.ru, author of the book series “100% Practical Budgeting” and the book “Organization and Automation of Management Accounting”

    www.bud-tech.ru

    Perhaps for some, the use of OLAP technology (On-line Analytic Processing) when creating reports will seem somewhat exotic, so the use of OLAP-CUBE for them is not at all one of the most important requirements when automating budgeting and management accounting.

    In fact, it is very convenient to use a multidimensional CUBE when working with management reporting. When developing budget formats, you may encounter the problem of multivariate forms (you can read more about this in Book 8 “Technology for setting up budgeting in a company” and in the book “Setup and automation of management accounting”).

    This is due to the fact that effective management of a company requires increasingly detailed management reporting. That is, the system uses more and more different analytical sections (in information systems, analytics are determined by a set of reference books).

    Naturally, this leads to the fact that managers want to receive reporting in all analytical sections that interest them. This means that the reports need to be made to “breathe” somehow. In other words, we can say that in this case we are talking about the fact that the meaning of the same report should provide information in different analytical aspects. Therefore, static reports no longer suit many modern managers. They need the dynamics that a multidimensional CUBE can provide.

    Thus, OLAP technology has already become a mandatory element in modern and future information systems. Therefore, when choosing a software product, you need to pay attention to whether it uses OLAP technology.

    Moreover, you need to be able to distinguish real CUBES from imitation ones. One such simulation is pivot tables in MS Excel. Yes, this tool looks like a CUBE, but in fact it is not one, since these are static, not dynamic tables. In addition, they have a much worse implementation of the ability to build reports using elements from hierarchical directories.

    To confirm the relevance of using the CUBE when constructing management reporting You can give a simple example with a sales budget. In the example under consideration, the following analytical sections are relevant for the company: products, branches and sales channels. If these three analytics are important for the company, then the sales budget (or report) can be displayed in several versions.

    It should be noted that if you create budget lines based on three analytical sections (as in the example under consideration), this allows you to create quite complex budget models and create detailed reports using CUBE.

    For example, a sales budget can be compiled using only one analytics (directory). An example of a sales budget built on the basis of one analytics “Products” is presented at Figure 1.

    Rice. 1. An example of a sales budget built on the basis of one analytics “Products” in the OLAP-CUBE of the INTEGRAL software package

    The same sales budget can be compiled using two analytics (directories). An example of a sales budget built on the basis of two analytics “Products” and “Branches” is presented at Figure 2.

    Rice. 2. An example of a sales budget built on the basis of two analytics “Products” and “Branches” in the OLAP-CUBE of the INTEGRAL software package

    .

    If there is a need to build more detailed reports, then the same sales budget can be compiled using three analytics (directories). An example of a sales budget built on the basis of three analytics “Products”, “Branches” and “Sales Channels” is presented at Figure 3.

    Rice. 3. An example of a sales budget built on the basis of three analytics “Products”, “Branches” and “Sales Channels” in the OLAP-CUBE of the INTEGRAL software package

    It should be recalled that the CUBE used to generate reports allows you to display data in different sequences. On Figure 3 The sales budget is first “expanded” by product, then by branch, and then by sales channel.

    The same data can be presented in a different sequence. On Figure 4 the same sales budget is “unrolled” first by product, then by sales channel, and then by branch.

    Rice. 4. An example of a sales budget built on the basis of three analytics “Products”, “Distribution Channels” and “Branches” in the OLAP-CUBE of the INTEGRAL software package

    On Figure 5 the same sales budget is “unrolled” first by branches, then by products, and then by sales channels.

    Rice. 5. An example of a sales budget built on the basis of three analytics “Branches”, “Products” and “Sales Channels” in the OLAP-CUBE software package “INTEGRAL”

    In fact, these are not all possible options for withdrawing the sales budget.

    In addition, you need to pay attention to the fact that KUB allows you to work with the hierarchical structure of directories. In the examples presented, the hierarchical directories are “Products” and “Sales Channels”.

    From the user's point of view, he is in this example receives several management reports (see Rice. 1-5), and from the point of view of settings in the software product, this is one report. Simply using the CUBE you can view it in several ways.

    Naturally, in practice, a very large number of options for outputting various management reports is possible if their articles are based on one or more analysts. And the set of analytics itself depends on the users’ needs for detail. True, we should not forget that, on the one hand, the larger the analyst, the more detailed reports can be built. But, on the other hand, this means that the financial budgeting model will be more complex. In any case, if there is a KUB, the company will have the opportunity to view the necessary reporting in various versions, in accordance with the analytical sections of interest.

    It is necessary to mention several more features of the OLAP-CUBE.

    In a multidimensional hierarchical OLAP-CUBE there are several dimensions: row type, date, rows, directory 1, directory 2 and directory 3 (see. Rice. 6). Naturally, the report displays as many buttons with directories as there are in the budget line containing the maximum number of directories. If there is not a single reference book in any budget line, then the report will not have a single button with reference books.

    Rice. 6. OLAP-CUBE measurements of the INTEGRAL software package

    Initially, the OLAP-CUBE is built along all dimensions. By default, when the report is initially built, the dimensions are located in exactly the areas shown in Figure 6. That is, a dimension such as “Date” is located in the area of ​​vertical dimensions (dimensions in the column area), dimensions “Rows”, “Directory 1”, “Directory 2” and “Directory 3” - in the area of ​​horizontal dimensions (dimensions in the area rows), and the “Row Type” dimension is in the area of ​​“unexpanded” dimensions (dimensions in the page area). If a dimension is in the last area, then the data in the report will not "expand" on that dimension.

    Each of these dimensions can be placed in any of the three areas. Once measurements are transferred, the report is instantly rebuilt to match the new measurement configuration. For example, you can swap the date and lines with reference books. Or you can move one of the reference books to the vertical measurement area (see. Rice. 7). In other words, you can “twist” the report in the OLAP-CUBE and select the report output option that is most convenient for the user.

    Rice. 7. An example of rebuilding a report after changing the measurement configuration of the INTEGRAL software package

    The measurement configuration can be changed either in the main CUBE form or in the change map editor (see. Rice. 8). In this editor, you can also drag and drop measurements from one area to another with the mouse. In addition, you can swap measurements in one area.

    In addition, in the same form you can configure some measurement parameters. For each dimension, you can customize the location of totals, the sorting order of elements, and the names of elements (see. Rice. 8). You can also specify which element name to display in the report: abbreviated (Name) or full (FullName).

    Rice. 8. Measurement map editor of the INTEGRAL software package

    You can edit measurement parameters directly in each of them (see. Rice. 9). To do this, click on the icon located on the button next to the measurement name.

    Rice. 9. An example of editing a directory 1 Products and services in the INTEGRAL software package

    Using this editor, you can select the elements that you want to show in the report. By default, all elements are displayed in the report, but if necessary, some elements or folders can be omitted. For example, if you need to display only one product group in the report, then you need to uncheck all the others in the measurement editor. After that, the report will contain only one product group (see. Rice. 10).

    You can also sort elements in this editor. In addition, elements can be rearranged in various ways. After such a regrouping, the report is instantly rebuilt.

    Rice. 10. Example of output in a report of only one product group (folder) in the INTEGRAL software package

    In the dimension editor, you can quickly create your own groups, drag and drop elements from directories there, etc. By default, only the Other group is automatically created, but other groups can be created. Thus, using the dimension editor, you can configure which elements of the reference books and in what order should be displayed in the report.

    It should be noted that all such rearrangements are not recorded. That is, after closing the report or after its recalculation, all directories will be displayed in the report in accordance with the configured methodology.

    In fact, all such changes could have been made initially when setting up the lines.

    For example, using restrictions you can also specify which elements or groups of directories should be displayed in the report and which should not.

    Note: the topic of this article is discussed in more detail at workshops "Budget management of an enterprise" And “Organization and automation of management accounting” conducted by the author of this article, Alexander Karpov.

    If the user almost regularly needs to display only certain elements or directory folders in the report, then it is better to make such settings in advance when creating report lines. If various combinations of directory elements in reports are important to the user, then there is no need to set any restrictions when setting up the methodology. All such restrictions can be quickly configured using the measurement editor.

    General information

    Microsoft Excel allows you to create PivotTable reports based on online analytical processing (OLAP) source data. When you work with PivotTable reports that are based on OLAP source data and reports that are based on non-OLAP source data, you may notice differences in the functionality and performance of the tool. This article discusses some of the main differences between PivotTable reports that are based on OLAP source data and PivotTable reports that are based on non-OLAP source data.

    Receive data and update differences

    OLAP databases are organized to facilitate the retrieval and analysis of large volumes of data. Before Excel displays summarized data in a PivotTable, the OLAP server performs calculations to summarize the data. Only the required summary data is returned to Excel as needed.

    With external non-OLAP databases, all individual records are returned and Excel does the summarization. Consequently, OLAP databases give Excel the ability to analyze significantly larger volumes of external data.

    The OLAP server pushes new data to Excel whenever the layout of a PivotTable or PivotChart report or view changes. When you use non-OLAP source data, the data is refreshed differently and different refresh options are available in the PivotTable Options dialog box.

    Non-OLAP data can be returned to Microsoft Excel as an external data range or a PivotTable report or PivotChart. OLAP data can only be returned to Excel as a PivotTable report or a PivotChart.

    Background request

    You cannot enable the background query option in the PivotTable Options dialog box when the PivotTable report is based on an OLAP data source.

    Queries with parameters

    PivotTable reports based on an OLAP data source do not support the use of parameter queries.

    Memory optimization

    The Optimize Memory check box in the PivotTable Options dialog box is not available when the PivotTable report is based on an OLAP data source.

    Page Field Options

    In PivotTable reports that are based on non-OLAP source data, you can use page field parameters to retrieve data for each element individually or for all elements simultaneously. These page field options are not available in reports that are based on OLAP source data. Source OLAP data is always retrieved for each element as needed, allowing reports to display information from large OLAP databases.

    Differences in calculation

    Page Field Options

    You cannot change the function to summarize data fields in a PivotTable report that is based on OLAP source data. This limitation occurs because totals are calculated on the OLAP server. Summary functions

    Cannot create a calculated field or calculated item in a PivotTable based on an OLAP data source.

    Calculated fields and calculated items

    When working with subtotals in a PivotTable report that is based on OLAP source data, the following restrictions apply.

    You cannot change the total function for subtotals in a PivotTable report.

    OLAP-CUBE (dynamic management reporting)

    Cannot display subtotals for internal or internal column fields in a PivotTable report.

    Because totals are calculated on the OLAP server, you cannot change the Intermediate Hidden Page Items in the PivotTable Options dialog box.

    Subtotals

    The Total * option in the PivotTable Options dialog box can only be used in PivotTable reports that are based on OLAP source data. This option marks all subtotals and grand totals with an asterisk (*) to indicate that these values ​​contain hidden as well as displayed items.

    Layout and Design Differences

    Dimensions and measures

    When working with a PivotTable report that is based on OLAP source data, the analyst can only be used as row, column, or page fields. Measures can only be used as data fields. When you drag a dimension into a field data area, or a dimension into a row, column, or page margin area, you receive the following error message:

    The field you want to move cannot be placed in this area of ​​the PivotTable.

    When a PivotTable report based on OLAP source data is active, the PivotTable toolbar displays an icon next to each field row. The icon shows where Excel will allow you to place the field in your PivotTable report. If the icon is in the upper left corner, the field is a dimension that you can drag to a row, column, or field on the areas page. If the icon is in the lower right corner, the field is a measure that you can drag into the data fields area.

    Dimensions and measures

    Microsoft Excel allows you to rename fields that you add to a PivotTable. When a PivotTable report is based on OLAP source data, your user name will be lost when you delete a field from the PivotTable.

    Grouping and ungrouping elements

    In Excel 2000, you cannot group items in a PivotTable report that is based on OLAP source data;

    Renaming fields

    PivotTable reports based on OLAP source data allow you to display the lowest level of data available on the OLAP server.

    Grouping and ungrouping elements

    For non-OLAP source data, items in the new PivotTable report first appear sorted in ascending order by item name.

    Details

    The Show Pages command is not available in PivotTable reports that are based on OLAP source data.

    Show Items With No Data

    The Show Items Without Data option in the PivotTable Fields dialog box is not available in PivotTable reports that are based on OLAP source data.

    Below is a list of questions on the subject Information technology in management of MFPU/MFPA “Synergy”

    ... is an interactive automated system that helps with ...

    OLAP in the narrow sense of the word is interpreted as...

    OLAP systems (online analytical processing) are...

    OLTP systems turned out to be of little use because...

    Automated control system (automated information…

    In MS Project...

    In an OLTP system, data updates occur...

    A diagram designed to analyze a work plan using methods...

    An information system is a set of interconnected elements...

    Information technology is...

    Information support is...

    Information technologies influence the development of society in the following ways...

    Information exchange in the structure of the organization’s management bodies…

    Executive Information Systems…

    The characteristics of “small” information systems include...

    The characteristics of “medium” scale information systems include...

    Information processing methods are...

    The modular principle of constructing accounting information systems...

    The figure shows a fragment of a diagram of type..., made in pro...

    On a network diagram in MS Project, a task from an external project...

    On a network diagram in MS Project, a task not related to...

    On the network diagram in the MS Project program, the task assigned...

    On a network diagram in MS Project, a summary task, combined

    On the composition and number of automated workstations included...

    The science of information activities, information processes and...

    Organization of an information system in which on a remote server...

    The main purpose of an OLAP system is...

    The main purpose of ERP systems is to automate...

    The main purpose of the MPS methodology is...

    The main characteristics of OLAP systems are...

    The technical support subsystem includes...

    The sequence of technological stages for modification of primary...

    When networking personal computers in the form of intrapro...

    Applied software The computer is intended for...

    An example of subject information technology is technology...

    The decision support process involves...

    An Enterprise-Scale Network or Corporate Network is an information...

    The artificial intelligence system is…

    Transaction processing systems are systems designed to...

    Transaction processing systems comply with...

    Decision Support Systems – DS…

    Modern methods and tools for analyzing and planning processes…

    Creation of integrated automated information systems…

    The created information systems become unsuitable for use...

    The specificity of the management support information system is manifested...

    Using traditional OLTP systems you can...

    The structure of corporate information systems is...

    To speed up and simplify the work of HR managers at the company...

    To speed up and simplify the work of HR managers at the company...

    The recorded perceived facts of the surrounding world represent...

    A chain of actions that most accurately reflects the management process...

    Economic problems solved through dialogue characterize...

    Expert systems are designed to process...

    Is a security breach or is related to security...

    OLAP made easy

    Amazing is nearby...

    In the course of my work, I often needed to make complex reports, I was always trying to find something common in them in order to compile them more simply and universally, I even wrote and published an article on this subject, “Osipov’s Tree.” However, my article was criticized and they said that all the problems that I raised had long been solved in OLAP (www.molap.rgtu.ru) and they recommended looking at pivot tables in EXCEL.
    It turned out to be so simple that, having applied my ingenious little hands to it, I came up with a very simple scheme for downloading data from 1C7 or any other database (hereinafter, 1C means any database) and analysis in OLAP.
    I think many OLAP upload schemes are too complicated, I choose simplicity.

    Characteristics :

    1. Only EXCEL 2000 is required for work.
    2. The user can design reports himself without programming.
    3. Uploading from 1C7 in a simple text file format.
    4. For accounting entries There is already a universal processing for unloading that works in any configuration. Sample processing is available for downloading other data.
    5. You can design report forms in advance and then apply them to different data without re-designing them.
    6. Quite good performance. At the first long stage, the data is first imported into EXCEL from a text file and an OLAP cube is built, and then pretty quickly any report can be built on the basis of this cube. For example, data on product sales for a store for 3 months with an assortment of 6000 products is loaded into EXCEL in 8 minutes on Cel600-128M, the rating by product and group (OLAP report) is recalculated in 1 minute.
    7. Data is downloaded from 1C7 in full for the specified period (all movements, across all warehouses, companies, accounts). When importing into EXCEL, it is possible to use filters that load only the necessary data for analysis (for example, from all movements, only sales).
    8. Currently, methods have been developed for analyzing movements or residues, but not movements and residues together, although this is possible in principle.

    What is OLAP : (www.molap.rgtu.ru)

    Let's say you have a retail chain. Let the data on trading operations be uploaded to a text file or table like this:

    Date - date of operation
    Month - month of operation
    Week - week of operation
    Type - purchase, sale, return, write-off
    Counterparty - an external organization participating in a transaction
    Author - the person who issued the invoice

    In 1C, for example, one row of this table will correspond to one line of the invoice; some fields (Counterparty, Date) are taken from the invoice header.

    Data for analysis is usually uploaded into an OLAP system for a certain period of time, from which, in principle, another period can be selected using loading filters.

    This table is the source for OLAP analysis.

    The user himself determines which of the table fields will be Dimensions, which Data and which Filters to apply. The system itself builds a report in a visual tabular form. Dimensions can be placed in the row or column headings of a report table.
    As you can see, from one simple table you can get a lot of data in the form of various reports.


    How to use it yourself :

    Unpack the data from the distribution exactly into the c:\fixin directory (for a trading system it is possible in c:\reports). Read the readme.txt and follow all the instructions in it.

    First you must write a processing that uploads data from 1C to a text file (table). You need to determine the composition of the fields that will be unloaded.
    For example, ready-made universal processing, which works in any configuration and downloads transactions for a period for OLAP analysis, downloads the following fields for analysis:

    Date|Day of the Week|Week|Year|Quarter|Month|Document|Company|Debit|DtNomenclature
    |DtGroupNomenclature|DtSectionNomenclature|Credit|Amount|ValAmount|Quantity
    |Currency|DtCounterparties|DtGroupCounterparties|KtCounterparties|KtGroupCounterparties|
    CTMiscellaneousObjects

    Where under the prefixes Dt(Kt) there are subaccounts of Debit (Credit), Group is the group of this subaccount (if any), Section is the group of the group, Class is the group of the section.

    For a trading system, the fields can be as follows:

    Direction|Type of Movement|For Cash|Product|Quantity|Price|Amount|Date|Company
    |Warehouse|Currency|Document|Day of the Week|Week|Year|Quarter|Month|Author
    |Product Category|Movement Category|Counterparty Category|Product Group
    |ValAmount|Cost|Counterparty

    To analyze the data, the tables "Movement Analysis.xls" ("Accounting Analysis.xls") are used. When opening them, do not disable macros, otherwise you will not be able to update the reports (they are run by VBA macros). These files take their source data from the files C:\fixin\motions.txt (C:\fixin\buh.txt), otherwise they are the same.

    OLAP Basics

    Therefore, you may have to copy your data to one of these files.
    To load your data into EXCEL, select or write your filter and click the “Generate” button on the “Conditions” sheet.
    Report sheets begin with the prefix "Report". Go to the report sheet, click "Update" and the report data will change in accordance with the last loaded data.
    If you are not satisfied with standard reports, there is a ReportTemplate sheet. Copy it to a new sheet and customize the report type by working with a pivot table on this sheet (for more information on working with pivot tables, see any EXEL 2000 book). I recommend setting up reports on a small set of data, and then running them on a large array, because... There is no way to disable redrawing of tables every time the report layout changes.

    Technical Notes :

    When uploading data from 1C, the user selects the folder where to upload the file. I did this because it is likely that several files (remains and movements) will be uploaded in the near future. Then, by clicking the “Send” -> “To OLAP analysis in EXCEL 2000” button in Explorer, the data is copied from the selected folder to the C:\fixin folder. (for this command to appear in the list of the “Send” command, you need to copy the file “For OLAP analysis in EXCEL 2000.bat” to the C:\Windows\SendTo directory) Therefore, upload the data immediately by naming the files motions.txt or buh.txt.

    Text file format:
    The first line of the text file is the column headers separated by "|", the remaining lines contain the values ​​of these columns separated by "|".

    To import text files into Excel, Microsoft Query (a component of EXCEL) is used; for it to work, you must have a shema.ini file containing the following information in the import directory (C:\fixin):


    ColNameHeader=True
    Format=Delimited(|)
    MaxScanRows=3
    CharacterSet=ANSI
    ColNameHeader=True
    Format=Delimited(|)
    MaxScanRows=3
    CharacterSet=ANSI

    Explanation: motions.txt and buh.txt are the name of the section, corresponds to the name of the imported file, describes how to import a text file into Excel. The remaining parameters mean that the first line contains the names of the columns, the column separator is "|", the character set is Windows ANSI (for DOS - OEM).
    The field type is determined automatically based on the data contained in the column (date, number, string).
    The list of fields does not need to be described anywhere - EXCEL and OLAP will themselves determine which fields are contained in the file by the headings in the first line.

    Attention, check your regional settings "Control Panel" -> "Regional Settings". In my processing, numbers are uploaded with a comma delimiter, and dates are in the format "DD.MM.YYYY".

    When you click the "Generate" button, the data is loaded into the pivot table on the "Base" sheet, and all reports on the "Report" sheets take data from this pivot table.

    I understand that fans of MS SQL Server and powerful databases will begin to grumble that everything is too simplified, that my processing will be exhausted by a year-long sample, but first of all I want to give the benefits of OLAP analysis to medium-sized organizations. I would position this product as an annual analysis tool for wholesale companies, quarterly analysis for retailers, and operational analysis for any organization.

    I had to tinker with VBA so that the data could be taken from a file with any list of fields and I could prepare report forms in advance.

    Description of work in EXCEL (for users):

    Instructions for using reports:
    1. Send the downloaded data for analysis (check with the administrator). To do this, right-click on the folder into which you downloaded data from 1C and select the “Send” command, then “To OLAP analysis in EXCEL 2000”.
    2. Open the file "Motion Analysis.xls"
    3. Select Filter Value; the filters you need can be added on the “Values” tab.
    4. Click the "Generate" button, and the downloaded data will be loaded into EXCEL.
    5. After loading the data into EXCEL, you can view various reports. To do this, just click the "Refresh" button in the selected report. Report sheets begin with Report.
    Attention! After you change the filter value, you need to click the “Generate” button again so that the data in EXCEL is reloaded from the upload file in accordance with the filters.

    Processing from the demo example:

    Processing motionsbuh2011.ert – latest version uploading transactions from Accounting 7.7 for analysis in Excel. It has a checkbox “Attach to file”, which allows you to upload data in parts by period, appending it to the same file, rather than uploading it to the same file again:

    Processing motionswork.ert uploads sales data for analysis in Excel.

    Examples of reports :

    Wiring chess:

    Operator workload by types of invoices:

    P.S. :

    It is clear that a similar scheme can be used to organize the downloading of data from 1C8.
    In 2011, a user contacted me who needed to improve this processing in 1C7 so that it would upload large amounts of data, I found an outsourcer and did the work. So the development is quite relevant.

    The processing of motionsbuh2011.ert has been improved to cope with unloading large amounts of data.

    First clear definition OLAP(On-line Analytical Processing) was proposed in 1993 by E.F. Codd in an article published with the support of Arbor Software (now Hyperion Software). The article included 12 rules that have now become widely known and are described on the website of any OLAP application provider. Later, in 1995, six more lesser known rules were added, all of them were divided into four groups and called "features". Here are the rules that define an OLAP application, with comments from Nigel Pendse, one of the creators of the OLAP Report website.

    The main characteristics of OLAP include:

    1. Multidimensionality of the data model. Few people argue with this statement, and it is considered the main characteristic of OLAP. Part of this requirement is the ability to construct various projections and sections of the model.

    2. Intuitive data manipulation mechanisms. Codd believes that data manipulation should be done using actions directly in the table cell, without the use of menus or complex ones. One might assume that this implies the use of mouse operations, but Codd does not claim this. Many products do not follow this rule. From our point of view, this characteristic has little effect on the quality of the data analysis process. We believe that the program should offer the opportunity to choose a work model, because... not all users like the same things.

    3. Availability. OLAP is the Mediator. Codd specifically emphasizes that the OLAP engine is a middleware program between heterogeneous data sources and the user interface. Most products provide these functions, but the ease of data access is often less than what other software providers would like.

    4. Batch data extraction. This rule requires that products offer both their own databases for storing analyzed data and dynamic (live) access to external data. We agree with Codd on this point and regret that few OLAP products follow it. Even those programs that offer such functions rarely make them easy and sufficiently automated. As a result, Codd supports multidimensional data representation plus partial pre-calculation of large multidimensional databases with transparent end-to-end access to detailed information. Today it is seen as the definition of hybrid OLAP, which is becoming the most popular architecture, so Codd very accurately saw the main trends in this area.

    5. Client-server architecture. Codd believes that not only should every product be a client-server product, but that every server component of an OLAP product should be intelligent enough so that different clients can be connected with minimal effort and programming. This is a much more complex test than a simple client-server architecture and relatively few products pass it. We could argue that this test is perhaps more complex than it needs to be and should not dictate the system architecture to developers.

    6. Transparency. This test is also difficult, but necessary. Full compliance means that the user of, say, a spreadsheet can have full access to the facilities provided by the OLAP engine without even knowing where the data came from. To achieve this, products must provide dynamic access to heterogeneous data sources and a fully functional spreadsheet module. An OLAP server is placed between the spreadsheet and the data warehouse.

    7. Multi-user work. Codd specifies that to be considered a strategic OLAP tool, applications must do more than just read and interpret data, and accordingly they must provide concurrent access (including both data retrieval and update), integrity, and security.

    Special Features

    8. Handling Unnormalized Data. This means integration between the OLAP engine and the non-normalized data source is possible. Codd emphasizes that when updating data performed in an OLAP environment, it must be possible to change non-normalized data in external systems.

    9. Storing OLAP results separately from the source data. In reality, this has to do with the implementation of the product rather than its capabilities, but few would argue with this statement. Essentially, Cobb advocates the widely accepted system that OLAP applications should build analyzes directly on transaction data and that changes to OLAP data should be kept separate from transaction data.

    10. Highlighting missing data. This means that the missing data must be different from the null value. As a rule, all modern OLAP systems support this feature.

    11. Handling missing values. All missing values ​​should be ignored in the analysis, regardless of their source.

    Report generation characteristics

    12. Flexible reporting. The various dimensions must be arranged in any way to suit the user's needs. Most products meet this requirement through dedicated report editors. I wish the same features were available in interactive viewers, but this is much less common. This is one of the reasons why we prefer that the analysis and reporting functionality be combined in one module.

    1. Concept of olap cube

    13. Consistent performance when building reports. This means that system performance when generating reports should not drop significantly as the size or size of the database increases.

    14. Automatic physical layer regulation. The OLAP system must automatically adjust the physical structure to adapt it to the type and structure of the model.

    Dimension control

    15. General functionality. All dimensions should have the same capabilities in structure and functionality.

    16. Unlimited number of dimensions and aggregation levels. In fact, by unlimited number Codd means 15-20, i.e. a number that obviously exceeds the analyst’s maximum needs.

    17. Unlimited operations between data of different measurements. Codd believes that for an application to be called multidimensional, it must support any calculation using data from all dimensions.

    Details about Hyperion products can be found on the website www.hyperion.ru

    print version

    Back

    10.8 Working with pivot tables (PivotTable object)

    Excel.PivotTable object, programmatic work with pivot tables and OLAP cubes in Excel using VBA, PivotCache object, creating a pivot table layout

    During the operation of most enterprises, so-called raw data about activities accumulate. For example, for a trading enterprise, data on sales of goods can be accumulated - for each purchase separately; for cellular communications enterprises - statistics on the load on base stations, etc. Very often, enterprise management needs analytical information that is generated on the basis of raw information - for example, to calculate the contribution of each type of product to the income of the enterprise or the quality of service in the area of ​​​​a given station. It is very difficult to extract such information from raw information: you need to run very complex SQL queries that take a long time to execute and often interfere with the current work. Therefore, increasingly, raw data is now consolidated first into an archival data warehouse - the Data Warehouse, and then into OLAP cubes, which are very convenient for interactive analysis. The easiest way to think of OLAP cubes is as multidimensional tables, in which, instead of the standard two dimensions (columns and rows, as in regular tables), there can be a lot of dimensions. The term "sectional" is commonly used to describe measurements in a cube. For example, the marketing department may need information by time, by region, by product type, by sales channel, etc. Using cubes (as opposed to standard SQL queries), it is very easy to get answers to questions like “how many products of this type were sold in the fourth quarter of last year in the North-West region through regional distributors.

    Of course, such cubes cannot be created in conventional databases. Working with OLAP cubes requires specialized software products. SQL Server comes with an OLAP database from Microsoft called Analysis Services. There are OLAP solutions from Oracle, IBM, Sybase, etc.

    To work with such cubes, Excel has a built-in client.

    In Russian it is called Pivot table(on the graphic screen it is accessible through the menu Data -> Pivot table), and in English - Pivot Table. Accordingly, the object that this client represents is called a PivotTable. It should be noted that it can work not only with OLAP cubes, but also with ordinary data in Excel tables or databases, but many capabilities are lost.

    The PivotTable and PivotTable are software products from Panorama Software that were acquired by Microsoft and integrated into Excel.

    Therefore, working with the PivotTable object is somewhat different from working with other Excel objects. Figuring out what needs to be done is often difficult. Therefore, it is recommended to actively use the macro recorder to receive hints. At the same time, when working with pivot tables, users often have to perform the same repetitive operations, so automation is necessary in many situations.

    What does working with a pivot table look like programmatically?

    The first thing we need to do is create a PivotCache object that will represent a set of records retrieved from the OLAP source. Very roughly, this PivotCache object can be compared to a QueryTable. You can only use one PivotCache object per PivotTable object. A PivotCache object is created using the Add() method of the PivotCaches collection:

    Dim PC1 As PivotCache

    Set PC1 = ActiveWorkbook.PivotCaches.Add(xlExternal)

    PivotCaches is a standard collection, and among the methods that deserve detailed consideration, only the Add() method can be named in it. This method takes two parameters:

    • SourceType- required, defines the type of data source for the pivot table. You can specify the creation of a PivotTable based on a range in Excel, data from a database, an external data source, another PivotTable, etc. In practice, it usually makes sense to use OLAP only when there is a lot of data - accordingly, you need specialized external storage (for example, Microsoft Analysis Services). In this situation, the xlExternal value is selected.
    • SourceData- required in all cases, except when the value of the first parameter is xlExternal. In fact, it defines the data range on the basis of which the PivotTable will be created. Typically takes a Range object.

    The next task is to configure the PivotCache object settings. As already mentioned, this object is very similar to QueryTable, and its set of properties and methods is very similar. Some of the most important properties and methods:

    • ADOConnection- the ability to return an ADO Connection object that is automatically created to connect to an external data source. Used to further configure connection properties.
    • Connection- works exactly the same as the QueryTable object property of the same name. Can accept a connection string, a ready-made Recordset object, a text file, or a Web request. Microsoft Query file. Most often, when working with OLAP, the connection string is written directly (since getting a Recordset object, for example, to change data, does not make much sense - OLAP data sources are almost always read-only). For example, setting this property to connect to the Foodmart database (Analysis Services sample database) on the LONDON server might look like this:

    PC1.Connection = "OLEDB;Provider=MSOLAP.2;Data Source=LONDON1;Initial Catalog = FoodMart 2000"

    • properties CommandType And CommandText they also describe the type of command that is sent to the database server and the text of the command itself. For example, to access the Sales cube and get it entirely into the cache on the client, you can use code like

    PC1.CommandType = xlCmdCube

    PC1.CommandText = Array("Sales")

    • property LocalConnection allows you to connect to a local cube (*.cub file) created using Excel. Of course, it is not recommended to use such files for working with “production” volumes of data - only for the purpose of creating layouts, etc.
    • property MemoryUsed returns the amount of RAM used by PivotCache. If a PivotTable based on this PivotCache has not yet been created and opened, it returns 0. Can be used for checks if your application will run on weak clients.
    • property OLAP returns True if PivotCache is connected to the OLAP server.
    • OptimizeCache- ability to optimize the cache structure. The initial data download will take longer, but then the speed may increase. Doesn't work for OLE DB sources.

    The remaining properties of the PivotCache object are the same as those of the QueryTable object, and therefore will not be discussed here.

    The main method of the PivotCache object is the CreatePivotTable() method. Using this method, the next stage is performed - creating a pivot table (PivotTable object). This method takes four parameters:

    • TableDestination- the only required parameter.

      Accepts a Range object in the upper left corner of which the pivot table will be placed.

    • TableName- name of the pivot table. If not specified, the view name “PivotTable1” will be automatically generated.
    • ReadData- if set to True, then the entire contents of the cube will be automatically placed in the cache. You need to be very careful with this parameter, since using it incorrectly can dramatically increase the load on the client.
    • DefaultVersion- this property is usually not specified. Allows you to determine the version of the pivot table being created. By default, the most recent version is used.

    Creating a pivot table in the first cell of the first worksheet might look like this:

    PC1.CreatePivotTable Range("A1")

    We have created a pivot table, but immediately after creation it is empty. It provides four areas in which you can place fields from the source (on the graphical screen all this can be configured either using the window List of Pivot Table Fields- it opens automatically or using a button Layout on the last screen of the PivotTable Wizard):

    • column area- it contains those dimensions (“section” in which the data will be analyzed) whose members are smaller;
    • line area- those dimensions which have more members;
    • page area- those measurements for which you only need to filter (for example, show data only for such and such a region or only for such and such a year);
    • data area- in fact, the central part of the table. Those numerical data (for example, the amount of sales) that we analyze.

    Relying on the user to place elements correctly in all four areas is difficult.

    Additionally, this may take some time. Therefore, it is often necessary to arrange data in a pivot table programmatically. This operation is performed using the CubeField object. The main property of this object is Orientation, it determines where this or that field will be located. For example, let's place the Customers dimension in the columns area:

    PT1.CubeFields("").Orientation = xlColumnField

    Then - the Time measurement into the line area:

    PT1.CubeFields("").Orientation = xlRowField

    Then - the Product dimension to the page area:

    PT1.CubeFields("").Orientation = xlPageField

    And finally, the indicator (numeric data for analysis) Unit Sales:

    PT1.CubeFields(“.”).Orientation = xlDataField

    Now the pivot table has been created and you can work with it. However, it is often necessary to perform one more operation - to expand the desired level of the dimension hierarchy. For example, if we are interested in quarterly analysis, then we need to expand the Quarter level of the Time dimension (by default, only the top level is shown). Of course, the user can do this on their own, but they can't always be counted on to figure out where to click. For example, you can programmatically expand the hierarchy of the Time dimension to the quarter level for 1997 using the PivotField and PivotItem objects:

    PT1.PivotFields(“.”).PivotItems(“.”).DrilledDown = True

    / In a cubist manner. Application of OLAP cubes in management practice of large companies


    In contact with

    Classmates

    Konstantin Tokmachev, system architect

    In a cubist style.
    Application of OLAP cubes in management practice of large companies

    Perhaps the time has passed when a corporation's computing resources were spent only on recording information and accounting reports. At the same time, management decisions were made “by eye” in offices, at meetings and sessions. Perhaps in Russia it’s time to return corporate computing systems to their main resource - solving management problems based on data registered in the computer

    About the benefits of business analytics

    In the corporate management loop, between the “raw” data and the “levers” of influencing the managed object, there are “performance indicators” - KPIs. They form a kind of “dashboard”, reflecting the state of various subsystems of the controlled object. Equipping the company with informative performance indicators and monitoring their calculation and obtained values ​​is the work of a business analyst. Automated analysis services, such as the MS SQL Server Analysis Services (SSAS) utility and its main tool, the OLAP cube, can provide significant assistance in organizing the corporation’s analytical work.

    One more point needs to be made right here. Let's say, in the American tradition, a specialty focused on working with OLAP cubes is called BI (Business Intelligence). There should be no illusions that the American BI corresponds to the Russian “business analyst”. No offense, but often our business analyst is an “under-accountant” and “under-programmer”, a specialist with vague knowledge and a small salary, who really does not have any of his own tools and methodology.

    A BI specialist is, in fact, an applied mathematician, a highly qualified specialist who puts modern mathematical methods into the company’s arsenal (what was called Operations Research - methods of operations research). BI is more consistent with the specialty “system analyst” that was once in the USSR, graduated from the Faculty of Computational Mathematics and Mathematics of Moscow State University. M.V. Lomonosov. The OLAP cube and analysis services can become a promising basis for the workplace of a Russian business analyst, perhaps after some advanced training in the direction of American BI.

    Recently, another harmful trend has emerged. Thanks to specialization, mutual understanding between different categories of corporation employees has been lost. An accountant, manager and programmer, like “a swan, a crayfish and a pike” in I.A.’s fable. Krylov, are pulling the corporation in different directions.

    The accountant is busy with reporting; his amounts, both in meaning and in dynamics, are not directly related to the business process of the company.

    The manager is busy with his part of the business process, but is not able to evaluate globally, at the level of the company as a whole, the results and prospects of his actions.

    Finally, the programmer, who was once (thanks to his education) a conductor of advanced technical ideas from the sphere of science to the sphere of business, has turned into a passive executor of the fantasies of the accountant and manager, so it is no longer uncommon for the IT departments of corporations to be driven by accountants and, in general, everyone to whom not lazy. A lack of initiative, illiterate, but relatively highly paid 1C programmer is a real scourge of Russian corporations. (Almost like a domestic football player.) I’m not even talking about the so-called “economists and lawyers”; everything has been said about them a long time ago.

    So, the position of a business analyst, equipped with a knowledge-intensive SSAS apparatus, proficient in the basics of programming and accounting, is capable of consolidating the work of the company in relation to the analysis and forecast of the business process.

    Advantages of OLAP cubes

    OLAP cube is modern remedy analysis of the corporate computer system database, which makes it possible to provide employees at all levels of the hierarchy with the required set of indicators that characterize the company’s production process. The point is not only that the convenient interface and flexible query language for the MDX cube (MultiDimensional eXpressions) allow you to formulate and calculate the necessary analytical indicators, but the remarkable speed and ease with which the OLAP cube does this. Moreover, this speed and ease, within certain limits, do not depend on the complexity of calculations and the size of the database.

    Some introduction to OLAP-
    cube can be given by a “pivot table” of MS Excel. These objects have similar logic and similar interfaces. But, as will be seen from the article, OLAP functionality is incomparably richer, and performance is incomparably higher, so the “pivot table” remains a local desktop product, while OLAP is an enterprise-level product.

    Why is the OLAP cube so well suited for solving analytical problems? The OLAP cube is designed in such a way that all indicators in all possible sections are pre-calculated (in whole or in part), and the user can only “pull out” the required indicators (measures) and dimensions (dimensions) with the mouse, and the program can redraw the tables.

    All possible analytics in all sections form one huge field, or rather, not a field, but just a multidimensional OLAP cube. Whatever request the user (manager, business analyst, executive) turns to the analytics service, the speed of response is explained by two things: firstly, the required analytics can be easily formulated (either selected from a list by name, or specified by a formula in the MDX language ), secondly, as a rule, it has already been calculated.

    The formulation of analytics is possible in three options: it is either a database field (or rather, a warehouse field), or a calculation field defined at the cube design level, or an MDX language expression when working interactively with the cube.

    This means several attractive features of OLAP cubes. Essentially, the barrier between the user and the data disappears. The barrier is in the form of an application programmer, who, firstly, needs to explain the problem (set a task). Secondly, you will have to wait for the application programmer to create an algorithm, write and debug the program, and then possibly modify it. If there are many employees and their requirements are varied and changeable, then a whole team of application programmers is needed. In this sense, an OLAP cube (and a qualified business analyst) replaces an entire team of application programmers in terms of analytical work, just as a powerful excavator with an excavator operator replaces an entire team of migrant workers with shovels when digging a ditch!

    At the same time, another very important quality of the obtained analytical data is achieved. Since there is only one OLAP cube for the entire company, i.e. This is the same field with analysts for everyone, which eliminates annoying discrepancies in the data. When a manager has to ask the same task to several independent employees in order to eliminate the factor of subjectivity, but they still bring different answers, which everyone undertakes to explain somehow, etc. The OLAP cube ensures uniformity of analytical data at different levels of the corporate hierarchy, i.e. if a manager wants to detail a certain indicator of interest to him, then he will certainly come to the lower-level data with which his subordinate works, and this will be precisely the data on the basis of which the higher-level indicator was calculated, and not some other data, received in some other way, at some other time, etc. That is, the entire company sees the same analytics, but at different levels of aggregation.

    Let's give an example. Let's say a manager controls accounts receivable. As long as the KPI for overdue receivables is green, it means everything is normal and no management actions are required. If the color has changed to yellow or red, something is wrong: we cut the KPIs by sales departments and immediately see the departments “in red”. The next section by managers - and the seller whose clients are behind on payments is identified. (Further, the overdue amount can be divided by customers, by terms, etc.) The head of the corporation can directly contact the violators at any level. But in general, the same KPI (at their hierarchy levels) is seen by both department heads and sales managers. Therefore, in order to correct the situation, they don’t even need to wait for a “call on the carpet”... Of course, the KPI itself does not necessarily have to be the amount of overdue payments - it can be the weighted average period of overdue payments or, in general, the rate of turnover of receivables.

    Let us note that the complexity and flexibility of the MDX language, together with the fast (sometimes instantaneous) results, allows us to solve (taking into account the stages of development and debugging) complex control problems that otherwise might not have been posed at all due to the complexity for application programmers and initial uncertainty in the formulation. (Lengthy deadlines for application programmers to solve analytical problems due to poorly understood formulations and long modifications of programs when conditions change are often encountered in practice.)

    Let us also pay attention to the fact that each employee of the company can collect from the general field an OLAP analyst exactly the harvest that he needs for his work, and not be content with the “strip” that is cut out for him in communal “standard reports”.

    The multi-user interface for working with an OLAP cube in client-server mode allows each employee, independently of others, to have their own (even self-made with some skill) analytics blocks (reports), which, once defined, are automatically updated - in other words, they are always up to date condition.

    That is, the OLAP cube allows you to make analytical work (which is actually carried out not only by reception analysts, but, in fact, by almost all employees of the company, even logisticians and managers who control balances and shipments) more selective, “not in general terms” , which creates conditions for improving work and increasing productivity.

    To summarize our introduction, we note that the use of OLAP cubes can raise the management of a company to a higher level. The uniformity of analytical data at all levels of the hierarchy, their reliability, complexity, ease of creating and modifying indicators, individual settings, high speed of data processing, and finally, saving money and time spent on supporting alternative analytical paths (application programmers, employee’s independent calculations) open up prospects for the use of OLAP cubes in the practice of large Russian companies.

    OLTP + OLAP: feedback loop in the company management chain

    Now let's look at the general idea of ​​OLAP cubes and their point of application in the corporate management chain. The term OLAP (OnLine Analytical Processing) was introduced by the British mathematician Edgar Codd in addition to his previously introduced term OLTP (OnLine Transactions Processing). This will be discussed later, but E. Codd, of course, proposed not only the terms, but also the mathematical theories of OLTP and OLAP. Without going into details, in the modern interpretation, OLTP is a relational database, considered as a mechanism for recording, storing and retrieving information.

    Solution methodology

    ERP systems (Enterprice Resource Planning), such as 1C7, 1C8, MS Dynamics AX, have user-oriented software interfaces (entering and editing documents, etc.) and a relational database (DB) for storing and retrieving information , represented today by software products such as MS SQL Server (SS).

    Note that the information registered in the ERP system database is indeed a very valuable resource. The point is not only that the registered information ensures the current document flow of the corporation (extracting documents, their correction, the ability to print and reconcile, etc.) and not only the possibility of calculation financial statements(taxes, audit, etc.). From a management point of view, it is much more important that the OLTP system (relational database) is, in fact, an actual life-size digital model of the corporation’s activities.

    But to manage the process, it is not enough to register information about it. The process should be presented in the form of a system of numerical indicators (KPIs) characterizing its progress. In addition, acceptable ranges of values ​​must be defined for indicators. And only if the value of the indicator falls outside the permissible interval, a control action should follow.

    Regarding such a logic (or mythology) of control (“control by deviation”), both ancient Greek philosopher Plato, who created the image of the helmsman (cybernose), who leans on the oar when the boat deviates from the course, and the American mathematician Norbert Wiener, who created the science of cybernetics on the eve of the computer era.

    In addition to the usual system for recording information using the OLTP method, another system is needed - a system for analyzing the collected information. This add-on, which in the control loop plays the role of feedback between management and the control object, is an OLAP system or, in short, an OLAP cube.

    As a software implementation of OLAP, we will consider the MS Analysis Services utility, which is part of the standard delivery of MS SQL Server, abbreviated SSAS. Note that, according to E. Codd’s plan, the OLAP cube in analytics should give the same comprehensive freedom of action that the OLTP system and the relational database (SQL Server) provide in storing and retrieving information.

    OLAP Logistics

    Now let's look at the specific configuration of external devices, application programs and technological operations on which the automated operation of the OLAP cube is based.

    We will assume that the corporation uses an ERP system, for example, 1C7 or 1C8, within which information is recorded as usual. The database of this ERP system is located on a certain server and is supported by MS SQL Server.

    We will also assume that another server has software installed, including MS SQL Server with the MS Analysis Services (SSAS) utility, as well as MS SQL Server Management Studio, MS C#, MS Excel and MS Visual Studio. These programs together form the required context: the tools and necessary interfaces for the developer of OLAP cubes.

    The SSAS server has a freely distributed program called blat, called (with parameters) from command line and providing postal service.

    At employee workstations, within local network, among other things, MS Excel programs (versions no less than 2003) are installed, as well as, possibly, a special driver to ensure that MS Excel works with MS Analysis Services (unless the corresponding driver is already included in MS Excel).

    For definiteness, we will assume that employee workstations have operating system Windows XP, and on servers - Windows Server 2008. In addition, let MS SQL Server 2005 be used as the SQL Server, with Enterprise Edition (EE) or Developer Edition (DE) installed on the server with the OLAP cube. In these editions it is possible to use the so-called. “semi-additive measures”, i.e. additional aggregate functions (statistics) other than ordinary sums (for example, extremum or average).

    OLAP cube design (OLAP cubism)

    Let's say a few words about the design of the OLAP cube itself. In the language of statistics, an OLAP cube is a set of performance indicators calculated in all necessary sections, for example, the shipment indicator in sections by customers, by goods, by dates, etc. Due to direct translation from English in Russian literature on OLAP cubes, indicators are called “measures”, and sections are called “dimensions”. This is a mathematically correct, but syntactically and semantically not very successful translation. The Russian words “measure”, “dimension”, “dimension” are almost the same in meaning and spelling, while the English “measure” and “dimension” are different in both spelling and meaning. Therefore, we give preference to the traditional Russian statistical terms “indicator” and “cut”, which are similar in meaning.

    There are several options for software implementation of an OLAP cube in relation to the OLTP system where data is recorded. We will consider only one scheme, the simplest, most reliable and fastest.

    In this design, OLAP and OLTP do not share tables, and OLAP analytics are calculated in as much detail as possible during the cube update (Process) stage, which precedes the usage stage. This scheme is called MOLAP (Multidimensional OLAP). Its disadvantages are asynchrony with ERP and high memory costs.

    Although formally an OLAP cube can be built using all (thousands) of ERP system relational database tables as a data source and all (hundreds) of their fields as indicators or sections, in reality this should not be done. Vice versa. To load into a cube, it is more correct to prepare a separate database, called a “showcase” or “warehouse”.

    Several reasons force us to do this.

    • Firstly, binding OLAP cube to tables real base data will certainly create technical problems. Changing data in a table can trigger a refresh of the cube, and refreshing a cube is not necessarily a fast process, so the cube will be in a state of constant rebuilding; At the same time, the cube update procedure can block (when reading) the data of the database tables, slowing down the work of users in registering data in the ERP system.
    • Secondly, Having too many indicators and cuts will dramatically increase the storage area of ​​the cube on the server. Let's not forget that the OLAP cube stores not only the source data, as in the OLTP system, but also all the indicators summed up over all possible sections (and even all combinations of all sections). In addition, the speed of updating the cube and, ultimately, the speed of building and updating analytics and user reports based on them will slow down accordingly.
    • Third, too many fields (indicators and sections) will create problems in the OLAP developer interface, because the lists of elements will become immense.
    • Fourthly, The OLAP cube is very sensitive to data integrity violations. The cube cannot be built if the key data is not located at the link specified in the structure of the cube field connections. Temporary or permanent integrity violations, empty fields are common in an ERP system database, but this is absolutely not suitable for OLAP.

    You can also add that the ERP system and the OLAP cube should be located on different servers to share the load. But then, if there are common tables for OLAP and OLTP, the problem of network traffic also arises. Practically insoluble problems arise in this case when it is necessary to consolidate several disparate ERP systems (1C7, 1C8, MS Dynamics AX) into one OLAP cube.

    Probably, we can continue to pile up technical problems. But most importantly, remember that, unlike OLTP, OLAP is not a means of recording and storing data, but an analytics tool. This means that there is no need to upload and download “dirty” data from ERP to OLAP “just in case.” On the contrary, you must first develop a concept for managing the company, at least at the level of the KPI system, and then design an application data warehouse (warehouse), located on the same server as the OLAP cube, and containing a small, refined amount of data from ERP necessary for management .

    Without promoting bad habits, the OLAP cube in relation to OLTP can be likened to the well-known “still cube”, through which the “pure product” is extracted from the “fermented mass” of the real registration.

    So, we got that the data source for OLAP is a special database (warehouse), located on the same server as OLAP. Generally this means two things. First, there must be special procedures that will create a warehouse from ERP databases. Secondly, the OLAP cube is asynchronous with its ERP systems.

    Taking into account the above, we propose the following version of the computing process architecture.

    Solution architecture

    Suppose there are many ERP systems of a certain corporation (holding) located on different servers, the analytical data for which we would like to see consolidated within one OLAP cube. We emphasize that in the technology described, we combine data from ERP systems at the warehouse level, leaving the design of the OLAP cube unchanged.

    On the OLAP server we create images (blank copies) of the databases of all these ERP systems. We periodically (nightly) perform partial replication of the corresponding active ERP databases onto these empty copies.

    Next, SP (stored procedure) is launched, which, on the same OLAP server without network traffic, based on partial replicas of ERP system databases, creates (or replenishes) a warehouse (warehouse) - the data source of the OLAP cube.

    Then the standard procedure for updating/building a cube based on warehouse data is launched (Process operation in the SSAS interface).

    Let's comment on some aspects of the technology. What kind of work do SPs do?

    As a result of partial replication, current data appears in the image of some ERP system on the OLAP server. By the way, partial replication can be performed in two ways.

    Firstly, from all the tables in the ERP system database, during partial replication, only those that are needed to build a warehouse are copied. This is controlled by a fixed list of table names.

    Secondly, partial replication may also mean that not all fields of the table are copied, but only those that are involved in building the warehouse. The list of fields to copy is either specified or dynamically created in SP in the image of the copy (if not all fields are initially present in the copy of the table).

    Of course, it is possible not to copy entire table rows, but only to add new records. However, this creates serious inconveniences when accounting for ERP revisions “retroactively,” which is often the case in real-life systems. So it’s easier, without further ado, to copy all records (or update the “tail” starting from a certain date).

    Next, the main task of SP is to convert ERP system data to warehouse format. If there is only one ERP system, then the task of conversion mainly comes down to copying and possibly reformatting the necessary data. But if you need to consolidate several ERP systems in the same OLAP cube different structures, then the transformations become more complicated.

    The task of consolidating several different ERP systems in a cube is especially difficult if the sets of their objects (directories of goods, contractors, warehouses, etc.) partially overlap, the objects have the same meaning, but are naturally described differently in the directories different systems(in the sense of codes, identifiers, names, etc.).

    In reality, such a picture arises in a large holding company, when several of its constituent autonomous companies of the same type carry out approximately the same types of activities in approximately the same territory, but use their own and non-agreed registration systems. In this case, when consolidating data at the warehouse level, you cannot do without auxiliary mapping tables.

    Let's pay some attention to the warehouse storage architecture. Typically, an OLAP cube schema is represented in the form of a “star”, i.e. as a data table surrounded by “rays” of directories - tables of secondary key values. A table is a block of “indicators”; reference books are their sections. In this case, the directory, in turn, can be an arbitrary unbalanced tree or a balanced hierarchy, for example, a multi-level classification of goods or contractors. In an OLAP cube, the numeric fields of a data table from a warehouse automatically become “indicators” (or measures), and sections (or dimensions) can be defined using secondary key tables.

    This is a visual “pedagogical” description. In fact, the architecture of an OLAP cube can be much more complex.

    Firstly, a warehouse can consist of several “stars”, possibly connected through common directories. In this case, the OLAP cube will be a union of several cubes (several data blocks).

    Secondly, the “ray” of an asterisk can be not just one directory, but an entire (hierarchical) file system.

    Thirdly, on the basis of existing dimension sections, new hierarchical sections can be defined using the OLAP developer interface tools (say, with fewer levels, with a different order of levels, etc.)

    Fourthly, based on existing indicators and sections, using MDX language expressions, new indicators (calculations) can be defined. It is important to note that new cubes, new indicators, new sections are automatically fully integrated with the original elements. It should also be noted that poorly formulated calculations and hierarchical sections can significantly slow down the operation of an OLAP cube.

    MS Excel as an interface with OLAP

    Of particular interest is the user interface with OLAP cubes. Naturally, the most complete interface is provided by the SSAS utility itself. This includes an OLAP cube developer toolkit, an interactive report designer, and a window for interactive work with an OLAP cube using queries in the MDX language.

    In addition to SSAS itself, there are many programs that provide an interface to OLAP, covering their functionality to a greater or lesser extent. But among them there is one, which, in our opinion, has undeniable advantages. This is MS Excel.

    The interface with MS Excel is provided by a special driver, downloadable separately or included in the Excel distribution. It does not cover all the functionality of OLAP, but with the growth of MS Excel version numbers, this coverage is becoming wider (for example, a graphical representation of KPI appears in MS Excel 2007, which was not the case in MS Excel 2003, etc.).

    Of course, in addition to its fairly complete functionality, the main advantage of MS Excel is the widespread distribution of this program and the close familiarity with it of the overwhelming number of office users. In this sense, unlike other interface programs, the company does not need to purchase anything additional and does not need to train anyone additionally.

    The great advantage of MS Excel as an interface with OLAP is the ability to further independently process the data obtained in the OLAP report (i.e., continue to study data obtained from OLAP on other sheets of the same Excel, no longer using OLAP tools, but using regular Excel tools).

    Facubi nightly treatment cycle

    Now we will describe the daily (nightly) computational cycle of OLAP operation. The calculation is carried out under the control of the facubi program, written in C# 2005 and launched via Task Scheduler on a server with warehouse and SSAS. At the beginning, facubi goes to the Internet and reads current exchange rates (used to represent a number of indicators in a currency). Next, perform the following steps.

    First, facubi launches SPs that perform partial replication of the databases of various ERP systems (holding elements) available on the local network. Replication is performed, as we said, to pre-prepared “backgrounds” - images of remote ERP systems located on the SSAS server.

    Secondly, through SP, a mapping is performed from ERP replicas to the warehouse storage - a special DB, which is the source of OLAP cube data and located on the SSAS server. In this case, three main tasks are solved:

    • ERP data adjusted to the required cube formats; We are talking about both tables and table fields. (Sometimes the required table needs to be “fashioned,” say, from several MS Excel sheets.) Similar data may have different formats in different ERPs, for example, key ID fields in 1C7 directories have a 36-digit character code of length 8, and _idrref fields in directories 1С8 – hexadecimal numbers of length 32;
    • during processing logical data control is carried out (including writing defaults in place of missing data, where possible) and integrity control, i.e. checking the presence of primary and secondary keys in the corresponding classifiers;
    • code consolidation objects that have the same meaning in different ERPs. For example, the corresponding elements of directories of different ERPs may have the same meaning, say, they are the same counterparty. The problem of consolidating codes is solved by constructing mapping tables, where different codes of the same objects are brought to unity.

    Thirdly, facubi launches the standard procedure for updating Process cube data (from the procedures of the SSAS utility).

    Based on the checklists, facubi sends emails about the progress of processing steps.

    After executing facubi, Task Scheduler launches several excel files in turn, in which reports are pre-created based on OLAP cube indicators. As we said, MS Excel has a special software interface (separately downloadable or built-in driver) for working with OLAP cubes (with SSAS). When you start MS Excel, MS VBA programs (such as macros) are activated, which ensure that data in reports is updated; reports are modified if necessary and sent by mail (blat program) to users according to checklists.

    Local network users with access to the SSAS server will receive “live” reports configured for the OLAP cube. (In principle, they themselves, without any mail, can update OLAP reports in MS Excel located on their local computers.) Users outside the local network will either receive the original reports, but with limited functionality, or for them (after updating the OLAP reports in MS Excel) special “dead” reports that do not access the SSAS server will be calculated.

    Evaluation of results

    We talked above about the asynchrony of OLTP and OLAP. In the technology variant under consideration, the OLAP cube update cycle is performed at night (say, it starts at 1 am). This means that in the current working day, users are working with yesterday's data. Since OLAP is not a recording tool (look at the latest revision of the document), but a management tool (understand the trend of the process), such a lag is usually not critical. However, if necessary, even in the described version of the cube architecture (MOLAP), the update can be carried out several times a day.

    The execution time of update procedures depends on the design features of the OLAP cube (more or less complexity, more or less successful definitions of indicators and sections) and on the volume of databases of external OLTP systems. According to experience, the warehouse construction procedure takes from several minutes to two hours, the cube update procedure (Process) takes from 1 to 20 minutes. We are talking about complex OLAP cubes that unite dozens of star-type structures, dozens of common “rays” (reference sections) for them, and hundreds of indicators. Estimating the volume of databases of external ERP systems based on shipping documents, we are talking about hundreds of thousands of documents and, accordingly, millions of product lines per year. The historical processing depth of interest to the user was three to five years.

    The described technology has been used in a number of large corporations: since 2008 in the Russian Fish Company (RRK) and the Russian Sea company (RM), since 2012 in the Santa Bremor company (SB). Some corporations are primarily trading and purchasing firms (PPCs), others are production companies (fish and seafood processing plants in the Republic of Moldova and the Republic of Belarus). All corporations are large holdings, uniting several companies with independent and various computer accounting systems - ranging from standard ERP systems such as 1C7 and 1C8 to “relic” accounting systems based on DBF and Excel. I will add that the described technology for operating OLAP cubes (without taking into account the development stage) either does not require special employees at all, or is the responsibility of one full-time business analyst. The task has been running automatically for years, providing various categories of corporate employees with up-to-date reporting on a daily basis.

    Pros and cons of the solution

    Experience shows that the proposed solution is quite reliable and easy to use. It is easily modified (connection/disconnection of new ERPs, creation of new indicators and sections, creation and modification of Excel reports and their mailing lists) with the invariance of the facubi control program.

    MS Excel as an interface with OLAP provides sufficient expressiveness and allows different categories of office employees to quickly become familiar with OLAP technology. The user receives daily “standard” OLAP reports; using the MS Excel interface with OLAP, can independently create OLAP reports in MS Excel. In addition, the user can independently continue to study the information of OLAP reports using the usual capabilities of his MS Excel.

    The “refined” warehouse database, in which several heterogeneous ERP systems are consolidated (during the construction of the cube), even without any OLAP, allows you to solve (on an SSAS server, using the query method in Transact SQL or the SP method, etc.) many applied management problems. Let us recall that the warehouse database structure is unified and much simpler (in terms of the number of tables and the number of table fields) than the database structures of the original ERP.

    We especially note that in our proposed solution there is the possibility of consolidating various ERP systems in one OLAP cube. This allows you to obtain analytics for the entire holding and maintain long-term continuity in analytics when a corporation moves to another accounting ERP system, say, when moving from 1C7 to 1C8.

    We used the MOLAP cube model. The advantages of this model are reliability in operation and high speed of processing user requests. Disadvantages: OLAP and OLTP are asynchronous, as well as large amounts of memory for storing OLAP.

    In conclusion, here is another argument in favor of OLAP that might have been more appropriate in the Middle Ages. Because its evidentiary power rests on authority. A modest, clearly underrated British mathematician E. Codd developed the theory of relational databases in the late 60s. The power of this theory was such that now, after 50 years, it is already difficult to find a non-relational database and a database query language other than SQL.

    OLTP technology, based on the theory of relational databases, was the first idea of ​​E. Codd. In fact, the concept of OLAP cubes is his second idea, expressed by him in the early 90s. Even without being a mathematician, you can quite expect that the second idea will be as effective as the first. That is, in terms of computer analytics, OLAP ideas will soon take over the world and displace all others. Simply because the topic of analytics finds its comprehensive mathematical solution in OLAP, and this solution is “adequate” (B. Spinoza’s term) to the practical problem of analytics. “Adequately” means in Spinoza that God himself could not have thought of anything better...

    1. Larson B. Development of business analytics in Microsoft SQL Server 2005. – St. Petersburg: “Peter”, 2008.
    2. Codd E. Relational Completeness of Data Base Sublanguages, Data Base Systems, Courant Computer Science Sumposia Series 1972, v. 6, Englwood cliffs, N.Y., Prentice – Hall.

    In contact with