Mapping an Pivot Query in EntityFramework6

At work I had a challenging problem in the last days (in fact I like challenging problems and they’re not that rare which is why I like my job).

This time the problem was this:

  1. Let T be a table with columns C and X.
  2. Let the column E be of an enum type with a short, static defined values E1, E2, E3.
  3. Let the column X be a space separated list of tokens (values look like „ABC“, „ABC DEF“, „XUT OSH OJR“, with an arbitrary number of tokens).
  4. We look for the following information:
    For a given token x in the values of X get a tuple R = (#(E1), #(E2), #(E3)), where #(Ei) denotes the number of rows in T, where x is included in X and the row has value Ei in column E.

The following facts hold for the environment:

  1. The length of the values in X and the size of the table T can be quite big.
  2. Database latency and connection bandwidth are limited, so the query should not return more data than necessary, at best, only the tuple R is transferred.
  3. Runtime of the query itself on the database server is an issue as well, that should be kept in mind.
  4. Changed in the database scheme are not acceptable as it’s part of the products feature that should be rolled out without the need of a database migration.

SQL query approaches

I started experimenting with pure SQL first, and the best query I found was built like this:

SELECT * FROM
  (SELECT E, COUNT(*) as cnt 
   FROM 
     (SELECT E 
      FROM
        SELECT E 
        FROM ORDERS
        WHERE PLNT_ID = :p1
          AND INSTR(X, :p2) > 0))
        GROUP BY TIMING)
  PIVOT (SUM(cnt) FOR E IN('E1', 'E2', 'E3'));

Four nested Select statements, a group by, aggregation, pivot – (nearly) nothing missing in this query.

From the database side alone I know there are more simple ways to achieve the same, especially when it’s possible to change the schema the data is stored in or when it’s possible to use views and things like that.

I was constrainerd to a pure SQL query. Within that constraint I tried the following alternatives:

  • Instead of Pivot, use the same subquery three times to get the individual target columns E1, E2 and E3. Unfortunately the Database servers query planning fails to do that only once and instead performed the full table scan on X three times, taking more than 20 seconds to complete the query.
  • Use a nested Select query to produce a intermediate result of tuples (E, x), that contains one tuple for any row (E, X) in T and any token x in X. I cancelled the test query after around 15 minutes (!), as it was too slow obviously.

In the end the query above was the fastest variant I could come up with. It returns the optimal result set: A single row (E1, E2, E3) with the required numbers.

Cannot be mapped

I took this query and tried to include it in my C# code using EF6:

using (DBContext context = GetDbContext()) {
    var result = context.Database
        .SqlQuery<MyEntity>(query, p1, p2).Single();
}

GetDbContext() is a factory returning the DbContext, and context.Database.SqlQuery<> sends the raw SQL query to the database. Generic argument MyEntity is a POCO with properties E1, E2 and E3 as EF6 requires this type to have properties exactly matching type and name of the query result.

Taking only one row using Single() is sufficient here as by design of the query it’s guaranteed to return exactly one tuple.

The PIVOT command in (Oracle) SQL

SELECT E, cnt FROM [...] sourceTable
PIVOT (SUM(cnt) FOR E IN('E1', 'E2', 'E3')

It roughly works as follows: it turns the tuple set of the select column, like

(E1, 200)
(E2, 240)
(E3, 5)

into columns and their values, which results in

E1    E2    E3
200   240   5

The columns are determined by the „IN“ clause, so in this example the columns will be E1, E2 and E3. As in general it’s not guaranteed that that value occurs only once, that one requires an aggregate function, SUM in the example. For

(E1, 200)
(E2, 240)
(E3, 5)
(E2, 400)

this would return

E1    E2    E3
200   640   5

More interesting here is: if one of the values is missing, the column is never created, so

(E1, 200)
(E2, 240)

evaluates to

E1    E2
200   240

This example shows that it’s impossible to statically define a fixed superset of the expected result column set (the missing E3 field is due to the missing row in the source table).

On the other hand, the PIVOT command allows to take a dynamic field in the IN-clause. With that the set of columns is completely derived from the input dataset.

Tricking EF6 to map Pivot

EF6 refuses to map the result at all, even in cases where the column set exactly matches the expected one, but alternative formulation of my query needed more resources or a completely different mapping, so what to do?

Here the internal nature of SQLs UNION statement enters the stage. Although the relational algebra doesn’t guarantee this, in practise all Database systems I know take the first argument of a union statement as the reference statement, that defines the format of the result set, and obviously this principle works for Oracle 11g Databases and pivot as well.

Finally my query looks like this:

SELECT * FROM 
  (SELECT -1 as E1, -1 as E2, -1 AS E3 FROM dual)
WHERE E1 >= 0
UNION
SELECT * FROM
  (SELECT E, COUNT(*) as cnt
   FROM
     (SELECT E
      FROM
        SELECT E
        FROM ORDERS
        WHERE PLNT_ID = :p1
          AND INSTR(X, :p2) > 0))
        GROUP BY TIMING)
  PIVOT (SUM(cnt) FOR E IN('E1', 'E2', 'E3'));

The bold part is the tricky addition to the query above. It selects an empty set of tuples from "dual" (the Oracle Sql Dummy table), defining the required fixed column set.

Now mapping with EF6 is possible. If one of the values is missing as in Example x, the union operation takes it as null. So it’s even easy to distinguish between „no tuple of value E2 existed“ and „the aggregation returned 0“, something that may be relevant when using another aggregation function than COUNT in the inner Query, where 0 is a valid aggregation result after the pivot.

Towards a general Implementation

Extending EntityFramework 6 is quite a challenge, if possible at all. I can imagine a common pattern to wrap a given raw SQL query to be mappable to a given entity, but to keep that valid across Databases promises to be challenging at least.

Another approach could be to utilize reading the data with a DataReader, mapping the object „by hand“ in the new method.

Anyways: I’m not sure if it’s worth it, as these type of queries is rare, and given that EF Core is starting to be useable, it may not be worth to put much additional work on EF6.

For EF Core I can imagine another API design to support queries like this. I’m going to write a separate article to sketch this idea.

If you’re interested, to work on something like that, leave me a comment or ping me on twitter (@jongleur1983 or on GitHub (@jongleur1983).

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.

Diese Website verwendet Akismet, um Spam zu reduzieren. Erfahre mehr darüber, wie deine Kommentardaten verarbeitet werden.