Adding External Images to an RDL(C) Report

There are three options for adding images to a report:

  1. Internal image (embed the image in the RDL file itself).
  2. Load the image bytes from the DB.
  3. Reference an external image with a URL.

For external images, the samples show only URLs like this: http://<servername>/images/image1.jpg.

If your ReportViewer control is rendering locally and your images simply sit on the same drive as the application that embeds the control, then you can also reference them like this:

File://C:\Path\To\My File\Logo.jpg
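For comparison, the standard file-URI form that a browser accepts can be produced from a local path like this (sketched in Python for brevity; the path is the hypothetical one from the example above):

```python
from pathlib import PureWindowsPath

# Hypothetical local path (from the example above); as_uri() yields a
# well-formed file:// URI with percent-encoded spaces.
path = PureWindowsPath(r"C:\Path\To\My File\Logo.jpg")
print(path.as_uri())  # file:///C:/Path/To/My%20File/Logo.jpg
```

Note that this standard form is what browsers like; as described below, the report can be pickier about the exact shape of the URL.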

Report External Image URL

If you write the URL like this (the “normal” way?):

[image missing]

A browser pointed to the above URL will happily show your image (Chrome, at least), but your report will not!

(I’m using Report Builder 3.0 and SQL Server 2012.)

Posted in software-development

Reports in C#. Free tools suitable for commercial projects.

  • Free as in “free beer”. Free speech welcome, though.
  • Suitable as in “I’m not guaranteeing anything; use at your own risk.”
  • Commercial as in “closed source.”
  • Reports as in “letters, invoices, etc.”
  • and finally, C# as in “.NET client application (WinForms, WPF…)”

Recently I needed to enhance a (closed-source) application with basic reporting/printing capabilities. These reports are not business-analytics reports; they are static documents with some dynamic parts, like letters with dynamic headings, recipients, etc.
The final aim is paper printing.

Our constraints are:

  1. Client program in C# (WinForms, WPF…)
  2. Free (as in free beer) – no money to buy components
  3. Usable in a closed-source/commercial project
    Warning: I’m not a legal/licensing expert. I don’t guarantee my findings are correct.

So what did I find to enhance my app?

RDL Reports: ReportViewer Control and ReportBuilder Designer

Even if you only use Visual Studio Express (assume 2012), and so miss the integrated support for RDL reports and report projects in VS, the tools to program RDL reports are freely downloadable from MS.

With the above tools you’ll be able to author a report (an RDL file) and render it in your app using the ReportViewer control.

While not pertinent to my problem, it’s worth mentioning that you can get the full Reporting Services server (and another designer, based on the VS2010 shell) by downloading SQL Server 2012 Express with Advanced Services.

Dynamic DOCX file generation in C# with Microsoft Open XML SDK Productivity Tool

With this tool you can use the DOCX DOM in C# to generate a document exactly as you need.

This approach is not for the faint of heart. The DOM is far from simple, and even trivial documents take lots of code to define programmatically.

On the other hand, you’ll be able to programmatically create your DOCX with maximum freedom/flexibility (within the DOCX standard, of course).

Luckily MS provided one way to circumvent the complexity issue (and to avoid having to learn all the DOM details).

  1. Create a normal (static) DOCX document with any tool (Write, WordPad, Google Documents, Word…): template.docx
  2. Launch the Productivity Tool included in the OOXML SDK and open template.docx in it.
  3. The tool will parse template.docx and generate the C# code that re-creates it programmatically with the DOCX DOM.
  4. Put your customizations/data bindings in the generated code and you’re done with your DOCX report.

There are obvious shortcomings (if you regenerate the C# code after modifying the template, you’ll have to redo the data bindings), but it can be a quick, free solution to generate professional documents programmatically.

Programmatic printing is the second big shortcoming. So far, on Windows, I would recommend relying on the WordPad command line (invoked from your client app).

write.exe /pt TextFileName PrinterName [ DriverName [ PortName ] ]

It’s a rough solution, certainly not recommended by MS (or me), but it can be functional if printing from the application is only a rarely used nice-to-have.
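For example, invoking that command line from a client app can look like this (sketched in Python for brevity; in the post's C# context, Process.Start plays the same role). The document path and printer name are made up for illustration:

```python
import subprocess

def build_wordpad_print_command(doc_path, printer_name):
    """Build the WordPad command line for the documented /pt (print-to) switch."""
    return ["write.exe", "/pt", doc_path, printer_name]

# Hypothetical document and printer names.
cmd = build_wordpad_print_command(r"C:\Reports\letter.docx", "OfficePrinter")
print(cmd)
# subprocess.run(cmd)  # commented out: only works on Windows with WordPad installed
```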

Generate dynamically PDF files with PDFSharp and MigraDoc libraries

While there is no “productivity tool”, you can dynamically generate PDF files with MigraDoc’s “simple” object model.

PDFSharp can handle the low-level details (when needed), and you can rely on Adobe Reader (or similar) to print the files from your application.

Posted in software-development

Measuring End To End Batch Performance in Production

In this article we are concerned with batch jobs: please picture some kind of scheduled, repeated, server-side, db-intensive number crunching.

One example might be something like the following “Balance Job”.
Every day, many times a day, a file containing the Bank Account Operations of your customers is delivered to your server. (Yes, today’s game… is The Bank Game!)
Operations can be either cash withdrawals or cash deposits.
Suppose the file is a classic fixed-length record sequence: (Anybody said FileHelpers?)

OperationId Customer Account Date Operation Type Amount ($)
719252 AX00KH7 C01897 20130520 W 1200
719251 ER00996 C01005 20130520 D 4000
719250 ER00996 C01005 20130519 W 300
719249 ER00996 C01005 20130519 W 380
719248 ER00996 C01004 20130520 W 500
etc etc etc etc etc etc
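A fixed-length layout like this is exactly what FileHelpers handles declaratively in C#. Just to make the record structure concrete, here is a rough parser sketch in Python; note the column widths are my assumption for the sample rows, not something given in the post (in real code they would come from the file spec, or a FileHelpers record class):

```python
# Assumed column widths for the sample file (OperationId, Customer, Account,
# Date, Operation Type, Amount). In C# this would be a FileHelpers record class.
FIELDS = [("operation_id", 6), ("customer", 7), ("account", 6),
          ("date", 8), ("op_type", 1), ("amount", 6)]

def parse_line(line):
    """Slice one fixed-length record into named, stripped fields."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

row = parse_line("719252AX00KH7C0189720130520W  1200")
print(row["op_type"], row["amount"])  # W 1200
```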

Your system holds the current balance for all these accounts in a nice table like this:

Customer Account Date1 Date2 Balance ($)
AX00KH7 C01897 20121110 20121130 1500
AX00KH7 C01897 20121130 20130102 4000
AX00KH7 C01897 20130102 99991231 4200
AX00KH7 C01897 20130520 20130520 1200
etc etc etc etc etc

Not really relevant for this discussion, but this is a valid-time table. You can read further here.

The Balance Job will process the operations and update the Balance table accordingly.
We don’t care how the job is implemented: SQL, Java+SQL, SSIS, C#+NoDB or Prayers & Invocations. The implementation can well change over time.
While the Balance Job happily runs in Production, we’d like to answer questions like the following:

  • Is the job running slower than yesterday?
  • How much time will it likely take to process the next 10,000 operations?
  • Will it finish before I go home?
  • Is it faster to process one big file or many smaller files?

They boil down to knowing how fast the job actually runs in prod.
We can give reasonable answers to these questions only if we measure the performance of the job in its production environment.

But how do you measure performance?
The first thing that comes to mind: when the job runs, you must log when it starts and when it ends. Then you can compute the duration of any given execution as the length of time between start and end.
Suppose you run the job twice: the first time it takes 5 seconds, the second time 2 minutes. What is to be gathered here? Did the job get slower the second time?
What if I told you that the first time it processed 32 records and the second 5,000?
The duration of a whole execution is not enough information. You need to define (and log) how much “Unitary Work” the job does. For the Balance job it makes sense to consider the processing of an “InputRow” as a “Unitary Work”.

Now your log looks like the following and it duly records how many times the “Unitary Work” has been performed for any execution.

ExecNo Start End InputRows
3522 2013-05-20 15:01:10.313 2013-05-20 15:05:07.105 5056
3521 2013-05-20 16:00:00.850 2013-05-20 16:00:30.050 32
etc etc etc etc

Other obvious choices for “Unitary Work” could be OutputRows (the number of updated/created balance rows) or the sum of InputRows and OutputRows (IORows?).
Different jobs will require different concepts of Unitary Work:

  • If your job imports some kind of entity that is an aggregate root into a db, “creating and persisting a new aggregate root (with its graph)” can be a unitary work (OutputEntities)
  • When your job processes and combines many inputs to produce an output you could define some kind of Logical InputRow
  • your example here 🙂

For the Balance Job, we’ll stick with InputRows. Remember, you’ll need to think about your own situation.

With just these few elements we can define the “performance” of the Balance job as “the time to process an InputRow (InputRowTime)” i.e. the time to perform the unitary work. The smaller the better.

Equivalently, you can define the performance as the “throughput” of the unitary work, 1/InputRowTime. This is nice because higher values correspond to better performance, but I’ll stick with the former definition in this post.

If we look at our log, we can simply compute the InputRowTime for each execution as Duration/InputRows.

ExecNo Start End InputRows InputRowTime
3522 2013-05-20 15:01:10.313 2013-05-20 15:05:07.105 5056 0.04683
3521 2013-05-20 16:00:00.850 2013-05-20 16:00:30.050 32 0.9125
etc etc etc etc etc

In SQL: SELECT DATEDIFF(millisecond, StartDate, EndDate)/1000.0/InputRows
The second run of the job (ExecNo=3522, at the top of the table) is much faster than the previous one, even though its total execution time is longer.
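The same computation can of course be done outside SQL; here is a small sketch (in Python, for illustration) over the two log rows above:

```python
from datetime import datetime

def input_row_time(start, end, input_rows):
    """Average seconds per InputRow for one execution: Duration / InputRows."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    duration = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return duration / input_rows

# Timestamps and row counts copied from the log table above.
print(round(input_row_time("2013-05-20 15:01:10.313", "2013-05-20 15:05:07.105", 5056), 5))  # 0.04683
print(round(input_row_time("2013-05-20 16:00:00.850", "2013-05-20 16:00:30.050", 32), 5))    # 0.9125
```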

To recap:

  1. define the unitary work (InputRow)
  2. for each execution, compute the (average) time to perform the unitary work (InputRowTime)

Given the above two points, every time you run the job in a given environment (on a given server), you can regard the execution as a measure of the job performance in the given environment.

It can be a good measure (close to the real value, which is unknown) or not. But the execution itself is the measurement, and the “instrument” is just the job and its host computer.

Dealing with measurements, we can apply to our performance measurements at least some of the statistical error analysis we had to study in Physics/Engineering courses.
This will be the topic of my next post 🙂

Now that we have dealt a little with the “measure part” let’s focus on the “Production Environment” part.
You need to assess the performance of your job in your “dev/test” environment, but those measurements are not the real thing.

  • Your dev db contains less data, or the data is distributed in some weird way?
  • There is less resource contention?
  • A different load on the server, at different times?
  • The dev build is different?
  • The runtime is a different version?
  • The hardware is almost certainly very different.
  • Does the job scale differently in dev?

You should try to build a realistic test environment but you cannot make it the same as prod. Some of the different factors will improve the job performance in test and some will run against it.
To sum up: some kind of performance test and stress test in a dedicated “dev/test” environment is necessary to try out different optimization choices and to understand the evolution of the project; yet as soon as you release to production you need to start collecting some “prod” performance data.
Production performance data can be quite different from your predictions and it definitely matters more.

I advocate that all but the simplest jobs have perf data collection built in.

Posted in db, software-development

Test Driven Development and Interactive Development

TDD has many merits. Among the others it brought to the forefront the issue of programming in rapid feedback cycles against an expected result.

Write a test that fails, modify the code to pass the test, repeat.

Let me ignore the no-regression value of the test suite that you build up this way. The point of Test Driven Development is… [pardon me] Development! The point of TDD is the nice code you write thanks to the rapid feedback cycle of running the test suite.

Rapid feedback, hence the ability to develop and check results in small increments, is one advantage enjoyed by dynamic/interpreted languages over static/compiled languages. I mean the kind of interaction you have with a Lisp REPL, at the Ruby prompt, or with Python or F#.
[Lisp and F# are not necessarily interpreted but the point is the interactive environment.]

I think one great merit of TDD is being the (best?) way to do dynamic/interactive programming in languages that otherwise would be… not very dynamic: Java, C#, C++…
If you write in Ruby, TDD is still very important, but I think that TDD in Java is much more important.
In “static” languages TDD promotes a useful style of interactive programming that otherwise would be difficult to achieve, while elsewhere interactive is the norm.

When someone complains about the difficulty of testing/doing TDD in T-SQL because it lacks the familiar xUnit (actually it doesn’t lack one, but that’s not the point), I need to be polemic.
T-SQL has lots of problems, but it really doesn’t lack interactive development.
Interactive is (can be) more than XUnit.
You can have T-SQL Unit if you want it, dear XUnit proponent, but you might be missing the point of TDD if you don’t realize that interactive development is already there.

Tagged with: ,
Posted in software-development

OOP is Overrated?

Yes it is. For application code at least, I’m pretty sure.
Not claiming any originality here, people smarter than me already noticed this fact ages ago.

Also, don’t misunderstand me: I’m not saying that OOP is bad. It is probably the best variant of procedural programming.
Maybe the problem is that the term OOP is overused to describe anything that ends up in OO systems.
Things like VMs, garbage collection, type safety, modules, generics or declarative queries (LINQ) are a given, but they are not inherently object-oriented.
I think these things (and others) are more relevant than the classic three principles.

Current advice is usually prefer composition over inheritance. I totally agree.

This is very, very important. Polymorphism cannot be ignored, but you don’t write lots of polymorphic methods in application code. You implement the occasional interface, but not every day; mostly you consume them.
Polymorphism is what you need to write reusable components, much less to use them.

Encapsulation is tricky. Again, if you ship reusable components, then method-level access modifiers make a lot of sense. But if you work on application code, such fine-grained encapsulation can be overkill. You don’t want to struggle over the choice between internal and public for that fantastic method that will only ever be called once (except in test code, maybe). Hiding all implementation details in private members while retaining nice, simple tests can be very difficult and not worth the trouble (InternalsVisibleTo being the least trouble, abstruse mock objects a bigger trouble, and reflection-in-tests Armageddon).
Nice, simple unit tests are just more important than encapsulation for application code, so hello public!

So, my point is, if most programmers work on applications, and application code is not very OO, why do we always talk about inheritance at the job interview? 🙂

If you think about it, C# hasn’t been purely object-oriented since the beginning (think delegates), and its evolution is a trajectory from OOP to… something else, something multi-paradigm.

Posted in Uncategorized

Why good programming is more like writing than engineering

I think coding is more like writing than engineering because the most important quality of good code is still understandability by humans. This is an aesthetic quality; it exists mostly as we perceive it, not objectively in itself.

Of course, there exists a kind of statistical objectivity: what is understandable to most is more so than what only a few can grasp.
Yet even if most men think models are beautiful, this does not make beauty objective. It still lies in the eye of the beholder.
Understandability is just like beauty (for programmers).
I suspect that clear code is always beautiful in one way or another.
Understandability in code is more important than anything else because, still to this day, it is the best way to achieve correctness (and maintainability, modifiability, etc.).

Maybe other fields of engineering and science don’t need to rely so much on easy understandability in order to achieve correctness.
Using the classic construction metaphor: someone must surely understand the design of a bridge to have it built correctly. But I suspect that the bridge-building engineer can rely on a lot of well-established maths in his design that the software developer has no analogue of.
We have algorithms and type systems… the whole field of computer science behind us. Yet CS is young compared to the centuries of study behind the physics of buildings.

Posted in software-development

Temporal databases: Valid-time tables

As the notion of time is fundamental to our perception of the world, it’s only normal that many real world databases are temporal databases: databases that deal explicitly with the temporal dimension of facts.

In the following location table, row 2 says that Silvio has been (or will be) in Rome “sometime”.

id person place
1 Silvio Milan
2 Silvio Rome

To record the fact that I’ve been in Rome from Jan 10th 2011 to Jan 12th 2011, I need more columns.

id person place valid_from valid_to
1 Silvio Milan 1900-01-01 2011-01-10
2 Silvio Rome 2011-01-10 2011-01-12
3 Silvio Milan 2011-01-12 9999-01-01

On row 2, the new columns valid_from and valid_to store the period in which the relation (‘Silvio’, ‘Rome’) was true. Now “location” records the past and future history of where and when I’ve been. You see, I don’t travel much.

The dates ‘1900-01-01’ and ‘9999-01-01’ stand for negative infinity and positive infinity. We conventionally use only dates greater than ‘1900-01-01’ and less than ‘9999-01-01’ to mean actual dates (in this db).

“Negative infinity” doesn’t have to be the beginning of time. It’s as back in time as the system cares. (The same is true for “positive infinity” of course.)
I could have chosen other dates; it depends on what you are modeling.

I don’t use NULL to represent the infinite values because NULLs would complicate the queries on “location”.

Compare the following two versions of “where was Silvio on October 20th 2011?”, and keep in mind that it’ll only get worse with “real” queries (…and don’t use * in your “real” production queries).

-- 1. no nulls
select * from location
where person = 'Silvio'
  and valid_from <= '2011-10-20' and '2011-10-20' < valid_to

-- 2. with nulls
select * from location
where person = 'Silvio'
  and (valid_from <= '2011-10-20' or valid_from is null)
  and ('2011-10-20' < valid_to or valid_to is null)

Also, since I haven’t used NULL to represent infinity, I can still use NULL for what it’s meant for: “information unknown”. (But I wouldn’t.)
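The same point-in-time lookup, sketched over the rows above (in Python, just for illustration), shows how the half-open [valid_from, valid_to) convention with sentinel dates keeps the predicate to a single pair of comparisons:

```python
from datetime import date

NEG_INF, POS_INF = date(1900, 1, 1), date(9999, 1, 1)  # sentinel "infinities"

location = [  # (person, place, valid_from, valid_to) -- rows from the table above
    ("Silvio", "Milan", NEG_INF, date(2011, 1, 10)),
    ("Silvio", "Rome",  date(2011, 1, 10), date(2011, 1, 12)),
    ("Silvio", "Milan", date(2011, 1, 12), POS_INF),
]

def where_was(person, on):
    # Half-open interval: valid_from <= on < valid_to, no NULL handling needed.
    return [place for p, place, f, t in location if p == person and f <= on < t]

print(where_was("Silvio", date(2011, 10, 20)))  # ['Milan']
print(where_was("Silvio", date(2011, 1, 11)))   # ['Rome']
```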

You may have noticed that the value of valid_to in one row is always the same as the value of valid_from in the next row.
Resist the urge to “normalize” the schema, e.g. by removing the valid_to column. If you do remove it, say goodbye to nice, simple queries: hello SQL inferno!

To summarize the temporal design features of “location” table, we can say that “location” is a valid-time table.

Now, you’ll think that this modeling of valid-time tables is really trivial. I agree about the simplicity; simple designs are the holy grail of programming.
If you’ve ever maintained an app with a temporal db, you may have observed that surprisingly simple things like valid-time tables end up messily implemented.
Also, there are some non-trivial implications of introducing the time dimension in a table: think about the identity of your entities and your unique keys. Think about enforcing integrity constraints on a valid-time table… Lots of fun here 🙂

We haven’t even scratched the surface of temporal databases yet. I am convinced that this topic is absolutely crucial for any enterprise (that I can think of) but still doesn’t usually get the attention it deserves.

I end this introductory post with a couple of references.
A few months ago I found an excellent book on the subject of temporal-db.

Developing Time-Oriented Database Applications in SQL
Richard T. Snodgrass

Morgan Kaufmann Publishers, Inc.
San Francisco, July, 1999
504+xxiii pages
ISBN 1-55860-436-7

You can download it from the author’s page tdbbook.pdf
I can only thank the author for his generosity.
I’m currently reading this book, so, if you find anything valuable in my posts about temporal-dbs, you probably should just thank Mr. Snodgrass. Obviously any errors are mine.

While I enjoy writing down my thoughts, I only hope to provoke your curiosity. On any topic, if you’re serious about it, I invite you to take a look at the sources and do some research of your own. (research as in googling, not as in university).

Wikipedia is also a great resource (as ever).

Posted in books, db, software-development, temporal-db