Information Publication for Systems Engineers – making engineering outputs more accessible
02/11/18
Setting the scene
What don’t we cover?
- Giving presentations
- The fine details of the ‘academic’ publication process
- How to do technical writing
- Being an artist
- Data analysis
What do we cover?
Contents.
- General concepts
- Data/Information management
- Document presentation
- Graphic presentation
- Publication
- Advanced techniques
Why do we care…
…as engineers?
…as Systems Engineers?
Pedantry, or professionalism?
Who am I?
Not a data visualisation expert or a typographer, just a practitioner and an enthusiast.
Dr John Welford MEng PhD CEng IET MINCOSE
WSP New Zealand, Technical Principal Systems Engineer
General concepts
Data and information
Think about your audience
- What do they already know? What will be familiar to them?
- What do they need to know?
- What is the inferential distance between them and you?
- What special needs might they have?
Presentation of abstraction
As engineers we are often working with an abstraction of the real system.
When we publish information we are always presenting an abstraction of the real system.
It seems that perfection is attained not when there is nothing more to add, but when there is nothing more to remove.
It is therefore up to us to choose what to emphasize, for example…
Separating content from presentation
Familiar to web-authors: HTML = content (+ structure),
CSS = presentation.
In DIKW terms: information = content (+ structure),
publication = presentation.
Ideally — first develop the content, then later develop the presentation.
Practically — development is often in parallel; however, content should always be prioritised.
Some tools provide a clear separation between content and presentation, others are more WYSIWYG. In either case, it pays to at least conceptually make the separation.
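As a minimal sketch of that conceptual separation (an illustration only; the `content` list, the two style dictionaries and the `present` function are hypothetical names, not part of any tool mentioned here), the content can be held once and presented in more than one way:

```python
# Content (+ structure): what is said, and what role each piece plays.
content = [
    ("heading", "Lists"),
    ("item", "Enumerated"),
    ("item", "Unordered"),
]

# Presentation: two hypothetical styles, each mapping a role to a rendering.
plain_style = {
    "heading": lambda text: text.upper(),
    "item": lambda text: "- " + text,
}
underlined_style = {
    "heading": lambda text: text + "\n" + "=" * len(text),
    "item": lambda text: "  * " + text,
}


def present(content, style):
    """Apply a style to the content without changing the content itself."""
    return "\n".join(style[role](text) for role, text in content)


print(present(content, plain_style))
print(present(content, underlined_style))
```

Changing a styling decision here means editing one style entry, while the content stays untouched.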
Dangers of conflating content with presentation
This content is written in Markdown; the formatting uses tufte-css styling.
- Distraction from the process of working with information
- Reduced portability and compactness
- Lack of proper information structure
Additional considerations
Be conscious & intentional: Every aspect of presentation represents a decision. Every decision should be justifiable.
Be consistent: The same decision should have the same outcome each time it’s made. There should be uniformity in the resulting publication.
Beauty vs. practicality: Ideally both! But (for engineering): \(\text{practicality} \gg \text{beauty}\).
Science, art, opinion
Be aware that most advice on the topic of information publication falls into one of three categories (including this tutorial):
- Science (researched and peer-reviewed)
- Art (general expert consensus)
- Opinion (my own)
References
Other authors to read or follow: Edward Tufte, Naomi Robbins, Mike Bostock, Bret Victor.
References cited or linked throughout, but recommended reference texts are:
The Elements of Typographic Style — Robert Bringhurst
Visualization Analysis & Design — Tamara Munzner
Show Me the Numbers: Designing Tables and Graphs to Enlighten — Stephen Few
Data/information management
Data structure
Likely to be out of your control:
- Results of a questionnaire
- Export from a simulation, model, or design tool
- Logs from monitoring equipment
- Scraping websites
- Verbal or written input from experts
- Previous work and general literature
- Experience and engineering judgement
The process of producing information from the data will involve some degree of analysis, but it is also where sensible choices can be made about the information structure.
Variations in data structure
Data can have many different structures. Part of data analysis is cleaning and tidying the data.
Consider…
Name | Treatment A | Treatment B |
---|---|---|
John Smith | — | 2 |
Jane Doe | 16 | 11 |
Mary Johnson | 3 | 1 |
Transposed as…
Treatment | John Smith | Jane Doe | Mary Johnson |
---|---|---|---|
Treatment A | — | 16 | 3 |
Treatment B | 2 | 11 | 1 |
Or tidied as…
Name | Treatment | Result |
---|---|---|
John Smith | A | — |
Jane Doe | A | 16 |
Mary Johnson | A | 3 |
John Smith | B | 2 |
Jane Doe | B | 11 |
Mary Johnson | B | 1 |
Don’t be afraid to change your data structure to support analysis and information presentation.
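The reshaping from the first (wide) table to the tidied form can be sketched in a few lines. This is only an illustration in plain Python; the `wide` dictionary and the `tidy` function are hypothetical names, and `None` is an assumed way of marking the missing result.

```python
# Wide table from the example above; None marks the missing result.
wide = {
    "John Smith": {"A": None, "B": 2},
    "Jane Doe": {"A": 16, "B": 11},
    "Mary Johnson": {"A": 3, "B": 1},
}


def tidy(wide_table, treatments=("A", "B")):
    """Return one (name, treatment, result) row per observation."""
    return [
        (name, treatment, wide_table[name][treatment])
        for treatment in treatments
        for name in wide_table
    ]


rows = tidy(wide)
for row in rows:
    print(row)
```

Each row now holds a single observation, which is usually the easier shape to filter, group and plot.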
Information structure – focus on content
The output of analysis should yield information. At this stage it can be very tempting to start presenting the information; indeed, you may wish to start considering the final published form before the information is complete.
However, it requires discipline to keep the concept of structuring the content separate from presenting it.
For example, setting up the structure of your report is separate from deciding which levels of heading are going to be in bold.
Configuration control
Both data and information should be under some form of configuration management, ideally supporting:
- Versioning
- Change control
- Baselines
- Access control
- Backup/recovery
Also consider maintaining an auditable trail of how the information was generated from the data.
Data/Information tools
Spreadsheets: Excel, Calc, Numbers, Google Sheets
Databases: Access, DOORS, SQL, MBSE tools
Formats: CSV, JSON, XML
Languages: Matlab, Python, R, LaTeX, Markdown
Document presentation
Assume the information we wish to publish is best presented in the form of a document.
The definition of ‘best’ here will depend on your audience and what the information is – as discussed previously.
We’ll also assume that the information is already developed in terms of both content and structure. The format that this is captured in might vary depending on the tool that we choose to use.
For reference let’s have a look at some examples of content and structure…
Content and structure – LibreOffice Writer
NB: Structure is less explicit here, as it is a WYSIWYG tool.
Content and structure – Markdown
The same document in Markdown.
Example document
================
Demonstrating the information *content* for a document, and a bunch of different aspects of document *structure* (the funny-looking parts).
## Lists
Could be:
* Enumerated
* Unordered
    * Sub-lists
## Links
To websites such as [Google](www.google.com), or to other sections of the document such as the [list](#lists) section.
## Tables
| Heading | Attributes |
| ------- | ---------- |
| Content | Can include text or numbers |
| Rows | 3.142 |
## Images
![WSP local logo](graphics/wsp.png)
![SESA web logo](https://www.sesa.org.au/templates/js_simplepro_red/images/sesa_logo.png)
Content and structure – HTML
The same document in HTML.
<h1 id="example-document">Example document</h1>
<p>Demonstrating the information <em>content</em> for a document, and a bunch of different aspects of document <em>structure</em> (the funny-looking parts).</p>
<h2 id="lists">Lists</h2>
<p>Could be:</p>
<ul>
<li>Enumerated</li>
<li>Unordered
<ul>
<li>Sub-lists</li>
</ul>
</li>
</ul>
<h2 id="links">Links</h2>
<p>To websites such as <a href="www.google.com">Google</a>, or to other sections of the document such as the <a href="#lists">lists</a> section.</p>
<h2 id="tables">Tables</h2>
<table>
<thead>
<tr>
<th>Heading</th>
<th>Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Content</td>
<td>Can include text or numbers</td>
</tr>
<tr>
<td>Rows</td>
<td>3.142</td>
</tr>
</tbody>
</table>
<h2 id="images">Images</h2>
<p><img src="graphics/wsp.png" alt="WSP local logo">
<img src="https://www.sesa.org.au/templates/js_simplepro_red/images/sesa_logo.png" alt="SESA web logo"></p>
Content and structure – LaTeX
The same document in LaTeX.
\section{Example document}\label{example-document}
Demonstrating the information \emph{content} for a document, and a bunch of different aspects of document \emph{structure} (the funny-looking parts).
\subsection{Lists}\label{lists}
Could be:
\begin{itemize}
\item Enumerated
\item Unordered
\begin{itemize}
\item Sub-lists
\end{itemize}
\end{itemize}
\subsection{Links}\label{links}
To websites such as \href{www.google.com}{Google}, or to other sections of the document such as the \hyperlink{lists}{lists} section.
\subsection{Tables}\label{tables}
\begin{longtable}[]{@{}ll@{}}
\toprule
Heading & Attributes\tabularnewline
\midrule
\endhead
Content & Can include text or numbers\tabularnewline
Rows & 3.142\tabularnewline
\bottomrule
\end{longtable}
\subsection{Images}\label{images}
\includegraphics{graphics/wsp.png}
\includegraphics{https://www.sesa.org.au/templates/js_simplepro_red/images/sesa_logo.png}
Presenting the document
In all the preceding examples the content was the same, although the syntax used to provide the structure was different. However, presentation of that content depends on the styling that is added based on the structure.
Styling will also have its own syntax that is tool/language specific. The capability of different tools to provide styling also varies significantly.
Next we will run through a number of document presentation areas where we should make conscious decisions about styling. This is better known as typography.
Typeface
This document is predominantly set in the open source ETBook font, similar to the proprietary Monotype Bembo font.
A typeface is a general ‘font family’: it includes many different fonts with variations in size and emphasis (bold, italic, etc.).
Typefaces (and fonts) may be classified as either sans-serif, or serif.
Serifed fonts are considered easier to read in print than sans-serif. However, the science on this appears to be inconclusive.
Conversely, sans-serif fonts are sometimes preferred for on-screen reading, as they scale better at low resolutions.
Some typefaces have been designed to assist dyslexic readers; however, their efficacy is disputed.
Sans Forgetica is designed to be intentionally difficult to read, as this prompts your brain to engage in deeper processing.
Headings
Typically a document may have many levels of heading.
Too many levels and you will lose the reader!
ALL CAPS, ‘Title Case’, and ‘Sentence case’ can be used at different levels of heading, along with changes in size, colour and font.
Text size and spacing
Size: choose for legibility and to match the page (see later).
Letter spacing: best to stick with the default!
Setting: Either flush-left ragged-right, or justified. Justified text is achieved by the tool modifying the inter-word spacing, so choose justified text only when the line is long enough; hyphenation may still be necessary to avoid sloppy spacing (if your tool supports this).
Advanced tools may support microtypography, which subtly adjusts other aspects of the text to improve readability and appearance.
Justified text is considered to be a problem for dyslexics, due to the uneven spacing and distracting ‘rivers’ of white space.
Kerning:
Spaces between sentences
Double spacing is an artefact of Victorian typewriter usage and is no longer relevant.
Paragraphs
Provide a pause in reading, and may be shown by either an initial indent or a slight space between blocks of text.
Indents are more common in printed literature, whilst spacing is more prevalent on the web (where there are fewer space constraints).
If you’re working in a WYSIWYG tool then you shouldn’t be inserting an extra carriage return between paragraphs. This is mixing up content with presentation.
Weighting and emphasis
Change one parameter at a time.
Page layout
Line length: 66 characters is considered ideal, but anything 45 to 75 characters is ok (including punctuation and spaces). Longer might be ok for discontinuous texts (e.g. bibliographies).
Line spacing: leading is usually slightly more than character height, giving a small gap between lines. Sometimes much larger spacing (1.5 or double-space) may be requested to allow for handwritten review comments.
Margins: textblock width should be defined to achieve the right line length based on the typeface size and page size. Textblock height depends only on what size margins you leave — don’t be stingy on the margins or your page will look ugly! Also worth considering are binding and on-screen reading.
Headers and footers: may carry information about the section and the page, or about the publication itself. The latter only seems necessary if there is a danger that pages may be reproduced out-of-context.
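For plain or monospaced text, the line-length guidance above can be checked mechanically. The following is a rough sketch using Python’s standard `textwrap` module; the sample `text` and the constant name `IDEAL_MEASURE` are assumptions made for illustration. It does not account for proportional fonts, where the character count per line depends on the typeface and size.

```python
import textwrap

IDEAL_MEASURE = 66  # characters per line, including spaces and punctuation

text = (
    "Line length is counted in characters, including punctuation and "
    "spaces. This sketch re-wraps a paragraph to the suggested measure and "
    "reports the longest resulting line."
)

# textwrap.wrap returns a list of lines, none longer than the given width.
wrapped = textwrap.wrap(text, width=IDEAL_MEASURE)
longest = max(len(line) for line in wrapped)

for line in wrapped:
    print(line)
print("longest line:", longest, "characters")
```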
Lists
Avoid over-punctuating lists.
Be consistent in list structure and punctuation.
Enumerate lists when items have an order, or where they need to be referenced later (although be aware that this may imply a priority).
Special characters
NB: These are content not presentation!
Use non-breaking spaces where words should not be separated.
Dashes come in various lengths; choose the correct one for your purpose:
- The hyphen - is used to join words.
- The en dash – is used in ranges.
- En dashes or em dashes — are used in a similar way to brackets.
- The minus sign \(-\) is a separate character altogether.
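Because these characters look alike on the page, it can help to check which one is actually in your content. The sketch below uses Python’s standard `unicodedata` module to print the code point and official Unicode name of each; the dictionary labels and the `page_range` example are illustrative only.

```python
import unicodedata

# Characters that are easily confused on the page.
characters = {
    "hyphen (keyboard)": "\u002d",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "minus sign": "\u2212",
    "non-breaking space": "\u00a0",
}

for label, char in characters.items():
    # unicodedata.name returns the official Unicode character name.
    print(f"U+{ord(char):04X}  {unicodedata.name(char)}  ({label})")

# An en dash in a range, with a non-breaking space after the abbreviation.
page_range = "pp.\u00a045\u201375"
print(page_range)
```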
Notation, quantities and units
Try to stick with standard choices for symbols representing variables (but don’t forget to also define them!).
Consider using a different typeface or italic font for variables.
Use SI units as far as possible.
Take care when typesetting numbers and units. Some tips:
- Insert a small non-breaking space between numbers and units
- Ensure you have the SI prefix on your units correct (e.g. \(10\) \(Kg\) is not ten kilograms, it is ten Kelvin grams!)
- Consider raising units to a negative power in place of the divide symbol
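Where quantities are generated from data rather than typed by hand, these tips can be applied in code. This is a small sketch only: the `quantity` function is a hypothetical helper, and the narrow no-break space (U+202F) is one assumed choice of ‘small non-breaking space’.

```python
NARROW_NBSP = "\u202f"  # narrow no-break space between number and unit


def quantity(value, unit, decimals=1):
    """Format a number and its unit so they cannot be split across lines."""
    return f"{value:.{decimals}f}{NARROW_NBSP}{unit}"


# Lower-case k is the SI prefix for kilo.
mass = quantity(10, "kg", decimals=0)
# Negative power (superscript minus one) in place of the divide symbol.
speed = quantity(3.5, "m\u00a0s\u207b\u00b9")

print(mass)
print(speed)
```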
Units in tables and figures
In tables and figures consider using a slash to denote the units. This is recommended by the BIPM as the correct method of expressing values for multiple quantities.
Equations
Equations are usually set centred, with a reference number on the right-hand side of the page.
\[\theta_b = \max\bigl(\widehat{\theta}_b,\min(-\widehat{\theta}_b,\int \omega_b \text{ d} t)\bigr) \tag{3.32}\]
Use a proper equation editor!
Abbreviations (acronyms & initialisms)
Very rarely do readers wish that an author had used more abbreviations!
Define abbreviations both the first time they are used, and within an abbreviations list.
Notes and references
Notes: use end-notes or side-notes for digressions that do not belong in the main text. References are a subset of notes.
References: Use proper reference management software; autogenerate bibliographies.
Consider hyperlinks if your document will be presented in digital format.
Don’t cite fake references!
Cross-references
Where appropriate, cross-references within the document should be made, including:
- Chapters and sections
- Figures
- Tables
- Equations
- Pages
- Above/below
Always use your tool’s cross-referencing functionality for these.
Figures
See the Graphic Presentation section for more.
Figures should appear as soon as possible after, but not before, they are referenced in the text.
Text in figures should be horizontal (or at least oblique).
Text should be legible.
All figures should have captions below them.
Tables
Tables should appear as soon as possible after, but not before, they are referenced in the text.
Numbers in columns should be aligned on the decimal.
Heavy gridlines are not necessary; a few horizontal rules are ok, but white space is usually better.
Quoted numeric precision should reflect accuracy of measurement.
All tables should have captions above them.
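If your tool cannot align on the decimal directly, one workaround is to quote every value in a column to the same number of decimal places and right-align the column. The sketch below shows this for monospaced output; `align_on_decimal` is a hypothetical helper, and the chosen precision should still reflect the accuracy of the measurements.

```python
def align_on_decimal(values, decimals):
    """Format numbers to a fixed precision and right-align them.

    With a common number of decimal places, right alignment in a
    monospaced font lines up the decimal points.
    """
    texts = [f"{value:.{decimals}f}" for value in values]
    width = max(len(text) for text in texts)
    return [text.rjust(width) for text in texts]


column = align_on_decimal([3.142, 12.5, 100.0], decimals=2)
for cell in column:
    print(cell)
```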
Making decisions on style
Start with prescribed document templates.
Check whether your organisation has a house style or style guide.
Choose another organisation’s manual of style:
Document tools
Word processing: Word, Writer, LyX, Pages, Google Docs
Typesetting: InDesign, Scribus, Publisher, LaTeX
Content editing (any text editor): Notepad, Notepad++, Emacs, vim
For converting content and structure between tools try Pandoc.
Graphic presentation
Is a diagram worth a thousand words?
But sometimes yes!
Always consider text before tables, and tables before graphics.
Anscombe’s Quartet
For another fun example check out the Anscombosaurus!
x | y |
---|---|
10.00 | 8.04 |
8.00 | 6.95 |
13.00 | 7.58 |
9.00 | 8.81 |
11.00 | 8.33 |
14.00 | 9.96 |
6.00 | 7.24 |
4.00 | 4.26 |
12.00 | 10.84 |
7.00 | 4.82 |
5.00 | 5.68 |
x | y |
---|---|
10.00 | 9.14 |
8.00 | 8.14 |
13.00 | 8.74 |
9.00 | 8.77 |
11.00 | 9.26 |
14.00 | 8.10 |
6.00 | 6.13 |
4.00 | 3.10 |
12.00 | 9.13 |
7.00 | 7.26 |
5.00 | 4.74 |
x | y |
---|---|
10.00 | 7.46 |
8.00 | 6.77 |
13.00 | 12.74 |
9.00 | 7.11 |
11.00 | 7.81 |
14.00 | 8.84 |
6.00 | 6.08 |
4.00 | 5.39 |
12.00 | 8.15 |
7.00 | 6.42 |
5.00 | 5.73 |
x | y |
---|---|
8.00 | 6.58 |
8.00 | 5.76 |
8.00 | 7.71 |
8.00 | 8.84 |
8.00 | 8.47 |
8.00 | 7.04 |
8.00 | 5.25 |
19.00 | 12.50 |
8.00 | 5.56 |
8.00 | 7.91 |
8.00 | 6.89 |
All sets have the same:
- Mean (\(\bar{x}=9.00\), \(\bar{y}=7.50\))
- Variance (\(\sigma^2_x=10.00\), \(\sigma^2_y=3.75\))
- Correlation (\(\rho_{x,y}=0.816\))
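These summary statistics can be reproduced from the four tables above. The sketch below uses only the Python standard library and assumes population (not sample) variance, which is what the quoted values of 10.00 and 3.75 correspond to; the `correlation` helper is written out by hand for illustration.

```python
from statistics import mean, pvariance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]


def correlation(xs, ys):
    """Pearson correlation coefficient from population moments."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return covariance / (pvariance(xs) * pvariance(ys)) ** 0.5


# One row per dataset: mean x, mean y, variance x, variance y, correlation.
summary = [
    (mean(xs), mean(ys), pvariance(xs), pvariance(ys), correlation(xs, ys))
    for xs, ys in quartet
]
for row in summary:
    print(" ".join(f"{value:.3f}" for value in row))
```

The printed rows agree to about two decimal places, even though the four scatterplots look nothing alike.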
Two motivations for visualisation
Discovery:
- To explore and understand a dataset or problem space
- As a step towards automation (debugging)
- To aid human-in-the-loop processes
- Working with data
Presentation:
- For explanation
- Presenting information
Here we are concerned only with the latter; however, most of the advice is equally applicable to the former.
The graphic may need to display the data, but the message should be the information.
Dataset types
Attributes
Encoding channels
Why are some channels more effective?
Keys and values
Keys: Independent attributes (categorical or ordinal)
Values: Dependent attributes (categorical, ordinal or quantitative)
Zero keys: Scatterplot
One key: Bar chart, Line chart, Dot charts, Coloured scatterplot
Two keys: Heatmap, Stacked bar chart, Coloured bar/line/dot charts
Three or more keys: ‘Small multiples’ of the above
Small multiples
Categorical keys
Choose a sensible order for the key.
Don’t connect points between categorical data.
Networks and trees
Both are highly relevant to Systems Engineering.
Networks:
- Visual modelling languages (SysML, LML, OPM, Simulink)
- Interfaces
- Stakeholder relationships
Trees:
- Work Breakdown Structures
- System Breakdown Structures
- Filesystems
Node–Link diagrams (networks and trees)
Intuitive, but limited in network size (consider interactivity or separate views).
Link density = number of links per node. Trees have link density of one. Maximum link density for effectiveness is around three or four.
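As a quick check before choosing a node–link diagram, link density can be computed from lists of nodes and links. This is a minimal sketch; the breakdown structure below is a made-up example and `link_density` is a hypothetical helper.

```python
def link_density(nodes, links):
    """Number of links per node for a node-link diagram."""
    return len(links) / len(nodes)


# Hypothetical system breakdown structure: a tree with 5 nodes and 4 links.
tree_nodes = ["System", "Subsystem A", "Subsystem B", "Unit A1", "Unit A2"]
tree_links = [
    ("System", "Subsystem A"),
    ("System", "Subsystem B"),
    ("Subsystem A", "Unit A1"),
    ("Subsystem A", "Unit A2"),
]

print(link_density(tree_nodes, tree_links))  # 0.8
```

A tree with n nodes has n − 1 links, so its link density is just under one and approaches one as the tree grows.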
Consider layout:
- Automatic (e.g. force-directed) or manual
- Reading direction
- Information density (see later)
- Minimise edge crossings
- Minimise ratio between longest and shortest edges
Matrix views (networks and trees)
In graphs of more than 20 vertices, matrix views typically perform better; the exception being where path finding is important.
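A matrix view is built from the same node and link lists as a node–link diagram. The sketch below assumes an undirected network, so the matrix is symmetric; the subsystem names and the `adjacency_matrix` helper are hypothetical.

```python
def adjacency_matrix(nodes, links):
    """Build a symmetric 0/1 matrix for an undirected network."""
    index = {name: position for position, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in links:
        matrix[index[source]][index[target]] = 1
        matrix[index[target]][index[source]] = 1
    return matrix


# Hypothetical interface network between four subsystems.
nodes = ["Power", "Comms", "Control", "Sensors"]
links = [("Power", "Comms"), ("Power", "Control"), ("Control", "Sensors")]

matrix = adjacency_matrix(nodes, links)
for name, row in zip(nodes, matrix):
    print(f"{name:<8}", " ".join(str(cell) for cell in row))
```

Each cell answers ‘are these two nodes linked?’ directly, but following a path across several nodes means hopping between rows, which is where node–link diagrams remain easier.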
Enclosure (trees)
Show hierarchical structure through containment rather than connection.
Note: these are not appropriate for networks.
Spatial data
Where data is spatial, it is usually beneficial to present it spatially.
Geometry: Choropleth maps
Fields: Isocontours, Vector fields
Colour
Categorical: There are a limited number of discriminable bins (max 6–12), ensure that there is clear separation between them. Use variations in colour hue.
Ordered: Colour scales should be perceptually linear. Use variations in colour luminance, saturation and/or hue.
Rainbow colourmaps (e.g. ‘Jet’) are a poor default, as they are perceptually unordered and nonlinear. Try Viridis or Cividis.
Consider colour-blind users (Cividis is better).
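One rough way to see why rainbow colourmaps are unordered is to check whether luminance rises steadily along the scale. The sketch below computes sRGB relative luminance, which is only a proxy for perceived lightness; the hex values are approximate sample points assumed for illustration, not the exact definitions of Viridis or Jet.

```python
def relative_luminance(hex_colour):
    """Relative luminance (0 to 1) of an sRGB colour such as '#fde725'."""
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB transfer curve to get linear-light channel values.
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def is_monotonic(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Approximate sample colours (assumed values), from low end to high end.
viridis_like = ["#440154", "#21918c", "#fde725"]
jet_like = ["#000080", "#00ffff", "#ffff00", "#ff0000", "#800000"]

for name, colours in [("viridis-like", viridis_like), ("jet-like", jet_like)]:
    luminances = [relative_luminance(c) for c in colours]
    print(name, [round(v, 2) for v in luminances], is_monotonic(luminances))
```

The viridis-like samples get steadily lighter, while the jet-like samples get lighter and then darker again, so equal steps in the data do not look like equal steps in the graphic.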
Use online tools such as colorbrewer to choose a palette.
Or, taken from a tweet:
Information density
Also called data-ink ratio.
Above all else show the data.
Amount of information encoded, vs empty space in the graphic.
Generally higher information density is preferred.
- Subtract unnecessary ink (both data and non-data).
- De-emphasize remaining non-data ink. Emphasize remaining data ink.
Labelling graphics
Always label graphics! (items and attributes)
Ensure labels are legible (axis, font, size). Avoid abbreviations.
Use horizontal labels wherever possible. Rotating the graphic can make this easier. Oblique labels are a compromise.
Refer back to Stevens’ power law for a good example.
Make sure all encoding is labelled. Legends are ok, directly labelling the data is better.
Producing graphics
- For each item, choose the attribute(s) you wish to display
- For each attribute to display, choose an appropriate channel to encode it
- Choose a layout and labelling structure
- If appropriate, choose how your item keys are sorted
- Review and iterate
Graphics tools
Drawing: Visio, Inkscape
Data drawing: Tableau, Excel, Google charts
Data linking: D3 (JavaScript), Shiny (R), Matplotlib (Python), Visio
Publication
Remember the basics!
- Spell check
- Check links and references
- Set document properties
- Remove comments and version history
- Check formatting (in final publication format)
Work in a suitable format
Raster drawing tools manipulate pixels in an image. Except in rare cases, they do not embed the data.
Vector drawing tools manipulate shapes in a coordinate system. They allow data to be directly represented and embedded.
Benefits of vector formats:
- Vector to raster is easy, raster to vector is difficult and messy
- Vector files are typically smaller
- Vector files are easier to edit
- Vector images scale infinitely
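To illustrate how a vector format represents data directly, the sketch below writes a tiny SVG bar chart as text, reusing the Treatment B results from the earlier tidy-data example. The layout constants are arbitrary assumptions, and a real graphic would also need labels and axes.

```python
# Treatment B results from the earlier tidy-data example.
data = {"John Smith": 2, "Jane Doe": 11, "Mary Johnson": 1}

BAR_HEIGHT = 20  # user units per bar, including a small gap
SCALE = 10       # user units per data unit

bars = []
for row, (name, value) in enumerate(data.items()):
    y = row * BAR_HEIGHT
    # Each bar is a shape whose width is the data value times the scale;
    # the title element embeds the underlying name and value.
    bars.append(
        f'<rect x="0" y="{y}" width="{value * SCALE}" height="{BAR_HEIGHT - 4}">'
        f"<title>{name}: {value}</title></rect>"
    )

svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    f'viewBox="0 0 {max(data.values()) * SCALE} {len(data) * BAR_HEIGHT}">'
    + "".join(bars)
    + "</svg>"
)
print(svg)
```

Because the file stores shapes and coordinates rather than pixels, the values can still be read back out of it, and the image can be rendered at any size.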
Deliver in a suitable format
Word and Excel are editing tools, with reading modes bolted on.
Due to proprietary formats they also require the recipient to own a copy of the tool.
PDF is a better alternative. Delivering in PDF will:
- Reduce ‘accidental’ editing
- Reduce plagiarism
- Be more likely to reproduce correctly
- Be supported by many free viewers
HTML is emerging as an even better standard.
Image formats
Different image formats have different uses (except bitmap, which has been superseded).
Advanced techniques
Interactive graphics
Interactivity is useful for handling complexity. It:
- Allows extra data to be shown without cluttering the graphic
- Allows the user to query the data
- Provides a lower level of detail
- Allows connection between views
- Engages and draws-in the reader
Avoid unnecessary interactivity, and do not rely on it.
Multiple views
A presentation of the data is only one possible view of it. Different views may be useful for:
- Different stakeholders
- Presenting different information (based on the same data)
- Showing different levels of abstraction to develop understanding
Levels of abstraction
Present to reflect Systems Thinking – take the reader through Bret Victor’s Ladder of abstraction.
Transitioning graphics
Animated transitions between views of the same data can be used to aid understanding.
Research has shown that animated transitions between related data graphics significantly improve visual perception.
These benefits are greater if the animations are staged.
Explorable explanations
Explorable explanations are reactive and explorable documents that allow a reader to:
- Understand the author’s underlying models
- Play with model assumptions
- Provide contextual information to learn related material
- Cross check the author’s claims
There are a large number of examples available.
Most are built in HTML, however the Computable Document Format does exist as well.
A recent development is the Observable coding system, which provides a reactive programming environment to support explorable explanations, visualisations and active reading.
There are lots of examples of explorable explanations and interactive visualisations on Observable.
Speed reading
There are a few technologies around such as Spritz to facilitate speed reading.
E-books
E-books are arguably the future of reading:
- ‘Paperless office’
- Searchable
- Distributable
- Compact
- Resizable (reflowable text)
There are a variety of formats, readers and creation tools available, and formats can be converted easily.
Homework exercise
Improve upon one of these publications.
- What do you think the overall message is? (information)
- What is the background data? (items and attributes)
- What type are the attributes? (ordinal/quantitative/categorical) (keys/values)
- What are the channel encodings that have been used?
- What different (/better) encodings could be used?
- Sketch out a new version and label it.