Dioxin emission
This is the first of three mandatory assignments for the course 02424. The
submission must consist of a single file in portable document format
(pdf); other formats will not be accepted.
Background
Dioxin is a shorthand name given to a family of 75 different chemical
compounds with a similar chemical structure and a similar set of biological
effects, though their potency varies widely. Dioxins are organochlorines, or
compounds formed when a chlorine molecule binds with a carbon molecule
during combustion or in some type of industrial production process. Dioxins
cause cancer in humans and animals. Animal tests also show interference
with reproduction, development and immune system function at low doses.
This raises substantial concern about current human exposure levels.
The toxicity of a dioxin compound will vary depending on the number
and position within the molecule of the chlorine atoms. One of the most
well known (and most toxic) forms of dioxin is 2,3,7,8-tetrachlorodibenzo-p-dioxin,
or TCDD (the ’Seveso’ dioxin).
Municipal solid waste (MSW) and medical waste incinerators are the most
significant sources of dioxin in the environment, collectively accounting for
roughly 85-87 % of all known dioxin emissions.
The purpose of this project
The purpose of this assignment is to find a model for the variation of
measured dioxin emission at a number of Danish municipal solid waste
incinerator plants. It is of particular interest to investigate whether the
operating conditions influence the dioxin emission.
In order to create the data needed for setting up such a model, a large
number of experiments have been conducted at a number of Danish MSW
incinerator plants. During these experiments gas samples have been col-
lected and the dioxin emission estimated in the samples. Likewise a large
number of possible explanatory variables have been measured.
The experiments
Care must be taken in planning the experiments to ensure that useful models
and reliable conclusions can be formulated. Furthermore, dioxin measurements
are difficult to obtain and the analyses are expensive. In order to
ensure reliable conclusions and to obtain maximum information from the data,
statistically designed experiments have been used.
The experiments have been conducted at three Danish MSW plants. For
one of the plants the experiment was repeated at a later time. The layout
of an experiment is shown in Figure 1.
[Figure 1 placeholder: a cube-shaped design whose three axes are Oxygen surplus, Load and Air (Prim/second), with axis levels labelled Low, Normal and High.]
Figure 1: Experimental setup corresponding to a single visit at an MSW
plant.
Dependent variable
The dependent variable is the concentration of dioxin in the combustion air.
More precisely, the concentration is measured as the total amount of dioxin
in ng per m³. In the data file the dependent variable is called DIOX. You
should consider transforming the dependent variable in the analysis.
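One common candidate for a strictly positive, right-skewed concentration is the log transform; whether it is appropriate for DIOX has to be judged from the data. The sketch below is only an illustration on simulated values (the vector `diox` is made up and is not the course data).

```r
# Simulated positive, right-skewed values standing in for DIOX
# (hypothetical numbers, NOT the course data)
set.seed(1)
diox <- rlnorm(50, meanlog = 0, sdlog = 1)

# Log transform: multiplicative effects on DIOX become additive on this scale
log_diox <- log(diox)

# Compare the distributions on the two scales
hist(diox, main = "DIOX (raw scale)")
hist(log_diox, main = "log(DIOX)")
```

If a transformation is used, estimates and intervals are obtained on the transformed scale and must be transformed back before they are interpreted as concentrations.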
Explanatory variables
The total set of explanatory variables is most conveniently divided into the
so-called block effects, active variables and passive variables.
Block effects
The experiments are conducted under some conditions which we need to
take into account in the modelling. These conditions are often termed block
effects. In the list of block effects the names of the effects are shown in
brackets. The block effects are:
• MSW Plant (PLANT). The experiments have been conducted at three
Danish MSW plants named RENO N, KARA, and RENO S.
• Time (TIME). For the plant RENO N, the experiment was repeated at
a later time point (1, 2).
• Laboratory (LAB). Two laboratories have been used for the analysis,
one in Denmark (KK) and one in the US (USA). It is very difficult (and
expensive) to measure the amount of dioxin, hence the data are assumed
to be encumbered with considerable measurement noise.
Active and passive variables
The explanatory variables are considered as being either active or passive.
The active variables are those varied according to the experimental plan,
while the passive variables are all other measured variables which might
influence the dependent variable.
Active variables:
• Oxygen surplus in gas (OXYGEN)
• Plant load (LOAD)
• Air distribution (primary/secondary) (PRSEK)
In order to obtain the optimal amount of information, each measurement
corresponds to one of the corners of a cube, as shown in Figure 1. The order
of experiments within the cube is randomised. Even though the experiment
is planned as described above, it is often rather difficult in practice to obtain
the desired values of the active variables. Therefore it is more reasonable to
use the actual measured values related to the design. Those values are:
• The measured oxygen surplus is called (O2). In the data file, the value of the
oxygen surplus corrected by the mean is given by (O2COR).
• The normalized measured plant load is called (NEFFEKT). It is defined
as the difference between the actual effect and the mean effect, divided
by the mean effect. A value of NEFFEKT = -0.15 means that the load
is 15 % less than the design load.
• The ratio between the primary air and the secondary air is called
(QRAT).
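As a small worked example of the NEFFEKT definition, the arithmetic can be checked directly. The two load values below are hypothetical and chosen only so that the result is a load 15 % below the mean.

```r
# Hypothetical loads in arbitrary units (made up for illustration)
actual_effect <- 8.5
mean_effect   <- 10

# Definition from the text: (actual effect - mean effect) / mean effect
neffekt <- (actual_effect - mean_effect) / mean_effect
neffekt  # -0.15: the load is 15 % below the mean effect
```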
Passive variables:
• Gas flow (QROEG) (m³/h)
• Combustion chamber temperature (TOVN) (°C)
• Gas temperature (TROEG) (°C)
• Pressure in the chamber (POVN)
• CO2 (CO2) (ppm)
• CO (CO) (ppm)
• SO2 (SO2) (mg/m³)
• HCl (HCL) (mg/m³)
• H2O (H2O) (%)
CO2, CO, SO2, HCl and H2O are measured in the gas.
The observations are identified by a unique number given in OBSERV.
How to get the data?
The data can be found in File sharing on CampusNet in a file called dioxin.csv.
The first row of the file contains the names of the variables referred to in the
text.
The data can be imported into R with the following command:
dat <- read.table("dioxin.csv", sep = ",", header = TRUE)
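After importing, it is usually worth checking the column types and making sure the block variables are treated as categorical. The sketch below shows the idea on a tiny made-up file written to a temporary location, so that it runs without dioxin.csv; the rows and the coding of the plant names are hypothetical and may differ from the real file.

```r
# Write a tiny made-up CSV so the sketch runs without the course file
# (hypothetical rows; the coding in dioxin.csv may differ)
tmp <- tempfile(fileext = ".csv")
writeLines(c("OBSERV,PLANT,TIME,LAB,DIOX",
             "1,RENO_N,1,KK,100",
             "2,KARA,1,USA,200"), tmp)

# Same call as in the text, pointed at the temporary file
dat <- read.table(tmp, sep = ",", header = TRUE)

# Treat the block variables as factors rather than numbers or strings
for (v in c("PLANT", "TIME", "LAB")) dat[[v]] <- factor(dat[[v]])

# Inspect column types and a summary of each variable
str(dat)
summary(dat)
```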
Goal of the project
The goal of this project is to find a model for the variations observed in the
reported dioxin emission. For the analysis, use a significance level of α = 5 %.
It is important that you
1. Read the introduction carefully
2. Specify the models and the underlying assumptions
3. Explain how you reduce the models
4. Check the underlying assumptions (residuals)
5. Explain your results (remember to take all transformations you may
have done into account)
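The workflow in the list above (specify, reduce, check, interpret) could be sketched in R roughly as follows. This is a generic illustration on simulated data, not a solution: the data frame `toy`, its plant labels and all coefficients are invented, the log transform is an assumption, and the term removed in the reduction step is chosen for illustration only.

```r
# Simulated toy data using some of the column names from the text
# (hypothetical values, NOT the course data)
set.seed(42)
n <- 40
toy <- data.frame(
  PLANT = factor(rep(c("A", "B", "C"), length.out = n)),
  LAB   = factor(rep(c("KK", "USA"), times = n / 2)),
  O2COR = rnorm(n),
  QRAT  = runif(n, 0.3, 0.7)
)
toy$DIOX <- exp(1 + 0.8 * toy$O2COR + rnorm(n, sd = 0.5))

# 2. Specify the model: additive, fitted on the log scale (an assumption)
fit <- lm(log(DIOX) ~ PLANT + LAB + O2COR + QRAT, data = toy)

# 3. Reduce: F-tests for dropping each term; remove the least significant
#    term if its p-value exceeds 0.05, refit, and repeat
drop1(fit, test = "F")
fit2 <- update(fit, . ~ . - QRAT)  # illustrative step only

# 4. Check assumptions: residuals vs fitted, QQ-plot, scale-location, leverage
par(mfrow = c(2, 2))
plot(fit2)

# 5. Interpret: predict on the log scale, then back-transform the limits
new <- data.frame(PLANT = factor("A", levels = levels(toy$PLANT)),
                  LAB   = factor("KK", levels = levels(toy$LAB)),
                  O2COR = 0.5)
pred_log <- predict(fit2, newdata = new, interval = "prediction", level = 0.95)
exp(pred_log)  # prediction and 95 % prediction interval on the DIOX scale
```

Back-transforming the interval limits with `exp()` gives an interval for DIOX itself; the back-transformed point prediction is then a median-type rather than a mean-type prediction.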
In particular, we want you to answer the following questions and carry out the
following analyses:
1. Start with some preliminary/explorative analysis of the data by mak-
ing some plots.
2. Set up a simple additive model only with the active and the block
variables. Reduce the model if possible.
3. Set up a similar model as before but now use the measured values of
the active variables along with the block variables. Reduce the model
if possible.
4. Using the model with the measured active variables, predict the dioxin
emission in the first visit to the RENO N plant, analysed in the KK
laboratory with O2COR = 0.5, NEFFEKT = -0.01 and QRAT = 0.5
(you should disregard values of variables that you have removed from
the model). Give a 95 % prediction interval as well.
5. Does the dioxin emission depend on the operating conditions (O2COR,
NEFFEKT or QRAT)?
If ’yes’, how can the dioxin emission be reduced by changing these
conditions?
6. Do you see any differences between the considered MSW plants? What
about the two laboratories?
7. Set up a final model, this time with the passive variables as well. You
may want to consider including higher order terms in the model to find
the model that gives the most complete description of the variation in
the dioxin emission. Use residual analysis to validate the model and
check if some observations are particularly influential. Give estimates
of the parameters in the model and their uncertainties.
8. Write a brief abstract for your "grandmother", i.e. summarize your
final model in plain words (no more than 250 words).
9. (Possibly difficult) It has been shown that the precision in the KK
laboratory is better than in the USA laboratory. Having this in mind,
re-estimate your final model, using likelihood estimates of the weight
between the measurements in the two labs. Also consider the
uncertainty of the estimated weight.¹
¹ A better estimate for the weights is the REML estimate (which we will cover later on
in the course), and you may compare with the gls function in the nlme package using the
weights argument.
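For the comparison with gls mentioned in the footnote, a fit with a separate residual standard deviation per laboratory could look roughly like the sketch below. It runs on simulated data in which the USA lab is deliberately noisier; the data frame `toy`, the response `logDIOX` and all coefficients are invented. Using `varIdent` is one way to express lab-specific variances and is an assumption here, not necessarily the intended approach.

```r
library(nlme)  # recommended package, shipped with standard R installations

# Simulated toy data (hypothetical, NOT the course data): USA noisier than KK
set.seed(7)
n <- 60
toy <- data.frame(
  LAB   = factor(rep(c("KK", "USA"), each = n / 2)),
  O2COR = rnorm(n)
)
noise_sd <- ifelse(toy$LAB == "KK", 0.3, 0.9)
toy$logDIOX <- 1 + 0.8 * toy$O2COR + rnorm(n, sd = noise_sd)

# Equal-variance fit and a fit with one residual sd per lab, both by ML
fit_0 <- gls(logDIOX ~ LAB + O2COR, data = toy, method = "ML")
fit_w <- gls(logDIOX ~ LAB + O2COR, data = toy, method = "ML",
             weights = varIdent(form = ~ 1 | LAB))

# Estimated ratio of residual standard deviations relative to the reference lab
sd_ratio <- coef(fit_w$modelStruct$varStruct, unconstrained = FALSE)
sd_ratio

# Likelihood-ratio comparison of the two fits, and an approximate
# confidence interval for the variance-function parameter
anova(fit_0, fit_w)
intervals(fit_w, which = "var-cov")
```

Changing `method = "ML"` to `method = "REML"` gives the REML estimate referred to in the footnote.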