6.benchmarking of

Article Title Page

Benchmarking of Marine Bunker Fuel Suppliers: The Good, The Bad, The Ugly

Author Details

Author 1 Name: Ole Jørgen Anfindsen
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway

Author 2 Name: Grunde Løvoll
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway

Author 3 Name: Thomas Mestl
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway

Corresponding author: Ole Jørgen Anfindsen
Corresponding Author's Email: ole.jorgen.anfindsen@dnv.com

Acknowledgments (if applicable): n/a

Biographical Details (if applicable):

Ole Anfindsen holds a dr. scient. degree (PhD) in computer science and a bachelor's degree in electronics engineering. For more than 25 years he has worked with databases and related technologies. He has been senior research scientist in Telenor R&D, visiting researcher at GTE Laboratories (Massachusetts) and Sun Microsystems Laboratories (California), as well as adjunct associate professor at the Institute of Informatics at the University of Oslo. He currently works as a researcher in the Research & Innovation department of DNV, where his main activity is directed towards data analysis, especially in the maritime area.

G. Løvoll has a dr. scient. degree (PhD) in physics. Grunde has worked for 6 years as a Post Doc and researcher at the Department of Physics at the University of Oslo doing experimental studies on multiphase flow in porous materials, water diffusion in dry clay and optical tweezers. Dr. Løvoll currently works as a researcher in DNV Research & Innovation, where his main focus is on data analysis in the maritime area.

Thomas Mestl has a Dr. Scient. (PhD) in mathematics and a degree in precision engineering. He has worked in DNV's Research Department for the last 13 years within the field of information technology. A large part of his work has been on identifying emerging technology trends, evaluating new ICT technologies (especially with respect to mobile work and information management), and identifying promising business opportunities offered by new technologies or combinations of existing ones. Currently, his main activity is directed towards data analysis, especially in the maritime area.

Structured Abstract:

Purpose - This paper has two main focus areas: the construction of a realistic best practice benchmark, and the development of a methodology for comparison of individual suppliers of marine bunker fuel. As is well known in this trade, unfair business behaviors in the bunker fuel market are not uncommon, resulting in financial losses for the buyers.

Design/methodology/approach - Establishing a best practice will naturally involve some degree of subjectivity as there is no a priori correct answer to this problem. Using the concept of membership functions from fuzzy set theory, a score can be derived from a best practice benchmark histogram. The main advantages of this method are its relative independence both of sample size and of the underlying distribution, as well as being computationally very efficient.

Findings - Our methodology turns out to be more powerful than standard descriptive statistics, as it is less sensitive to outliers and is well suited for small datasets and even single numbers. When applied to data for all suppliers worldwide it turns out that the number of good suppliers is actually much lower than might be expected.

Practical implications - Bunker fuel is a major expense for ship owners, and can easily reach $30 million/year for a single container ship. There is therefore a considerable interest in the market for benchmarking of individual fuel suppliers. Our methodology is also applicable to other quality related fuel parameters.

Originality/value - To the best of our knowledge this is the first attempt to benchmark actors in the marine bunker fuel industry and to quantify their behaviors.

Keywords: benchmarking, membership functions, scoring, fuzzy clustering, supplier quality, best practice

Article Classification: Technical paper
1. Introduction

The density of marine bunker fuel can be regarded as one of its most basic parameters. It is used for fuel quantity estimation and for calculating the specific energy content of the fuel, and it is also the basis for the so-called Calculated Carbon Aromaticity Index (CCAI), an important factor for ignition and for deposits in the engine. Density is also an important factor when it comes to the process of separating water or solids from bunker fuel.

For the typical ship operator the primary importance of density comes from the fact that bunker fuel is delivered by volume but paid for per ton. The conversion is done by means of the fuel density reported by the supplier. A small difference between stated and actual fuel density can quickly lead to large financial losses for the ship operator. For instance, if a density of 977 kg/m³ is stated when the actual value happens to be 960 kg/m³, this will give rise to a difference of nearly 35 tons when bunkering 2000 m³, the value of which, in the current market, is close to US$ 20,000, and that is for a single bunkering.

Although this example belongs in the high end of the spectrum, it is not at all hard to find even more extreme examples in real life. Many fuel suppliers exploit this way of making a quick buck, since their stated density is usually used to calculate the quantity of the delivered fuel. Over-reporting of density, i.e. claiming that the fuel density is higher than what is actually the case, is called short-lifting, while the opposite could be termed long-lifting. Short-lifting implies that the ship operator loses money, since he pays for more fuel than he receives. Long-lifting implies that the fuel supplier loses money, and that the ship operator gets more than what he pays for.

The global market for marine bunker fuel is more than 300 million tons annually (IEA 2010, p. 618; Eyring et al 2010; IMO 2009; EPA 2008). We estimate that more than 300,000 tons of bunker fuel, i.e. about 1‰ of the global consumption, is short-lifted every year. We further estimate that the amount of long-lifting exceeds 150,000 tons. That is, on the order of half a million tons are long- or short-lifted annually. Thus, bunker fuel worth more than US$200 million appears not to be properly accounted for every year.

Both short- and long-lifting may be indications of fraudulent behavior of individual employees within the ship operator's or bunker fuel supplier's organization. Such behavior is however sufficiently widespread that a systematic and commonly accepted short-lifting practice in parts of the bunker fuel trade may be suspected. Some fuel suppliers use this tactic to consistently over-state the delivered amount to improve the company's profit margin.
Many ship operators and suppliers would welcome a benchmarking of suppliers, ports, or geo-regions against some best practice.

The rest of the paper is organized as follows: In Section 2 we take a closer look at concrete examples of different density reporting strategies and discuss the difficulties associated with single number characteristics. In Section 3 we use this to characterize good suppliers and derive criteria for defining a best practice. In Section 4, a Best Practice Classifier is constructed that will assign a Best Practice Score to an individual bunkering or a supplier. We also present a series of benchmarking comparisons between regions together with an overview of how they developed over a 10 year period. This paper ends with a discussion and some promising leads for further work.

2. Investigating density reporting behavior

Table 1 gives some statistics for density deviations on a global and local basis (e.g. Canada and the US West Coast, South Asia, Middle East, and South America West) and for 4 selected suppliers (S1, S2, S3, S4) in 4 different bunker ports. The density difference, $dd$, is the difference between the density claimed by the supplier and the actual density measured by a fuel testing agency (e.g. DNVPS). The average density difference, $\overline{dd}$, could in principle be used to characterize the behavior of a fuel supplier (a port or a region) as good, medium or bad.

Unfortunately, most such single number quality measures have some sort of shortcoming as they compress a wealth of information into a single number. They often wipe out (quite effectively) much of the information about the interesting behavior of a supplier. In addition, the arithmetic mean or median may be less suited for distributions that are non-normal, skewed or showing heavy tails. Also, the mean and standard deviation are very sensitive to outliers (a few unusually large or small observations) (Bhattacharyya & Johnson 1977). As an example, the mean value of ten bad bunkerings could easily be balanced by one exceptionally good one (or a typing error), while the median is less sensitive to outliers. Another problem with the mean and median is that they reveal nothing about the shape of the underlying distribution. For instance, if we only look at the mean, the geo-region South America West seems to be better than e.g. Canada & US West Coast from a short-lifting perspective, see Table 1. If we take the standard deviation into account it is obvious that there is a higher risk of being short-lifted in South America West than in the other geo-regions, simply because the distribution is wider. The standard deviation only refers to the width of the underlying distribution but not to the actual shape. As can be seen in Figure 2 the distributions are non-normal, i.e. a highly skewed middle spike combined with a very long one-sided tail.
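The outlier argument above can be checked numerically. The following is a minimal sketch using made-up density differences (the values are illustrative assumptions, not DNVPS data): ten short-lifted bunkerings are cancelled in the mean by a single extreme entry, while the median still reports the typical short-lift.

```python
from statistics import mean, median, pstdev

# Hypothetical density differences (claimed - measured, in kg/m^3) for one
# supplier: ten short-lifted bunkerings plus one extreme long-lifted entry
# (or typing error). These numbers are invented for illustration only.
bad_bunkerings = [3.0] * 10
outlier = -30.0
dd = bad_bunkerings + [outlier]

mean_dd = mean(dd)      # the single outlier cancels the ten bad bunkerings
median_dd = median(dd)  # the median still shows the typical short-lift
spread = pstdev(dd)     # the standard deviation is inflated by the outlier

print(f"mean   = {mean_dd:.2f} kg/m^3")
print(f"median = {median_dd:.2f} kg/m^3")
print(f"std    = {spread:.2f} kg/m^3")
```

Judged by the mean alone, this hypothetical supplier would look perfectly fair, which is the weakness of single number measures described above.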
Table 1: Standard descriptive measures of density differences for some selected geo-regions and suppliers (n = number of samples, $\overline{dd}$ = mean density difference, $\sigma_{dd}$ = standard deviation of dd). Histograms for the geo-regions and suppliers are shown in Figures 1 and 2 respectively, whereas their scatter plots are shown in Figures 3 and 4. Data in this table and in the following examples is, unless otherwise stated, based on DNVPS bunkering samples of RMG380 fuel collected in 2008 (cf. DNV 2010). Density values are in kg/m³.

                              n    mean dd   median(dd)   σ_dd
Global                    43343      0.39       0.10      3.92
Canada & US West Coast     1919      0.03      -0.10      2.43
South Asia                 6806      1.22       0.90      3.35
Middle East                2990      1.83       0.70      4.76
South America West          565     -0.48      -0.90      6.00
Supplier 1 (S1)             129     -0.12      -0.10      0.95
Supplier 2 (S2)             239      2.31       0.90      4.84
Supplier 3 (S3)              71      2.40       2.60      1.83
Supplier 4 (S4)             145      2.07       1.50      2.81

Histograms

For a more detailed understanding of the properties of the data in Table 1 please refer to the density difference histograms of Figures 1 and 2. For comparison we have plotted a smoothed version of the global histogram (dashed line) and a smoothed version of the actual histogram (solid line). These histograms represent estimates for the underlying probability density distribution and can thus tell us something about the risk and possible amount of the short-lifting. A comparison with a reference histogram, like the global histogram, would provide the desired benchmark.

From Figure 1 it can be seen that none of the histograms seem to come from a normal distribution (the implications of this observation will not be further discussed in this paper). This can be confirmed by means of a probability plot. The different geo-regions also show significant differences in their density reporting practice. Canada & US West Coast appears better than the global average: the peak of the histogram is centered at 0 and it has shorter tails. For South Asia, the width of the histogram is similar to the global one, but its center is shifted towards short-lifting, whereas the Middle East shows a fairly heavy short-lifting tail. The histogram for South America West is especially remarkable as the chance of actually getting the fuel density stated by the supplier appears to be slim. The rule is rather that the buyer is either short- or long-lifted, something which could not be deduced from the standard descriptive statistics.

Figure 1: Probability distribution of density reporting deviations (i.e. the difference between claimed and measured density) for 4 selected geo-regions. The histograms are (clockwise from top left): Canada & US West Coast, South Asia, Middle East, and South America West Coast. The solid lines represent the smoothed histogram while the dashed lines are the smoothed global histogram. The underlying number of samples, averages, medians, and standard deviations are given in Table 1. The histograms reveal considerable variation in density reporting.

Histograms for individual suppliers listed in Table 1 are shown in Figure 2 below. A visual comparison indicates that Supplier 1 is much better than the global average with a narrow symmetric distribution centered at 0. The three other suppliers are all heavily short-lifting with varying degrees of right-shifted and/or right-heavy distributions. Based on these histograms the suppliers might be characterized as rather bad, but any fine grained information about their underlying reporting strategy is removed by the histogram. A main disadvantage of using histograms for characterizing suppliers is that they require a considerable amount of data, which could be a challenge when considering short time periods or suppliers with few data samples.
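To make the histogram construction concrete, here is a minimal sketch of a binned and smoothed dd histogram. The bin width, the moving-average smoother and the sample values are our own assumptions; the paper does not state which binning or smoothing the authors used.

```python
import math
from collections import Counter

def dd_histogram(dd_values, bin_width=0.5):
    """Return [(bin_centre, relative_frequency), ...] over a contiguous bin range.

    bin_width is an assumed value, not one taken from the paper.
    """
    if not dd_values:
        raise ValueError("need at least one sample")
    # Assign each value to the nearest bin centre k * bin_width.
    counts = Counter(math.floor(dd / bin_width + 0.5) for dd in dd_values)
    n = len(dd_values)
    lo, hi = min(counts), max(counts)
    return [(k * bin_width, counts.get(k, 0) / n) for k in range(lo, hi + 1)]

def smooth(hist, window=1):
    """Moving-average smoothing; a simple stand-in for the paper's smoother."""
    freqs = [f for _, f in hist]
    out = []
    for i, (centre, _) in enumerate(hist):
        neighbours = freqs[max(0, i - window): i + window + 1]
        out.append((centre, sum(neighbours) / len(neighbours)))
    return out

# Ten invented dd samples (kg/m^3): far too few for a stable histogram.
sample = [-0.4, -0.1, 0.0, 0.0, 0.1, 0.2, 0.6, 1.1, 2.4, 2.6]
hist = dd_histogram(sample)
smoothed = smooth(hist)
for centre, freq in hist:
    print(f"{centre:5.1f}  {'#' * round(freq * 20)}")
```

With only ten samples several bins are empty and the shape depends strongly on the chosen bin width, which illustrates why histograms are a poor basis for judging suppliers with few samples.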
Figure 2: Probability distribution of density reporting deviations (i.e. the difference between claimed and measured density) for 4 selected suppliers in 4 different bunker ports (for more details see Table 1). The histograms reveal different reporting behavior, but histograms become noisy when the number of samples becomes too low.

Scatter plots

Scatter plots of measured vs. claimed density allow a much more fine grained view of the underlying data. These plots may be used to unravel the various reporting strategies of the suppliers, see Figure 3 and Figure 4. Scatter plots quite effectively visualize the density reporting behavior of suppliers or groups of suppliers. Note that each dot in a scatter plot represents at least one bunkering sample. The diagonal solid line represents correct density reporting (i.e. stated = measured, in the following called the no-cheat line). The horizontal and vertical dashed lines specify the upper density limit given by the ISO8217 standard.

These scatter plots reveal some interesting patterns. Note that the range of densities of the available fuel varies between geo-regions; e.g. the fuel density range is much wider in the Middle East than in North America or South Asia. This phenomenon may be traced back to the proximity to crude oil production in the regions.

Observe also that in many bunkerings the fuel density was above the limit (dots to the right of the vertical dashed line) but almost none of them were reported to lie above the limit (above the horizontal dashed line). This is true for all suppliers.

From Figure 4 we may deduce that Supplier 1 could be considered as rather good, since most of his samples are on or close to the no-cheat line. This behavior seems to be dominant for most of the suppliers in the Canada & US West geo-region (note: good suppliers are found in all geo-regions). In contrast, Supplier 2 may be regarded as bad, since his stated densities cover the whole range from the no-cheat line all the way up to maximum cheating, i.e. the upper density limit given by the standard. This type of behavior is also visible both in the South Asia and the Middle East scatter plots. It seems that Supplier 3 has a strategy of simply adding an offset to the real density, which is reflected in a mean density difference that differs from zero and a relatively low standard deviation. A fourth reporting scheme appears in Supplier 4, who has a tendency of always stating a density near the limit, independently of the actual density. This could be termed the worst behavior since such suppliers short-lift as much as possible. This behavior is not uncommon in South Asia and the Middle East. Variations of this scheme, i.e. stating a fixed fuel density that is lower than the limit, are seen in Asia, Middle East and South America West. They appear as horizontal lines in the scatter plot.

Figure 3: Scatter plot of measured vs. claimed density for the same geo-regions as in Table 1 and Figure 1. Each black dot represents (at least) one bunkering. The solid line represents the no-cheat line, i.e. bunkerings where the supplier states the density correctly (claimed = measured), whereas the dashed lines indicate the upper density limit in the ISO standard for bunker fuel (ISO8217), viz. 991 kg/m³, implicitly giving the maximum possible amount of cheating. Many dots along the upper dashed line indicate a high degree of cheating in many bunkerings. Note that in many bunkerings the fuel density was above the limit (dots to the right of the vertical dashed line) but almost none of them were reported to lie above the limit (above the horizontal dashed line).

Figure 4: Scatter plot of measured versus claimed density for the same suppliers as in Table 1 and Figure 2. Supplier 1 reports quite honestly as his dots are scattered closely along the no-cheat line. In contrast, Suppliers 2 and 3 have many reportings away from this no-cheat line but they are not as dishonest as Supplier 4, who basically reports only one density close to 991 irrespective of the actual fuel density.
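Each dot's vertical distance from the no-cheat line is the density difference dd, and because fuel is delivered by volume but invoiced by mass, that distance translates directly into tonnage. The sketch below reproduces the example from the introduction (977 kg/m³ claimed, 960 kg/m³ measured, 2000 m³ delivered). The function name is our own and not part of the paper.

```python
def invoiced_mass_error_tonnes(claimed_density, measured_density, volume_m3):
    """Mass invoiced minus mass actually delivered, in tonnes.

    Densities are in kg/m^3. A positive result means short-lifting
    (the buyer pays for more fuel than was received); a negative
    result means long-lifting.
    """
    dd = claimed_density - measured_density  # density difference, kg/m^3
    return dd * volume_m3 / 1000.0           # kg -> tonnes

# The introduction's example: 17 kg/m^3 over-reported on a 2000 m^3 delivery.
error_t = invoiced_mass_error_tonnes(977.0, 960.0, 2000.0)
print(f"invoiced but not delivered: {error_t:.1f} t")
```

The result of 34 t agrees with the "nearly 35 tons" quoted in the introduction.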
3. The Good: Best practice benchmark

The above discussion has emphasized the need for a good benchmark for measuring the goodness in density reporting, and for distinguishing between various short-lifting and long-lifting strategies.

The scatter plots of Canada & US West Coast and Supplier 1 are examples of good density reporting behaviors that could be used as best practice references. Our interpretation of good or best practice is indicated by the grey diagonal area around the no-cheat line in Figure 5. Fair reporting and good control of the delivered density should result in a small symmetric scatter around the no-cheat line, and thus a narrow density difference (dd) histogram centered at dd = 0 (like the one for Supplier 1 in Figure 2).

The goal is to establish a best practice, and then use it as a predefined reference to which bunkerings may be compared. This best practice benchmark is given by the dd histogram for a group of selected good suppliers.

Figure 5: Scatter plot of bunkering data from South Asia. Data points around the diagonal line (no-cheat line) indicate good or best practice behavior, i.e. fair reporting, with little or no cheating. In the area above the no-cheat line, customers get short-lifted (pay too much) whereas below the line the supplier loses money. The more dots there are above the fair line, and the further away from it they are, the less accurate the density reporting. Bunkerings far below the fair area should be considered suspicious and may indicate a bribing situation. Reportings in the grey horizontal area (reporting densities close to the upper density limit) indicate that some suppliers consciously choose a strategy of maximum density cheating. A close-up of the scatter plot near the density limit = 991 kg/m³ reveals that hardly any suppliers are willing to state that their fuel exceeds the limit even when this is clearly the case.

This best practice histogram shall represent good suppliers and should be based on many data points. Any outliers, intentional cheating, or other indications of dishonesty should be eliminated to obtain an unbiased and fair benchmark. The following criteria for deriving the best practice benchmark should therefore be chosen (there will always be a certain element of subjective judgment in this process, but the method for deriving the benchmark should as far as possible be transparent, sound, and unbiased):

  1) Select some geo-regions where the scatter plots show that data are predominantly found along the no-cheat line.
  2) For each selected dataset we:
     a. Eliminate extreme outliers, max cheating and near limit lying; only data inside a predefined area around the no-cheat line is selected (see Figure 6 for details).
     b. Eliminate any bias by centering the dd data around dd = 0.
  3) The adjusted and selected dd data for all the selected sets are then merged into one large dataset.
  4) Calculate the dd histogram for the dataset.

Figure 7 shows the best practice reference histogram derived from the geo-regions Biscay, Canada & US East Coast, Canada & US West Coast, US Gulf Coast, and Oceania.

Figure 6: Only bunkering samples between the 2 blue solid lines will be used as basis for deriving the best practice benchmark histogram. This effectively eliminates max cheating, outliers, and 'near limit effects', i.e. less than complete honesty when selling too heavy fuel. The upper solid line divides the angle between the no-cheat and max-cheat lines. The lower solid line is simply mirrored around the no-cheat line such that the permitted density deviations are the same above and below it.
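Steps 2 and 3 of the procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the selection band is modelled as allowing a deviation of at most `fraction` times the gap between the measured density and the 991 kg/m³ limit (our reading of the two lines in Figure 6), and centering is done by subtracting the mean. All names and sample values are hypothetical.

```python
from statistics import mean

DENSITY_LIMIT = 991.0  # kg/m^3, the ISO8217 upper limit quoted in the paper

def within_band(measured, claimed, fraction=0.5):
    """True if a sample lies between the two selection lines.

    ASSUMPTION: the permitted |dd| is `fraction` of the distance from the
    measured density to the limit. Samples measured above the limit get a
    negative allowance and are therefore always excluded.
    """
    allowed = fraction * (DENSITY_LIMIT - measured)
    return abs(claimed - measured) <= allowed

def best_practice_dd(datasets, fraction=0.5):
    """Merge filtered and centred dd values from several geo-regions.

    `datasets` is a list of regions, each a list of (measured, claimed) pairs.
    """
    merged = []
    for samples in datasets:
        # Step 2a: keep only samples inside the band around the no-cheat line.
        dd = [c - m for m, c in samples if within_band(m, c, fraction)]
        if not dd:
            continue
        # Step 2b: remove any regional bias (centring by the mean is assumed).
        bias = mean(dd)
        # Step 3: merge the adjusted values into one dataset.
        merged.extend(d - bias for d in dd)
    return merged  # Step 4 would bin these values into a histogram.

# Invented samples: (measured, claimed) densities in kg/m^3.
region_a = [(980.0, 980.5), (975.0, 974.5), (985.0, 991.0), (970.0, 970.3)]
region_b = [(988.0, 989.0), (960.0, 961.0), (992.0, 990.9)]
merged = best_practice_dd([region_a, region_b])
print(len(merged), "samples kept; mean dd =", round(mean(merged), 6))
```

In this toy data the max-cheat sample (985 measured, 991 claimed) and the over-limit sample (992 measured) are dropped, and the remaining values are centred per region before merging.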
Figure 7: Best practice dd histogram based on samples from selected geo-regions (Biscay, Canada & US East and West Coast, US Gulf Coast and Oceania) where max cheating, outliers and near limit dishonesty have been eliminated. The dashed line is the histogram function $H$, i.e. a smoothed version of the histogram indicating the global best practice.

Classification by membership function

Once the best practice histogram is generated, the challenge is to benchmark a supplier, a port, or a region against it. In principle, this histogram must be compared with the dd histograms for the suppliers in question and the degree of conformance would then give the desired benchmark. Unfortunately this is a non-trivial task and for many of the suppliers only relatively few samples are available, resulting in bad histograms. We therefore propose a more elegant approach that is insensitive to the number of data points and outliers, and that can even be used for a single bunkering. The concept of a membership function (Turksen 1991; Terano et al 1987, p. 21), which is widely applied in fuzzy set theory (Lowen 1996, Self 1990), is used to achieve this benchmarking. A single number (score) is computed denoting the goodness of a specific bunkering or supplier.

An example will hopefully make this clear. Consider the task of benchmarking people into fast and slow runners, respectively. One way to do this is to set a threshold $T$ on how fast a person should be able to run 100 m, and then categorize the people who run slower than the threshold as slow (=0) and those who run faster than the threshold as fast (=1). This sorting is achieved by a Boolean membership function $B$ with threshold $T$ for the measured time $t$ on 100 m, i.e. $B(T,t)$. However, it is quite obvious that this benchmarking will result in a crude oversimplification as there is a continuous transition from extremely fast runners to the really slow ones, and a small change in the chosen threshold could seriously alter the number of members in each category. A better approach would be to replace the Boolean function with a continuous function, assigning a continuous membership value between 0 and 1 depending on how fast they run. This is an example of a so-called membership function, and will in the following simply be denoted $m$.

The situation is analogous to our best practice density benchmark where suppliers (or bunkerings) are not grouped into crisp sets of good and bad but rather get a score indicating how close to or far away from the best practice they are. This, by the way, is also the reason why e.g. discriminant analysis (Hastie et al 2009) is unsuitable for the task at hand.

The challenge is to find a membership function for the good group, faithfully reflecting what we consider to be good. Fuzzy set theory does not provide help in determining the membership function, as all kinds of functions are used, e.g. triangular, trapezoid, Gaussian, etc. The discussion of good behavior above gives us some hints about the properties of the desired membership function. It should not be too wide, as a bad bunkering could then be regarded as good. Likewise, if it is too narrow then a good bunkering would get too low a goodness score. It is important that the membership function represents the best practice set as well as possible. The obvious choice is to derive the membership function directly from the dd histogram itself.

The membership function for good bunkerings, $m_G$, must have a maximum value of 1 at dd = 0, i.e. $m_G(dd=0) = 1$, and decrease continuously in both directions, i.e. a rescaling and shift of the $H$ histogram has to be done.
We therefore propose the following definition of the membership function:

$$ m_G(dd) = \frac{H(dd)}{\max(H)} = \frac{H(dd)}{H(0)} $$

where the subscript G indicates that this gives a goodness scoring, and $H$ is the smoothed (and adjusted) best practice histogram (i.e. $H$ is the histogram function). Note that $m_G$ is a function of the distance of dd to 0, as well as the frequency of dd in the best practice. This membership function can now be applied e.g. to all $n$ supplier samples to obtain the overall goodness benchmark,

$$ b_G = \frac{1}{n} \sum_{i=1}^{n} m_G(dd_i) $$

where the summation is done over all $n$ bunkerings for a specific supplier, port, or geo-region.

An interesting observation is that the scoring from the membership function $m_G(dd)$ is not (a priori) a probabilistic measure; it is a measure (between 0 and 1) based on how far away a variable is from some value, i.e. dd = 0; see Figure 8. However, this rescaling does preserve an interesting probabilistic feature: $m_G(dd)$ is the probability of finding a value in a small interval around dd, relative to that of finding a value in an equally sized interval close to 0, given that the samples are drawn from the best practice group.

Figure 8: The solid line gives the goodness membership function, $m_G$, which is a scaling of the best practice histogram. $m_B = 1 - m_G$ gives the membership function for the opposite (dashed line), i.e. bad, which in turn could be divided into a long- and a short-lifting part, $m_{LL}$ and $m_{SL}$ respectively (corresponding to negative and positive dd values). E.g. a bunkering with dd = 2.3 would get a good score of $m_G = 0.23$ and a bad score of $m_B = 0.77$ (with $m_{LL} = 0$ and $m_{SL} = 0.77$).

The Bad

Note that $m_G(dd)$ was derived based on what was chosen to be the best practice. It therefore gives a measure/score for how good a bunkering or supplier is with respect to this best practice. The complement,

$$ m_B(dd) = 1 - m_G(dd), $$

gives a badness scoring but it will not tell whether the bad scoring comes from short- or long-lifting. Fortunately, $m_B$ can, depending on whether a sample falls into the short- or long-lifting domain, be further divided into $m_{SL}$ and $m_{LL}$. That is, if the dd value of a sample is positive, its $m_{SL}$ will be greater than zero; if the dd value of a sample is negative, its $m_{LL}$ will be greater than zero.

This enables us to calculate short- and long-lifting scores similar to the goodness score:

$$ b_{xL} = \frac{1}{n} \sum_{i=1}^{n} m_{xL}(dd_i), $$

where the subscript xL should be SL or LL, which stands for short- or long-lifting, respectively.
These scores indicate the behavior of a supplier and give the risk of being short- or long-lifted. Note, by definition:

$$ b_G + b_{SL} + b_{LL} = 1 $$

Remember that the scores correspond to the degree of membership, i.e. how close a bunkering is to the good or bad benchmark; they can therefore be understood as weights corresponding to the proportion of good or bad.

The Ugly

As pointed out above, profit maximization by reporting densities at or close to the upper limit may be considered as fairly ugly behavior. The same methodology can be applied to obtain a near limit score for this behavior by constructing a membership function

$$ m_{NC}(\text{claimed density}) = m_G(\text{claimed density} - 991) $$

where the subscript NC denotes Near Ceiling.

This membership function assigns a scoring to a bunkering corresponding to the distance from the density limit and frequency of occurrence in the benchmark. To avoid categorizing a bunkering as ugly when the measured density is actually near the limit, we employ a convolution of mNC and mSL. In so doing we exclude all reportings that are near the limit but that are actually honest. We propose the following ugly or near limit benchmark

$$ b_{NC} = \frac{1}{n} \sum_{i=1}^{n} m_{SL}(dd_i) \cdot m_{NC}(\text{claimed density}_i) $$

giving the fraction of short-lifting that could be considered as near limit reporting.

Further characterization of Good and Bad

In order to further characterize bunkering samples within the good-, short-, or long-lifting region in the scatter plot, the average density deviations in each region could be computed by weighting each bunkering sample with the corresponding score from the membership function. For instance, the mean density difference ($\overline{dd}_{SL}$) in the short-lifting area is:

$$ \overline{dd}_{SL} = \frac{\sum_{i} dd_i \cdot m_{SL}(dd_i)}{\sum_{i} m_{SL}(dd_i)} $$

in kg/m3, where the index i runs over all n samples.

This means that for a given supplier we can provide information about the risk of being short-lifted, bSL, and about the expected average amount in density difference, $\overline{dd}_{SL}$. The method is easily extended to the other identified behaviors.

4. Application of the benchmarks

As discussed above, the power of the scatter plot lies in the visualization of the different density reporting schemes. Several patterns, like fixed value density reporting, systematic density reporting deviations, etc., are easily spotted. The benchmarks developed above are constructed to discriminate between some of these different reporting schemes, and to quantify the risk of being short-lifted as well as the amount of short-lifting that should be expected. The benchmarks for our examples from Table 1 are given in Table 2 below.

Table 2: Standard descriptive measures together with our benchmark(s) for the geo-regions and suppliers from Table 1. The benchmarks for the data that were used to generate the best practice histogram are also included for comparison. A row, e.g. Global, is read as follows: average density difference is 0.39, std=3.92. Benchmarking against the best practice gives the following results: 43% of the samples can be regarded as good (bG), 31% qualify as short-lifting (bSL), and 26% as long-lifting (bLL). For the short-lifting samples the average density difference is 3.31, but only 7% of them were near the ceiling.

                          mean dd   σdd                                   mean dd_SL
                          (kg/m3)   (kg/m3)   bG     bSL    bLL    bNC    (kg/m3)
Best Practice             0.05      1.16      0.62   0.19   0.19   0.01   1.50
Global                    0.39      3.92      0.43   0.31   0.26   0.07   3.31
Canada & US West Coast    0.03      2.43      0.55   0.22   0.24   0.02   2.09
South Asia                1.22      3.35      0.41   0.52   0.07   0.26   2.44
Middle East               1.83      4.76      0.32   0.49   0.19   0.02   4.61
South America West        0.48      6.00      0.08   0.42   0.50   0.00   3.73
Supplier 1                0.12      0.95      0.71   0.09   0.20   0.02   1.70
Supplier 2                2.31      4.84      0.36   0.53   0.11   0.13   4.65
Supplier 3                2.40      1.83      0.09   0.87   0.03   0.00   2.81
Supplier 4                2.07      2.81      0.27   0.72   0.01   0.46   2.64
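The benchmark definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the paper derives mG from the best practice histogram, whereas the bell-shaped `m_good` below (and its `width` parameter) is a hypothetical stand-in. The split of the bad membership 1 - mG into a short-lifting part (dd > 0) and a long-lifting part (dd < 0) is likewise an assumption, chosen so that bG + bSL + bLL = 1 holds. All function names are invented for this sketch.

```python
import math

DENSITY_LIMIT = 991.0  # kg/m3; the ceiling that appears in m_NC in the text


def m_good(dd, width=1.5):
    """Hypothetical stand-in for m_G (assumption): 1 at dd = 0, decaying with |dd|."""
    return math.exp(-0.5 * (dd / width) ** 2)


def m_short(dd):
    """Assumed m_SL: the bad membership 1 - m_G when claimed > measured, else 0."""
    return 1.0 - m_good(dd) if dd > 0 else 0.0


def m_long(dd):
    """Assumed m_LL: the bad membership 1 - m_G when claimed < measured, else 0."""
    return 1.0 - m_good(dd) if dd < 0 else 0.0


def m_near_ceiling(claimed):
    """m_NC(claimed density) = m_G(claimed density - 991), as defined in the text."""
    return m_good(claimed - DENSITY_LIMIT)


def benchmarks(samples):
    """samples: list of (claimed, measured) densities in kg/m3; dd = claimed - measured."""
    n = len(samples)
    dds = [claimed - measured for claimed, measured in samples]
    weight_sl = sum(m_short(dd) for dd in dds)
    return {
        "b_G": sum(m_good(dd) for dd in dds) / n,
        "b_SL": weight_sl / n,
        "b_LL": sum(m_long(dd) for dd in dds) / n,
        # near-limit score: short-lifting membership times near-ceiling membership
        "b_NC": sum(m_short(c - m) * m_near_ceiling(c) for c, m in samples) / n,
        # membership-weighted mean density difference in the short-lifting region
        "dd_SL": (sum(dd * m_short(dd) for dd in dds) / weight_sl
                  if weight_sl > 0 else None),
    }


# Three made-up bunkerings: short-lifted at the limit, exact, and long-lifted.
samples = [(991.0, 985.0), (980.0, 980.0), (975.0, 977.0)]
print(benchmarks(samples))
```

Under this stand-in, two bunkerings with dd = +6 and dd = -6 kg/m3 have a mean density difference of zero yet receive a good score close to 0, which is the kind of behavior the mean alone cannot reveal.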
The samples used to generate the best practice histogram were included in the table for easy comparison. Note that the only way the good score can be 1 is when all samples are at dd=0; this explains why even the good score of the best practice is 'only' 0.62. The table shows that for the selected geo-regions the highest risk of being short-lifted is found in South Asia. The near-limit benchmark, bNC, confirms what is apparent from the scatter plot (Figure 3): for many suppliers it is common practice to maximize their profit by simply reporting a fuel density at or near the limit.

South America West nicely illustrates the strong ability of the benchmark to identify the underlying behavior. Recall that for this area the mean was near zero, but the high standard deviation suggested large fluctuations in their reporting. Even so, no indications about the underlying reporting schemes, or the risk of being short- or long-lifted, can be deduced from these two numbers. In contrast, our benchmark reveals that the likelihood of actually getting what you paid for is rather slim, viz. around 8%. In the vast majority of the cases either short- or long-lifting takes place.

Observe also that Supplier 1 can indeed be regarded as honest, with a good score higher than best practice. Suppliers 2 and 3 have comparable average density differences, but their good and near limit benchmarks clearly separate them. A comparison of the benchmarks with the corresponding scatter plots will confirm that the benchmarks do indeed give a more accurate description of the honesty of suppliers than standard descriptive statistics.

Figure 9: Comparison of different benchmarking methods: suppliers ranked based on their mean density difference, $\overline{dd}$ (top), and their corresponding good score, bG (bottom). Observe that ranking with respect to the mean would result in about 1057 good suppliers ($|\overline{dd}| \le 0.7$). Our scoring with respect to best practice (0.62) reveals, however, that about 150 are definitely bad (left-hatched area), even below global average (0.43). 539 are really good (equal to or better than best practice, right-hatched area), whereas the rest are located between global average and best practice. Observe also that simply relying on the mean to characterize suppliers would label several of them as bad even though their good score is above global best practice.

Supplier ranking

In Figure 9 (top) all suppliers of RMG380 fuel worldwide are ranked with respect to their mean density difference, $\overline{dd}$. When using $|\overline{dd}| \le 0.7$ as a criterion for goodness, the mean would imply that there are about 1057 good suppliers. Applying this mean $\overline{dd}$ to our benchmarking method results in the continuous bell-shaped curve (blue). If $\overline{dd}$ were indeed an unbiased measure for the goodness of suppliers, then their scorings should be closely scattered around this curve. This is, however, not at all the case. The discrepancy stems from the unreliability of the mean (or standard deviation) as a measure whenever the underlying distributions are non-normal or outliers have a large effect. The figure shows clearly that 150 of the apparently good suppliers are actually quite bad, i.e. even below global average (left-hatched area), whereas only about half (539) can be considered equal to or better than best practice (right-hatched area). Observe also that many of the apparently bad suppliers (those with $|\overline{dd}| > 0.7$) are actually better than their reputation, as most of them are above the bell-shaped curve and some are even above best practice, further emphasizing the need for an unbiased score like bG.

Development over time

Following the development of the score of a supplier, port, or region over time may give valuable indications about what may be expected in the near future. For instance, Figure 10 shows the development of the bG score for two major ports, Singapore and Rotterdam, over the past 25 years.
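A time series of this kind can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes each bunkering record carries a date, a claimed density and a measured density, it reuses a hypothetical bell-shaped stand-in for mG (the paper derives mG from the best practice histogram), and it assumes the yearly value is the plain average of the quarterly scores, which the text does not specify. The function names are invented for this sketch.

```python
import math
from collections import defaultdict
from datetime import date


def m_good(dd, width=1.5):
    """Hypothetical stand-in for m_G (assumption): 1 at dd = 0, decaying with |dd|."""
    return math.exp(-0.5 * (dd / width) ** 2)


def quarterly_good_scores(records):
    """records: iterable of (date, claimed, measured). Returns {(year, quarter): b_G}."""
    buckets = defaultdict(list)
    for day, claimed, measured in records:
        quarter = (day.month - 1) // 3 + 1
        buckets[(day.year, quarter)].append(m_good(claimed - measured))
    # b_G of a quarter is the mean good membership of its bunkerings
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


def yearly_averages(quarterly):
    """Average the quarterly scores per year (assumed aggregation)."""
    per_year = defaultdict(list)
    for (year, _quarter), score in quarterly.items():
        per_year[year].append(score)
    return {year: sum(vals) / len(vals) for year, vals in sorted(per_year.items())}


# Made-up records for one port
records = [
    (date(2008, 1, 15), 985.0, 985.0),
    (date(2008, 2, 1), 991.0, 985.0),
    (date(2008, 7, 1), 980.0, 980.0),
]
quarterly = quarterly_good_scores(records)
print(quarterly)
print(yearly_averages(quarterly))
```

Because each quarter contains a varying number of bunkerings, quarters with few records give noisier scores; a real analysis would probably want to report the sample count alongside each dot.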
Figure 10: Time series of goodness scores bG for two large ports in different geo-regions. Data from all available suppliers are included. Dots are quarterly time intervals while the stippled lines are year averages. Each dot is based on a varying number of 'raw data points', i.e. the number of bunkerings during the corresponding time interval.

Observe that from the beginning of the 1980s and up to the mid 1990s the quality of the density reporting was increasing. It then leveled off until 2008, when a change in behavior occurred, perhaps triggered by the onset of the global recession.

5. Discussion and concluding remarks

This paper has two main focus areas: the construction of a realistic benchmark and the development of a methodology that allows comparing one or more samples with the benchmark.

The examples given above demonstrate the capabilities of our approach. It is more powerful than standard descriptive statistics (e.g. $\overline{dd}$ and σdd), as it is less sensitive to outliers and is well suited for small datasets and even single numbers. Recall that our benchmarks give better quantifications than $\overline{dd}$ and σdd together. Further, it makes no assumptions about the probability distribution of the underlying data; any distribution is allowed. Only some weak requirements apply to the membership function (e.g. increasing/decreasing). The methodology is quite generic and could in principle be applied to any kind of comparison task, i.e. benchmarking.

The fact that the benchmark is based on a probability density function, and that a probabilistic interpretation of the scoring is possible, is an aid to the user's intuition, making it easier to understand and interpret the results.

Once a best practice histogram has been generated, a membership function can be derived, after which benchmarking is easily done. Subjectivity is only involved in the definition of what can be regarded as best practice, as there is no a priori correct answer to this problem. Our approach has been to ask: what should be expected of a good supplier? By answering this question we have picked the suppliers that best match our expectations. Outliers and incorrect claims near the density limit are of course not wanted from a good supplier, hence their removal from the best practice data set.

From a user perspective the main strengths of the presented benchmark are:
 • Intuitive and easy to understand.
 • Applicable for few or even singleton samples.
 • Able to pinpoint different density reporting schemes.

In closing, let us return to the extent and amount of global short-lifting, which is estimated to be around 1.7 ton per bunkering on average. Thanks to our benchmarking methodology we can now provide a more detailed picture of the situation. First, 43% of the bunkerings could be considered to be loss neutral (bG=0.43), since they are within best practice. Second, 26% are instances of long-lifting (bLL=0.26), where the buyer gains on average 1.8 ton. Third, 31% could be regarded as short-lifting (bSL=0.31), with an average buyer loss of 2.5 ton per bunkering. This highlights the importance of choosing the right supplier.

The presented benchmark methodology is easily extendable to other (quality and economical) bunkering parameters like viscosity, sulfur or water content, as well as a series of physical and chemical properties. The methodology will be the basis for a benchmarking web tool, scheduled for release by DNVPS later this year.

Figure 11: Bunker surveyor on board a ship. Photo by DNV Petroleum Services (used with permission).
References

Bhattacharyya, G., Johnson, R. (1977), Statistical Concepts and Methods, Wiley, New York.

DNV (2010), Total fuel management, http://www.dnv.com/industry/maritime/servicessolutions/fueltesting (accessed 13 Oct. 2010).

EPA (2008), Global Trade and Fuels Assessment - Future Trends and Effects of Requiring Clean Fuels in the Marine Sector. Assessment and Standards Division, Office of Transportation and Air Quality, U.S. Environmental Protection Agency. EPA420-R-08-021, November 2008.

Eyring, V., Isaksen, I.S.A., Berntsen, T., Collins, W.J., Corbett, J.J., Endresen, O., Grainger, R.G., Moldanova, J., Schlager, H., Stevenson, D.S. (2010), "Transport impacts on atmosphere and climate: Shipping", Atmospheric Environment, Volume 44, Issue 37, December 2010, pp. 4735-4771.

Hastie, T., Tibshirani, R., Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition), Springer, New York.

IEA (2010), World Energy Outlook 2010, International Energy Agency, OECD Publishing, Paris.

IMO (2009), Prevention of Air Pollution from Ships. International Maritime Organization, Marine Environment Protection Committee. MEPC 59/INF.10, 9 April 2009.

Lowen, R. (1996), Fuzzy Set Theory, Kluwer Academic Publishers, Dordrecht.

Self, K. (1990), "Designing with fuzzy logic", IEEE Spectrum, Vol. 27, No. 11, November 1990, pp. 42-44, p. 105.

Terano, T., Asai, K., Sugeno, M. (1987), Fuzzy Systems Theory and its Applications, Academic Press, San Diego.

Turksen, I.B. (1991), "Measurement of membership functions and their acquisition", Fuzzy Sets and Systems, Vol. 40, pp. 5-38.
[Figure 5: scatter plot of claimed density against measured density (both axes 961 to 991), with regions labeled Good, Suspicious and Bad, and the limit at 991 marked as the max. cheat area.]

[Figure 6: schematic showing the limit (max. cheat line) and the no cheat line, with + and - regions.]

[Figure 7: probability against density deviations.]

[Figure 8: membership functions mG and mB = 1 - mG against density difference, with long-lifting to the left and short-lifting to the right. Example: at dd = 2.3, Good: mG = 0.23 and Bad: mB = 1 - 0.23 = 0.77.]

[Figure 9: two panels over the total number of suppliers (0 to 2500). Top: claimed minus measured density, with the band from -0.7 to 0.7 containing ca. 1057 suppliers. Bottom: good score, with the best practice score and the global average score marked. Annotations: 539 suppliers at or above best practice; 150 below global average; "Some bad suppliers are actually very good!"; "Some bad suppliers are actually slightly better!"; "Many good suppliers are actually quite bad!".]