Readings Week 10
Hô Chi Minh (1890-1969)
Declaration of Independence of the Democratic Republic of
Vietnam, September 2, 1945
“All men are created equal. They are endowed by their Creator
with certain inalienable rights,
among them are Life, Liberty, and the pursuit of Happiness.”
This immortal statement was made in the Declaration of
Independence of the United States of
America in 1776. In a broader sense, this means: All the peoples
on the earth are equal from
birth, all the peoples have a right to live, to be happy and free.
The Declaration of the French Revolution made in 1791 on the
Rights of Man and the Citizen
also states: “All men are born free and with equal rights, and
must always remain free and have
equal rights.”
Those are undeniable truths.
Nevertheless, for more than eighty years, the French
imperialists, abusing the standard of
Liberty, Equality, and Fraternity, have violated our Fatherland
and oppressed our fellow-citizens.
They have acted contrary to the ideals of humanity and justice.
In the field of politics, they have deprived our people of every
democratic liberty.
They have enforced inhuman laws; they have set up three
distinct political regimes in the North,
the Center and the South of Vietnam in order to wreck our
national unity and prevent our people
from being united.
They have built more prisons than schools. They have
mercilessly slain our patriots; they have
drowned our uprisings in rivers of blood.
They have fettered public opinion; they have practiced
obscurantism against our people.
To weaken our race they have forced us to use opium and
alcohol.
In the field of economics, they have fleeced us to the backbone,
impoverished our people, and
devastated our land.
They have robbed us of our rice fields, our mines, our forests,
and our raw materials. They have
monopolized the issuing of bank-notes and the export trade.
They have invented numerous unjustifiable taxes and reduced
our people, especially our
peasantry, to a state of extreme poverty.
They have hampered the prospering of our national bourgeoisie;
they have mercilessly exploited
our workers.
In the autumn of 1940, when the Japanese Fascists violated
Indochina’s territory to establish new
bases in their fight against the Allies, the French imperialists
went down on their bended knees
and handed over our country to them.
Thus, from that date, our people were subjected to the double
yoke of the French and the
Japanese. Their sufferings and miseries increased. The result
was that from the end of last year to
the beginning of this year, from Quang Tri province to the
North of Vietnam, more than two
million of our fellow-citizens died from starvation. On March 9,
the French troops were
disarmed by the Japanese. The French colonialists either fled or
surrendered, showing that not
only were they incapable of “protecting” us, but that, in the
span of five years, they had twice
sold our country to the Japanese.
On several occasions before March 9, the Vietminh League
urged the French to ally themselves
with it against the Japanese. Instead of agreeing to this
proposal, the French colonialists so
intensified their terrorist activities against the Vietminh
members that before fleeing they
massacred a great number of our political prisoners detained at
Yen Bay and Caobang.
Notwithstanding all this, our fellow-citizens have always
manifested toward the French a tolerant
and humane attitude. Even after the Japanese putsch of March
1945, the Vietminh League helped
many Frenchmen to cross the frontier, rescued some of them
from Japanese jails, and protected
French lives and property.
From the autumn of 1940, our country had in fact ceased to be a
French colony and had become
a Japanese possession.
After the Japanese had surrendered to the Allies, our whole
people rose to regain our national
sovereignty and to found the Democratic Republic of Vietnam.
The truth is that we have wrested our independence from the
Japanese and not from the French.
The French have fled, the Japanese have capitulated, Emperor
Bao Dai has abdicated. Our
people have broken the chains which for nearly a century have
fettered them and have won
independence for the Fatherland. Our people at the same time
have overthrown the monarchic
regime that has reigned supreme for dozens of centuries. In its
place has been established the
present Democratic Republic.
For these reasons, we, members of the Provisional Government,
representing the whole
Vietnamese people, declare that from now on we break off all
relations of a colonial character
with France; we repeal all the international obligations that
France has so far subscribed to on
behalf of Vietnam and we abolish all the special rights the
French have unlawfully acquired in
our Fatherland.
The whole Vietnamese people, animated by a common purpose,
are determined to fight to the
bitter end against any attempt by the French colonialists to
reconquer their country.
We are convinced that the Allied nations which at Tehran and
San Francisco have acknowledged
the principles of self-determination and equality of nations, will
not refuse to acknowledge the
independence of Vietnam.
A people who have courageously opposed French domination
for more than eighty years, a people
who have fought side by side with the Allies against the
Fascists during these last years, such a
people must be free and independent.
For these reasons, we, members of the Provisional Government
of the Democratic Republic of
Vietnam, solemnly declare to the world that Vietnam has the
right to be a free and independent
country—and in fact is so already. The entire Vietnamese
people are determined to mobilize all
their physical and mental strength, to sacrifice their lives and
property in order to safeguard their
independence and liberty.
George Marshall (1880-1959)
Speech at Harvard, 5 June 1947
I'm profoundly grateful and touched by the great distinction and
honor and great compliment
accorded me by the authorities of Harvard this morning. I'm
overwhelmed, as a matter of fact,
and I'm rather fearful of my inability to maintain such a high
rating as you've been generous
enough to accord to me. In these historic and lovely
surroundings, this perfect day, and this very
wonderful assembly, it is a tremendously impressive thing to an
individual in my position. But to
speak more seriously, I need not tell you, gentlemen, that the
world situation is very serious. That
must be apparent to all intelligent people. I think one difficulty
is that the problem is one of such
enormous complexity that the very mass of facts presented to
the public by press and radio make
it exceedingly difficult for the man in the street to reach a clear
appraisement of the situation.
Furthermore, the people of this country are distant from the
troubled areas of the earth and it is
hard for them to comprehend the plight and consequent
reactions of the long-suffering peoples,
and the effect of those reactions on their governments in
connection with our efforts to promote
peace in the world.
In considering the requirements for the rehabilitation of Europe,
the physical loss of life, the
visible destruction of cities, factories, mines and railroads was
correctly estimated but it has
become obvious during recent months that this visible
destruction was probably less serious than
the dislocation of the entire fabric of European economy. For
the past 10 years conditions have
been highly abnormal. The feverish preparation for war and the
more feverish maintenance of
the war effort engulfed all aspects of national economies.
Machinery has fallen into disrepair or
is entirely obsolete. Under the arbitrary and destructive Nazi
rule, virtually every possible
enterprise was geared into the German war machine. Long-
standing commercial ties, private
institutions, banks, insurance companies, and shipping
companies disappeared, through loss of
capital, absorption through nationalization, or by simple
destruction. In many countries,
confidence in the local currency has been severely shaken. The
breakdown of the business
structure of Europe during the war was complete. Recovery has
been seriously retarded by the
fact that two years after the close of hostilities a peace
settlement with Germany and Austria has
not been agreed upon. But even given a more prompt solution of
these difficult problems the
rehabilitation of the economic structure of Europe quite
evidently will require a much longer
time and greater effort than had been foreseen.
There is a phase of this matter which is both interesting and
serious. The farmer has always
produced the foodstuffs to exchange with the city dweller for
the other necessities of life. This
division of labor is the basis of modern civilization. At the
present time it is threatened with
breakdown. The town and city industries are not producing
adequate goods to exchange with the
food producing farmer. Raw materials and fuel are in short
supply. Machinery is lacking or worn
out. The farmer or the peasant cannot find the goods for sale
which he desires to purchase. So the
sale of his farm produce for money which he cannot use seems
to him an unprofitable
transaction. He, therefore, has withdrawn many fields from crop
cultivation and is using them for
grazing. He feeds more grain to stock and finds for himself and
his family an ample supply of
food, however short he may be on clothing and the other
ordinary gadgets of civilization.
Meanwhile people in the cities are short of food and fuel. So the
governments are forced to use
their foreign money and credits to procure these necessities
abroad. This process exhausts funds
which are urgently needed for reconstruction. Thus a very
serious situation is rapidly developing
which bodes no good for the world. The modern system of the
division of labor upon which the
exchange of products is based is in danger of breaking down.
The truth of the matter is that Europe's requirements for the
next three or four years of foreign
food and other essential products - principally from America -
are so much greater than her
present ability to pay that she must have substantial additional
help or face economic, social, and
political deterioration of a very grave character.
The remedy lies in breaking the vicious circle and restoring the
confidence of the European
people in the economic future of their own countries and of
Europe as a whole. The
manufacturer and the farmer throughout wide areas must be able
and willing to exchange their
products for currencies the continuing value of which is not
open to question.
Aside from the demoralizing effect on the world at large and the
possibilities of disturbances
arising as a result of the desperation of the people concerned,
the consequences to the economy
of the United States should be apparent to all. It is logical that
the United States should do
whatever it is able to do to assist in the return of normal
economic health in the world, without
which there can be no political stability and no assured peace.
Our policy is directed not against
any country or doctrine but against hunger, poverty, desperation
and chaos. Its purpose should be
the revival of a working economy in the world so as to permit
the emergence of political and
social conditions in which free institutions can exist. Such
assistance, I am convinced, must not
be on a piecemeal basis as various crises develop. Any
assistance that this Government may
render in the future should provide a cure rather than a mere
palliative. Any government that is
willing to assist in the task of recovery will find full co-
operation I am sure, on the part of the
United States Government. Any government which maneuvers
to block the recovery of other
countries cannot expect help from us. Furthermore,
governments, political parties, or groups
which seek to perpetuate human misery in order to profit
therefrom politically or otherwise will
encounter the opposition of the United States.
It is already evident that, before the United States Government
can proceed much further in its
efforts to alleviate the situation and help start the European
world on its way to recovery, there
must be some agreement among the countries of Europe as to
the requirements of the situation
and the part those countries themselves will take in order to
give proper effect to whatever action
might be undertaken by this Government. It would be neither
fitting nor efficacious for this
Government to undertake to draw up unilaterally a program
designed to place Europe on its feet
economically. This is the business of the Europeans. The
initiative, I think, must come from
Europe. The role of this country should consist of friendly aid
in the drafting of a European
program and of later support of such a program so far as it may
be practical for us to do so. The
program should be a joint one, agreed to by a number, if not all,
European nations.
An essential part of any successful action on the part of the
United States is an understanding on
the part of the people of America of the character of the
problem and the remedies to be applied.
Political passion and prejudice should have no part. With
foresight, and a willingness on the part
of our people to face up to the vast responsibility which history
has clearly placed upon our
country, the difficulties I have outlined can and will be
overcome.
I am sorry that on each occasion I have said something publicly
in regard to our international
situation, I've been forced by the necessities of the case to enter
into rather technical discussions.
But to my mind, it is of vast importance that our people reach
some general understanding of
what the complications really are, rather than react from a
passion or a prejudice or an emotion
of the moment. As I said more formally a moment ago, we are
remote from the scene of these
troubles. It is virtually impossible at this distance merely by
reading, or listening, or even seeing
photographs or motion pictures, to grasp at all the real
significance of the situation. And yet the
whole world of the future hangs on a proper judgment. It hangs,
I think, to a large extent on the
realization of the American people, of just what are the various
dominant factors. What are the
reactions of the people? What are the justifications of those
reactions? What are the sufferings?
What is needed? What can best be done? What must be done?
Thank you very much.
Nikita Khrushchev (1894-1971)
Secret Speech to the Closed Session of the Twentieth Party
Congress, February 25, 1956
Excerpts
We have to consider seriously and analyze correctly [the crimes
of the Stalin era] in order that
we may preclude any possibility of a repetition in any form
whatever of what took place during
the life of Stalin, who absolutely did not tolerate collegiality in
leadership and in work, and who
practiced brutal violence, not only toward everything which
opposed him, but also toward that
which seemed, to his capricious and despotic character, contrary
to his concepts.
Stalin acted not through persuasion, explanation, and patient
cooperation with people, but by
imposing his concepts and demanding absolute submission to
his opinion. Whoever opposed this
concept or tried to prove his viewpoint, and the correctness of
his position, was doomed to
removal from the leading collective and to subsequent moral
and physical annihilation. This was
especially true during the period following the XVIIth Party
Congress (1934)....
Stalin originated the concept enemy of the people. This term
automatically rendered it
unnecessary that the ideological errors of a man or men engaged
in a controversy be proven; this
term made possible the usage of the most cruel repression,
violating all norms of revolutionary
legality, against anyone who in any way disagreed with Stalin,
against those who were only
suspected of hostile intent, against those who had bad
reputations. This concept, enemy of the
people, actually eliminated the possibility of any kind of
ideological fight or the making of one's
views known on this or that issue, even those of a practical
character.... The only proof of guilt
used, against all norms of current legal science, was the
confession of the accused himself; and,
as subsequent probing proved, confessions were acquired
through physical pressures against the
accused.
This led to the glaring violations of revolutionary legality, and
to the fact that many entirely
innocent persons, who in the past had defended the Party line,
became victims....
The Commission [of Inquiry] has become acquainted with a
large quantity of materials in the
NKVD archives…. It became apparent that many Party, Soviet
and economic activists who were
branded in 1937-1938 as enemies were actually never enemies,
spies, wreckers, etc., but were
always honest Communists; they were only so stigmatized, and
often, no longer able to bear
barbaric tortures, they charged themselves with all kinds of
grave and unlikely crimes....
Lenin used severe methods only in the most necessary cases,
when the exploiting classes were
still in existence and were vigorously opposing the revolution,
when the struggle for survival was
decidedly assuming the sharpest forms, even including a civil
war.
Stalin, on the other hand, used extreme methods and mass
repression at a time when the
revolution was already victorious, when the Soviet state was
strengthened, when the exploiting
classes were already liquidated and Socialist relations were
rooted solidly in all phases of
national economy, when our Party was politically consolidated
and had strengthened itself both
numerically and ideologically. It is clear that here Stalin
showed in a whole series of cases his
intolerance, his brutality and his abuse of power. Instead of
proving his political correctness and
mobilizing the masses, he often chose the path of repression and
physical annihilation, not only
against actual enemies, but also against individuals who had not
committed any crimes against
the Party and the Soviet government....
Sixteen Political, Economic, and Ideological Points, Budapest,
October 22,
1956
RESOLUTION ADOPTED AT PLENARY MEETING OF THE
BUILDING INDUSTRY
TECHNOLOGY UNIVERSITY
Students of Budapest! The following resolution was born on 22 October 1956, at the dawn of a new period in Hungarian history, in the Hall of the Building Industry Technological University as a result of the spontaneous movement of several thousand of the Hungarian youth who love their Fatherland:
(1) We demand the immediate withdrawal of all Soviet troops in accordance with the provisions of the Peace Treaty.
(2) We demand the election of new leaders in the Hungarian Workers' Party on the low, medium and high levels by secret ballot from the ranks upwards. These leaders should convene the Party Congress within the shortest possible time and should elect a new central body of leaders.
(3) The Government should be reconstituted under the leadership of Comrade Imre Nagy; all criminal leaders of the Stalinist-Rákosi era should be relieved of their posts at once.
(4) We demand a public trial in the criminal case of Mihály Farkas and his accomplices. Mátyás Rákosi, who is primarily responsible for all the crimes of the recent past and for the ruin of this country, should be brought home and brought before a People's Court of judgment.
(5) We demand general elections in this country, with universal suffrage, secret ballot and the participation of several Parties for the purpose of electing a new National Assembly. We demand that the workers should have the right to strike.
(6) We demand a re-examination and re-adjustment of Hungarian-Soviet and Hungarian-Yugoslav political, economic and intellectual relations on the basis of complete political and economic equality and of non-intervention in each other's internal affairs.
(7) We demand the re-organization of the entire economic life of Hungary, with the assistance of specialists. Our whole economic system based on planned economy should be re-examined with an eye to Hungarian conditions and to the vital interests of the Hungarian people.
(8) Our foreign trade agreements and the real figures in respect of reparations that can never be paid should be made public. We demand frank and sincere information concerning the country's uranium deposits, their exploitation and the Russian concession. We demand that Hungary should have the right to sell the uranium ore freely at world market prices in exchange for hard currency.
(9) We demand the complete revision of norms in industry and an urgent and radical adjustment of wages to meet the demands of workers and intellectuals. We demand that minimum living wages for workers should be fixed.
(10) We demand that the delivery system should be placed on a new basis and that produce should be used rationally. We demand equal treatment of peasants farming individually.
(11) We demand the re-examination of all political and economic trials by independent courts and the release and rehabilitation of innocent persons. We demand the immediate repatriation of prisoners-of-war and of civilians deported to the Soviet Union, including prisoners who have been condemned beyond the frontiers of Hungary.
(12) We demand complete freedom of opinion and expression, freedom of the Press and a free Radio, as well as a new daily newspaper of large circulation for the MEFESZ [League of Hungarian University and College Student Associations] organization. We demand that the existing 'screening material' should be made public and destroyed.
(13) We demand that the Stalin statue, the symbol of Stalinist tyranny and political oppression, should be removed as quickly as possible and that a memorial worthy of the freedom fighters and martyrs of 1848-49 should be erected on its site.
(14) In place of the existing coat of arms, which is foreign to the Hungarian people, we wish the re-introduction of the old Hungarian Kossuth arms. We demand for the Hungarian Army new uniforms worthy of our national traditions. We demand that 15 March should be a national holiday and a non-working day and that 6 October should be a day of national mourning and a school holiday.
(15) The youth of the Technological University of Budapest unanimously express their complete solidarity with the Polish and Warsaw workers and youth in connection with the Polish national independence movement.
(16) The students of the Building Industry Technological University will organize local units of MEFESZ as quickly as possible, and have resolved to convene a Youth Parliament in Budapest for the 27th of this month (Saturday) at which the entire youth of this country will be represented by their delegates.
The students of the Technological University and of the various other Universities will gather in the Gorkij Fasor before the Writers' Union Headquarters tomorrow, the 23rd of this month, at 2.30 p.m., whence they will proceed to the Pálffy Tér (Bem Tér) to the Bem statue, on which they will lay wreaths in sign of their sympathy with the Polish freedom movement. The workers of the factories are invited to join in this procession.
Treaty of ROME, 1957
Treaty establishing the European Economic Community
HIS MAJESTY THE KING OF THE BELGIANS, THE
PRESIDENT OF THE FEDERAL
REPUBLIC OF GERMANY, THE PRESIDENT OF THE
FRENCH REPUBLIC, THE
PRESIDENT OF THE ITALIAN REPUBLIC, HER ROYAL
HIGHNESS THE GRAND
DUCHESS OF LUXEMBOURG, HER MAJESTY THE QUEEN
OF THE NETHERLANDS,
DETERMINED to establish the foundations of an ever closer
union among the European
peoples,
DECIDED to ensure the economic and social progress of their
countries by common action in
eliminating the barriers which divide Europe,
DIRECTING their efforts to the essential purpose of constantly
improving the living and
working conditions of their peoples,
RECOGNISING that the removal of existing obstacles calls for
concerted action in order to
guarantee a steady expansion, a balanced trade and fair
competition,
ANXIOUS to strengthen the unity of their economies and to
ensure their harmonious
development by reducing the differences existing between the
various regions and by mitigating
the backwardness of the less favoured,
DESIROUS of contributing by means of a common commercial
policy to the progressive
abolition of restrictions on international trade,
INTENDING to confirm the solidarity which binds Europe and
overseas countries, and desiring
to ensure the development of their prosperity, in accordance
with the principles of the Charter of
the United Nations,
RESOLVED to strengthen the safeguards of peace and liberty
by establishing this combination
of resources, and calling upon the other peoples of Europe who
share their ideal to join in their
efforts,
HAVE DECIDED to create a European Economic Community
and to this end have designated
as their plenipotentiaries:
HIS MAJESTY THE KING OF THE BELGIANS:
Mr. Paul-Henri SPAAK, Minister of Foreign Affairs, Baron J.
Ch. SNOY ET D’OPPUERS,
Secretary-General of the Ministry of Economic Affairs, Head of
the Belgian delegation to the
Intergovernmental Conference;
THE PRESIDENT OF THE FEDERAL REPUBLIC OF
GERMANY:
Dr. Konrad ADENAUER, Federal Chancellor, Professor Dr.
Walter HALLSTEIN, State
Secretary of the Federal Foreign Office;
THE PRESIDENT OF THE FRENCH REPUBLIC:
Mr. Christian PINEAU, Minister of Foreign Affairs, Mr.
Maurice FAURE, Under-Secretary of
State for Foreign Affairs;
THE PRESIDENT OF THE ITALIAN REPUBLIC:
Mr. Antonio SEGNI, President of the Council of Ministers,
Professor Gaetano MARTINO,
Minister of Foreign Affairs;
HER ROYAL HIGHNESS THE GRAND DUCHESS OF
LUXEMBOURG:
Mr. Joseph BECH, Prime Minister, Minister of Foreign Affairs,
Mr. Lambert SCHAUS,
Ambassador, Head of the Luxembourg delegation to the
Intergovernmental Conference;
HER MAJESTY THE QUEEN OF THE NETHERLANDS:
Mr. Joseph LUNS, Minister of Foreign Affairs, Mr. J.
LINTHORST HOMAN, Head of the
Netherlands delegation to the Intergovernmental Conference;
WHO, having exchanged their full powers, found in good and
due form, have agreed, as follows:
PART ONE — Principles
Article 1
By the present Treaty, the HIGH CONTRACTING PARTIES
establish among themselves a
EUROPEAN ECONOMIC COMMUNITY.
Article 2
It shall be the aim of the Community, by establishing a Common
Market and progressively
approximating the economic policies of Member States, to
promote throughout the Community a
harmonious development of economic activities, a continuous
and balanced expansion, an
increased stability, an accelerated raising of the standard of
living and closer relations between
its Member States.
Article 3
For the purposes set out in the preceding Article, the activities
of the Community shall include,
under the conditions and with the timing provided for in this
Treaty:
(a) the elimination, as between Member States, of customs
duties and of quantitative restrictions
in regard to the importation and exportation of goods, as well as
of all other measures with
equivalent effect;
(b) the establishment of a common customs tariff and a common
commercial policy towards
third countries;
(c) the abolition, as between Member States, of the obstacles to
the free movement of persons,
services and capital;
(d) the inauguration of a common agricultural policy;
(e) the inauguration of a common transport policy;
(f) the establishment of a system ensuring that competition shall
not be distorted in the Common
Market;
(g) the application of procedures which shall make it possible to
co-ordinate the economic
policies of Member States and to remedy disequilibria in their
balances of payments;
(h) the approximation of their respective municipal law to the
extent necessary for the
functioning of the Common Market;
(i) the creation of a European Social Fund in order to improve
the possibilities of employment
for workers and to contribute to the raising of their standard of
living;
(j) the establishment of a European Investment Bank intended to
facilitate the economic
expansion of the Community through the creation of new
resources; and
(k) the association of overseas countries and territories with the
Community with a view to
increasing trade and to pursuing jointly their effort towards
economic and social development.
Article 4
1. The achievement of the tasks entrusted to the Community
shall be ensured by
• an ASSEMBLY,
• a COUNCIL,
• a COMMISSION, and
• a COURT OF JUSTICE.
Each of these institutions shall act within the limits of the
powers conferred upon it by this
Treaty…
Article 6
1. Member States, acting in close collaboration with the
institutions of the Community, shall co-ordinate
their respective economic policies to the extent that is
necessary to attain the objectives
of this Treaty….
Article 7
Within the …
PANG-NING TAN
Michigan State University
MICHAEL STEINBACH
University of Minnesota
VIPIN KUMAR
University of Minnesota
and Army High Performance
Computing Research Center
Boston San Francisco New York
London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
Contents
Preface vii
1 Introduction 1
1.1 What Is Data Mining? . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Motivating Challenges . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 The Origins of Data Mining . . . . . . . . . . . . . . . . . . . . 6
1.4 Data Mining Tasks . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Scope and Organization of the Book . . . . . . . . . . . . . . . 11
1.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2 Data 19
2.1 Types of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.1 Attributes and Measurement . . . . . . . . . . . . . . . 23
2.1.2 Types of Data Sets . . . . . . . . . . . . . . . . . . . . . 29
2.2 Data Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.1 Measurement and Data Collection Issues . . . . . . . . . 37
2.2.2 Issues Related to Applications . . . . . . . . . . . . . . 43
2.3 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.1 Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.3.2 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3.3 Dimensionality Reduction . . . . . . . . . . . . . . . . . 50
2.3.4 Feature Subset Selection . . . . . . . . . . . . . . . . . . 52
2.3.5 Feature Creation . . . . . . . . . . . . . . . . . . . . . . 55
2.3.6 Discretization and Binarization . . . . . . . . . . . . . . 57
2.3.7 Variable Transformation . . . . . . . . . . . . . . . . . . 63
2.4 Measures of Similarity and Dissimilarity . . . . . . . . . . . . .
65
2.4.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.4.2 Similarity and Dissimilarity between Simple Attributes .
67
2.4.3 Dissimilarities between Data Objects . . . . . . . . . . . 69
2.4.4 Similarities between Data Objects . . . . . . . . . . . . 72
xiv Contents
2.4.5 Examples of Proximity Measures . . . . . . . . . . . . . 73
2.4.6 Issues in Proximity Calculation . . . . . . . . . . . . . . 80
2.4.7 Selecting the Right Proximity Measure . . . . . . . . . . 83
2.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3 Exploring Data 97
3.1 The Iris Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.2 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.2.1 Frequencies and the Mode . . . . . . . . . . . . . . . . . 99
3.2.2 Percentiles . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.2.3 Measures of Location: Mean and Median . . . . . . . . 101
3.2.4 Measures of Spread: Range and Variance . . . . . . . . 102
3.2.5 Multivariate Summary Statistics . . . . . . . . . . . . . 104
3.2.6 Other Ways to Summarize the Data . . . . . . . . . . . 105
3.3 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.3.1 Motivations for Visualization . . . . . . . . . . . . . . . 105
3.3.2 General Concepts . . . . . . . . . . . . . . . . . . . . . . 106
3.3.3 Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.3.4 Visualizing Higher-Dimensional Data . . . . . . . . . . . 124
3.3.5 Do’s and Don’ts . . . . . . . . . . . . . . . . . . . . . . 130
3.4 OLAP and Multidimensional Data Analysis . . . . . . . . . . . 131
3.4.1 Representing Iris Data as a Multidimensional Array . . 131
3.4.2 Multidimensional Data: The General Case . . . . . . . . 133
3.4.3 Analyzing Multidimensional Data . . . . . . . . . . . . 135
3.4.4 Final Comments on Multidimensional Data Analysis . . 139
3.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation 145
4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.2 General Approach to Solving a Classification Problem . . . . . 148
4.3 Decision Tree Induction . . . . . . . . . . . . . . . . . . . . . . 150
4.3.1 How a Decision Tree Works . . . . . . . . . . . . . . . . 150
4.3.2 How to Build a Decision Tree . . . . . . . . . . . . . . . 151
4.3.3 Methods for Expressing Attribute Test Conditions . . . 155
4.3.4 Measures for Selecting the Best Split . . . . . . . . . . . 158
4.3.5 Algorithm for Decision Tree Induction . . . . . . . . . . 164
4.3.6 An Example: Web Robot Detection . . . . . . . . . . . 166
4.3.7 Characteristics of Decision Tree Induction . . . . . . . . 168
4.4 Model Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.4.1 Overfitting Due to Presence of Noise . . . . . . . . . . . 175
4.4.2 Overfitting Due to Lack of Representative Samples . . . 177
4.4.3 Overfitting and the Multiple Comparison Procedure . . 178
4.4.4 Estimation of Generalization Errors . . . . . . . . . . . 179
4.4.5 Handling Overfitting in Decision Tree Induction . . . . 184
4.5 Evaluating the Performance of a Classifier . . . . . . . . . . . . 186
4.5.1 Holdout Method . . . . . . . . . . . . . . . . . . . . . . 186
4.5.2 Random Subsampling . . . . . . . . . . . . . . . . . . . 187
4.5.3 Cross-Validation . . . . . . . . . . . . . . . . . . . . . . 187
4.5.4 Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4.6 Methods for Comparing Classifiers . . . . . . . . . . . . . . . . 188
4.6.1 Estimating a Confidence Interval for Accuracy . . . . . 189
4.6.2 Comparing the Performance of Two Models . . . . . . . 191
4.6.3 Comparing the Performance of Two Classifiers . . . . . 192
4.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
5 Classification: Alternative Techniques 207
5.1 Rule-Based Classifier . . . . . . . . . . . . . . . . . . . . . . . . 207
5.1.1 How a Rule-Based Classifier Works . . . . . . . . . . . . 209
5.1.2 Rule-Ordering Schemes . . . . . . . . . . . . . . . . . . 211
5.1.3 How to Build a Rule-Based Classifier . . . . . . . . . . . 212
5.1.4 Direct Methods for Rule Extraction . . . . . . . . . . . 213
5.1.5 Indirect Methods for Rule Extraction . . . . . . . . . . 221
5.1.6 Characteristics of Rule-Based Classifiers . . . . . . . . . 223
5.2 Nearest-Neighbor Classifiers . . . . . . . . . . . . . . . . . . . . 223
5.2.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 225
5.2.2 Characteristics of Nearest-Neighbor Classifiers . . . . . 226
5.3 Bayesian Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . 227
5.3.1 Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . 228
5.3.2 Using the Bayes Theorem for Classification . . . . . . . 229
5.3.3 Naïve Bayes Classifier . . . . . . . . . . . . . . . . . . . 231
5.3.4 Bayes Error Rate . . . . . . . . . . . . . . . . . . . . . . 238
5.3.5 Bayesian Belief Networks . . . . . . . . . . . . . . . . . 240
5.4 Artificial Neural Network (ANN) . . . . . . . . . . . . . . . . . 246
5.4.1 Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . 247
5.4.2 Multilayer Artificial Neural Network . . . . . . . . . . . 251
5.4.3 Characteristics of ANN . . . . . . . . . . . . . . . . . . 255
5.5 Support Vector Machine (SVM) . . . . . . . . . . . . . . . . . . 256
5.5.1 Maximum Margin Hyperplanes . . . . . . . . . . . . . . 256
5.5.2 Linear SVM: Separable Case . . . . . . . . . . . . . . . 259
5.5.3 Linear SVM: Nonseparable Case . . . . . . . . . . . . . 266
5.5.4 Nonlinear SVM . . . . . . . . . . . . . . . . . . . . . . . 270
5.5.5 Characteristics of SVM . . . . . . . . . . . . . . . . . . 276
5.6 Ensemble Methods . . . . . . . . . . . . . . . . . . . . . . . . . 276
5.6.1 Rationale for Ensemble Method . . . . . . . . . . . . . . 277
5.6.2 Methods for Constructing an Ensemble Classifier . . . . 278
5.6.3 Bias-Variance Decomposition . . . . . . . . . . . . . . . 281
5.6.4 Bagging . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
5.6.5 Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
5.6.6 Random Forests . . . . . . . . . . . . . . . . . . . . . . 290
5.6.7 Empirical Comparison among Ensemble Methods . . . . 294
5.7 Class Imbalance Problem . . . . . . . . . . . . . . . . . . . . . 294
5.7.1 Alternative Metrics . . . . . . . . . . . . . . . . . . . . . 295
5.7.2 The Receiver Operating Characteristic Curve . . . . . . 298
5.7.3 Cost-Sensitive Learning . . . . . . . . . . . . . . . . . . 302
5.7.4 Sampling-Based Approaches . . . . . . . . . . . . . . . . 305
5.8 Multiclass Problem . . . . . . . . . . . . . . . . . . . . . . . . . 306
5.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 309
5.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
6 Association Analysis: Basic Concepts and Algorithms 327
6.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . 328
6.2 Frequent Itemset Generation . . . . . . . . . . . . . . . . . . . 332
6.2.1 The Apriori Principle . . . . . . . . . . . . . . . . . . . 333
6.2.2 Frequent Itemset Generation in the Apriori Algorithm . 335
6.2.3 Candidate Generation and Pruning . . . . . . . . . . . . 338
6.2.4 Support Counting . . . . . . . . . . . . . . . . . . . . . 342
6.2.5 Computational Complexity . . . . . . . . . . . . . . . . 345
6.3 Rule Generation . . . . . . . . . . . . . . . . . . . . . . . . . . 349
6.3.1 Confidence-Based Pruning . . . . . . . . . . . . . . . . . 350
6.3.2 Rule Generation in Apriori Algorithm . . . . . . . . . . 350
6.3.3 An Example: Congressional Voting Records . . . . . . . 352
6.4 Compact Representation of Frequent Itemsets . . . . . . . . . . 353
6.4.1 Maximal Frequent Itemsets . . . . . . . . . . . . . . . . 354
6.4.2 Closed Frequent Itemsets . . . . . . . . . . . . . . . . . 355
6.5 Alternative Methods for Generating Frequent Itemsets . . . . . 359
6.6 FP-Growth Algorithm . . . . . . . . . . . . . . . . . . . . . . . 363
6.6.1 FP-Tree Representation . . . . . . . . . . . . . . . . . . 363
6.6.2 Frequent Itemset Generation in FP-Growth Algorithm . 366
6.7 Evaluation of Association Patterns . . . . . . . . . . . . . . . . 370
6.7.1 Objective Measures of Interestingness . . . . . . . . . . 371
6.7.2 Measures beyond Pairs of Binary Variables . . . . . . . 382
6.7.3 Simpson’s Paradox . . . . . . . . . . . . . . . . . . . . . 384
6.8 Effect of Skewed Support Distribution . . . . . . . . . . . . . . 386
6.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 390
6.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
7 Association Analysis: Advanced Concepts 415
7.1 Handling Categorical Attributes . . . . . . . . . . . . . . . . . 415
7.2 Handling Continuous Attributes . . . . . . . . . . . . . . . . . 418
7.2.1 Discretization-Based Methods . . . . . . . . . . . . . . . 418
7.2.2 Statistics-Based Methods . . . . . . . . . . . . . . . . . 422
7.2.3 Non-discretization Methods . . . . . . . . . . . . . . . . 424
7.3 Handling a Concept Hierarchy . . . . . . . . . . . . . . . . . . 426
7.4 Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 429
7.4.1 Problem Formulation . . . . . . . . . . . . . . . . . . . 429
7.4.2 Sequential Pattern Discovery . . . . . . . . . . . . . . . 431
7.4.3 Timing Constraints . . . . . . . . . . . . . . . . . . . . . 436
7.4.4 Alternative Counting Schemes . . . . . . . . . . . . . . 439
7.5 Subgraph Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 442
7.5.1 Graphs and Subgraphs . . . . . . . . . . . . . . . . . . . 443
7.5.2 Frequent Subgraph Mining . . . . . . . . . . . . . . . . 444
7.5.3 Apriori-like Method . . . . . . . . . . . . . . . . . . . . 447
7.5.4 Candidate Generation . . . . . . . . . . . . . . . . . . . 448
7.5.5 Candidate Pruning . . . . . . . . . . . . . . . . . . . . . 453
7.5.6 Support Counting . . . . . . . . . . . . . . . . . . . . . 457
7.6 Infrequent Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 457
7.6.1 Negative Patterns . . . . . . . . . . . . . . . . . . . . . 458
7.6.2 Negatively Correlated Patterns . . . . . . . . . . . . . . 458
7.6.3 Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns . . . . . . . . 460
7.6.4 Techniques for Mining Interesting Infrequent Patterns . 461
7.6.5 Techniques Based on Mining Negative Patterns . . . . . 463
7.6.6 Techniques Based on Support Expectation . . . . . . . . 465
7.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 469
7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
8 Cluster Analysis: Basic Concepts and Algorithms 487
8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
8.1.1 What Is Cluster Analysis? . . . . . . . . . . . . . . . . . 490
8.1.2 Different Types of Clusterings . . . . . . . . . . . . . . . 491
8.1.3 Different Types of Clusters . . . . . . . . . . . . . . . . 493
8.2 K-means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
8.2.1 The Basic K-means Algorithm . . . . . . . . . . . . . . 497
8.2.2 K-means: Additional Issues . . . . . . . . . . . . . . . . 506
8.2.3 Bisecting K-means . . . . . . . . . . . . . . . . . . . . . 508
8.2.4 K-means and Different Types of Clusters . . . . . . . . 510
8.2.5 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 510
8.2.6 K-means as an Optimization Problem . . . . . . . . . . 513
8.3 Agglomerative Hierarchical Clustering . . . . . . . . . . . . . . 515
8.3.1 Basic Agglomerative Hierarchical Clustering Algorithm 516
8.3.2 Specific Techniques . . . . . . . . . . . . . . . . . . . . . 518
8.3.3 The Lance-Williams Formula for Cluster Proximity . . . 524
8.3.4 Key Issues in Hierarchical Clustering . . . . . . . . . . . 524
8.3.5 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 526
8.4 DBSCAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
8.4.1 Traditional Density: Center-Based Approach . . . . . . 527
8.4.2 The DBSCAN Algorithm . . . . . . . . . . . . . . . . . 528
8.4.3 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 530
8.5 Cluster Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 532
8.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 533
8.5.2 Unsupervised Cluster Evaluation Using Cohesion and Separation . . . . . . . . . . . . . . . . . . . . . . . . . 536
8.5.3 Unsupervised Cluster Evaluation Using the Proximity Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
8.5.4 Unsupervised Evaluation of Hierarchical Clustering . . . 544
8.5.5 Determining the Correct Number of Clusters . . . . . . 546
8.5.6 Clustering Tendency . . . . . . . . . . . . . . . . . . . . 547
8.5.7 Supervised Measures of Cluster Validity . . . . . . . . . 548
8.5.8 Assessing the Significance of Cluster Validity Measures . 553
8.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 555
8.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
9 Cluster Analysis: Additional Issues and Algorithms 569
9.1 Characteristics of Data, Clusters, and Clustering Algorithms . 570
9.1.1 Example: Comparing K-means and DBSCAN . . . . . . 570
9.1.2 Data Characteristics . . . . . . . . . . . . . . . . . . . . 571
9.1.3 Cluster Characteristics . . . . . . . . . . . . . . . . . . . 573
9.1.4 General Characteristics of Clustering Algorithms . . . . 575
9.2 Prototype-Based Clustering . . . . . . . . . . . . . . . . . . . . 577
9.2.1 Fuzzy Clustering . . . . . . . . . . . . . . . . . . . . . . 577
9.2.2 Clustering Using Mixture Models . . . . . . . . . . . . . 583
9.2.3 Self-Organizing Maps (SOM) . . . . . . . . . . . . . . . 594
9.3 Density-Based Clustering . . . . . . . . . . . . . . . . . . . . . 600
9.3.1 Grid-Based Clustering . . . . . . . . . . . . . . . . . . . 601
9.3.2 Subspace Clustering . . . . . . . . . . . . . . . . . . . . 604
9.3.3 DENCLUE: A Kernel-Based Scheme for Density-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 608
9.4 Graph-Based Clustering . . . . . . . . . . . . . . . . . . . . . . 612
9.4.1 Sparsification . . . . . . . . . . . . . . . . . . . . . . . . 613
9.4.2 Minimum Spanning Tree (MST) Clustering . . . . . . . 614
9.4.3 OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS . . . . . . . . . . . . . . . . . . . . . . . . 616
9.4.4 Chameleon: Hierarchical Clustering with Dynamic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 616
9.4.5 Shared Nearest Neighbor Similarity . . . . . . . . . . . 622
9.4.6 The Jarvis-Patrick Clustering Algorithm . . . . . . . . . 625
9.4.7 SNN Density . . . . . . . . . . . . . . . . . . . . . . . . 627
9.4.8 SNN Density-Based Clustering . . . . . . . . . . . . . . 629
9.5 Scalable Clustering Algorithms . . . . . . . . . . . . . . . . . . 630
9.5.1 Scalability: General Issues and Approaches . . . . . . . 630
9.5.2 BIRCH . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
9.5.3 CURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
9.6 Which Clustering Algorithm? . . . . . . . . . . . . . . . . . . . 639
9.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 643
9.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
10 Anomaly Detection 651
10.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
10.1.1 Causes of Anomalies . . . . . . . . . . . . . . . . . . . . 653
10.1.2 Approaches to Anomaly Detection . . . . . . . . . . . . 654
10.1.3 The Use of Class Labels . . . . . . . . . . . . . . . . . . 655
10.1.4 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
10.2 Statistical Approaches . . . . . . . . . . . . . . . . . . . . . . . 658
10.2.1 Detecting Outliers in a Univariate Normal Distribution 659
10.2.2 Outliers in a Multivariate Normal Distribution . . . . . 661
10.2.3 A Mixture Model Approach for Anomaly Detection . . . 662
10.2.4 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 665
10.3 Proximity-Based Outlier Detection . . . . . . . . . . . . . . . . 666
10.3.1 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 666
10.4 Density-Based Outlier Detection . . . . . . . . . . . . . . . . . 668
10.4.1 Detection of Outliers Using Relative Density . . . . . . 669
10.4.2 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 670
10.5 Clustering-Based Techniques . . . . . . . . . . . . . . . . . . . 671
10.5.1 Assessing the Extent to Which an Object Belongs to a Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . 672
10.5.2 Impact of Outliers on the Initial Clustering . . . . . . . 674
10.5.3 The Number of Clusters to Use . . . . . . . . . . . . . . 674
10.5.4 Strengths and Weaknesses . . . . . . . . . . . . . . . . . 674
10.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 675
10.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
Appendix A Linear Algebra 685
A.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685
A.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 685
A.1.2 Vector Addition and Multiplication by a Scalar . . . . . 685
A.1.3 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . 687
A.1.4 The Dot Product, Orthogonality, and Orthogonal Projections . . . . . . . . . . . . . . . . . . . . . . . . . 688
A.1.5 Vectors and Data Analysis . . . . . . . . . . . . . . . . 690
A.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
A.2.1 Matrices: Definitions . . . . . . . . . . . . . . . . . . . . 691
A.2.2 Matrices: Addition and Multiplication by a Scalar . . . 692
A.2.3 Matrices: Multiplication . . . . . . . . . . . . . . . . . . 693
A.2.4 Linear Transformations and Inverse Matrices . . . . . . 695
A.2.5 Eigenvalue and Singular Value Decomposition . . . . . . 697
A.2.6 Matrices and Data Analysis . . . . . . . . . . . . . . . . 699
A.3 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 700
Appendix B Dimensionality Reduction 701
B.1 PCA and SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
B.1.1 Principal Components Analysis (PCA) . . . . . . . . . . 701
B.1.2 SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
B.2 Other Dimensionality Reduction Techniques . . . . . . . . . . . 708
B.2.1 Factor Analysis . . . . . . . . . . . . . . . . . . . . . . . 708
B.2.2 Locally Linear Embedding (LLE) . . . . . . . . . . . . . 710
B.2.3 Multidimensional Scaling, FastMap, and ISOMAP . . . 712
B.2.4 Common Issues . . . . . . . . . . . . . . . . . . . . . . . 715
B.3 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 716
Appendix C Probability and Statistics 719
C.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
C.1.1 Expected Values . . . . . . . . . . . . . . . . . . . . . . 722
C.2 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
C.2.1 Point Estimation . . . . . . . . . . . . . . . . . . . . . . 724
C.2.2 Central Limit Theorem . . . . . . . . . . . . . . . . . . 724
C.2.3 Interval Estimation . . . . . . . . . . . . . . . . . . . . . 725
C.3 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . 726
Appendix D Regression 729
D.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
D.2 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . 730
D.2.1 Least Square Method . . . . . . . . . . . . . . . . . . . 731
D.2.2 Analyzing Regression Errors . . . . . . . . . . . . . . . 733
D.2.3 Analyzing Goodness of Fit . . . . . . . . . . . . . . . . 735
D.3 Multivariate Linear Regression . . . . . . . . . . . . . . . . . . 736
D.4 Alternative Least-Square Regression Methods . . . . . . . . . . 737
Appendix E Optimization 739
E.1 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . 739
E.1.1 Numerical Methods . . . . . . . . . . . . . . . . . . . . 742
E.2 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . 746
E.2.1 Equality Constraints . . . . . . . . . . . . . . . . . . . . 746
E.2.2 Inequality Constraints . . . . . . . . . . . . . . . . . . . 747
Author Index 750
Subject Index 758
Copyright Permissions 769
1 Introduction
Rapid advances in data collection and storage technology have enabled organizations to accumulate vast amounts of data. However, extracting useful information has proven extremely challenging. Often, traditional data analysis tools and techniques cannot be used because of the massive size of a data set. Sometimes, the non-traditional nature of the data means that traditional approaches cannot be applied even if the data set is relatively small. In other situations, the questions that need to be answered cannot be addressed using existing data analysis techniques, and thus, new methods need to be developed.
Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. It has also opened up exciting opportunities for exploring and analyzing new types of data and for analyzing old types of data in new ways. In this introductory chapter, we present an overview of data mining and outline the key topics to be covered in this book. We start with a description of some well-known applications that require new techniques for data analysis.
Business
Point-of-sale data collection (bar code scanners, radio frequency identification (RFID), and smart card technology) has allowed retailers to collect up-to-the-minute data about customer purchases at the checkout counters of their stores. Retailers can utilize this information, along with other business-critical data such as Web logs from e-commerce Web sites and customer service records from call centers, to help them better understand the needs of their customers and make more informed business decisions.
Data mining techniques can be used to support a wide range of business intelligence applications such as customer profiling, targeted marketing, workflow management, store layout, and fraud detection. They can also help retailers answer important business questions such as "Who are the most profitable customers?" "What products can be cross-sold or up-sold?" and "What is the revenue outlook of the company for next year?" Some of these questions motivated the creation of association analysis (Chapters 6 and 7), a new data analysis technique.
Medicine, Science, and Engineering
Researchers in medicine, science, and engineering are rapidly accumulating data that is key to important new discoveries. For example, as an important step toward improving our understanding of the Earth's climate system, NASA has deployed a series of Earth-orbiting satellites that continuously generate global observations of the land surface, oceans, and atmosphere. However, because of the size and spatio-temporal nature of the data, traditional methods are often not suitable for analyzing these data sets. Techniques developed in data mining can aid Earth scientists in answering questions such as "What is the relationship of the frequency and intensity of ecosystem disturbances such as droughts and hurricanes to global warming?" "How are land surface precipitation and temperature affected by ocean surface temperature?" and "How well can we predict the beginning and end of the growing season for a region?"
As another example, researchers in molecular biology hope to use the large amounts of genomic data currently being gathered to better understand the structure and function of genes. In the past, traditional methods in molecular biology allowed scientists to study only a few genes at a time in a given experiment. Recent breakthroughs in microarray technology have enabled scientists to compare the behavior of thousands of genes under various situations. Such comparisons can help determine the function of each gene and perhaps isolate the genes responsible for certain diseases. However, the noisy and high-dimensional nature of the data requires new types of data analysis. In addition to analyzing gene array data, data mining can also be used to address other important biological challenges such as protein structure prediction, multiple sequence alignment, the modeling of biochemical pathways, and phylogenetics.
1.1 What Is Data Mining?
Data mining is the process of automatically discovering useful information in large data repositories. Data mining techniques are deployed to scour large databases in order to find novel and useful patterns that might otherwise remain unknown. They also provide capabilities to predict the outcome of a future observation, such as predicting whether a newly arrived customer will spend more than $100 at a department store.
Not all information discovery tasks are considered to be data mining. For example, looking up individual records using a database management system or finding particular Web pages via a query to an Internet search engine are tasks related to the area of information retrieval. Although such tasks are important and may involve the use of sophisticated algorithms and data structures, they rely on traditional computer science techniques and obvious features of the data to create index structures for efficiently organizing and retrieving information. Nonetheless, data mining techniques have been used to enhance information retrieval systems.
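To make the contrast concrete, here is a minimal sketch of the kind of index structure the paragraph above describes: an inverted index that maps each word to the documents containing it. The function names and sample documents are assumptions made for illustration and do not come from the text.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical documents, for illustration only.
documents = {
    1: "data mining finds novel patterns",
    2: "a search engine retrieves web pages",
    3: "mining web data",
}
index = build_inverted_index(documents)
matches = lookup(index, "web data")
```

A lookup of this kind retrieves records that match an explicit query. It does not discover previously unknown patterns, which is the distinction the text draws between information retrieval and data mining.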
Data Mining and Knowledge Discovery
Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information, as shown in Figure 1.1. This process consists of a series of transformation steps, from data preprocessing to postprocessing of data mining results.
[Figure 1.1. The process of knowledge discovery in databases (KDD). The diagram runs from Input Data to Information; labeled steps include Feature Selection, Dimensionality Reduction, Normalization, and Data Subsetting, followed by Filtering Patterns, Visualization, and Pattern Interpretation.]
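The idea of KDD as a series of transformation steps can be sketched as a chain of functions, each consuming the output of the previous one. This is a toy illustration under stated assumptions: the three function bodies, the record layout, and the `min_count` threshold are hypothetical stand-ins, not the methods used in the book.

```python
def preprocess(records):
    """Keep only complete records (a stand-in for real preprocessing)."""
    return [r for r in records if None not in r.values()]


def mine(records):
    """Toy mining step: count how often each item value occurs."""
    counts = {}
    for r in records:
        counts[r["item"]] = counts.get(r["item"], 0) + 1
    return counts


def postprocess(counts, min_count=2):
    """Filter patterns: keep items seen at least min_count times."""
    return {item: n for item, n in counts.items() if n >= min_count}


def kdd(records):
    """Raw input data -> preprocessing -> mining -> postprocessing."""
    return postprocess(mine(preprocess(records)))


# Hypothetical raw input, for illustration only.
raw = [
    {"customer": "a", "item": "bread"},
    {"customer": "b", "item": "bread"},
    {"customer": "c", "item": None},
    {"customer": "d", "item": "milk"},
]
information = kdd(raw)
```

Keeping each stage as a separate function mirrors the figure: any stage can be replaced without touching the others.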
The input data can be stored in a variety of formats (flat files, spreadsheets, or relational tables) and may reside in a centralized data repository or be distributed across multiple sites. The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis. The steps involved in data preprocessing include fusing data from multiple sources, cleaning data to remove noise and duplicate observations, and selecting records and features that are relevant to the data mining task at hand. Because of the many ways data can be collected and stored, data preprocessing is perhaps the most laborious and time-consuming step in the overall knowledge discovery process.
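Three of the preprocessing steps named above (fusing sources, removing duplicate observations, and selecting relevant records and features) can be sketched in plain Python as follows. The helper names, the record fields, and the selection rule are assumptions chosen for illustration; noise removal is omitted because it depends on the data at hand.

```python
def fuse(*sources):
    """Combine records from several sources into one list."""
    fused = []
    for source in sources:
        fused.extend(source)
    return fused


def remove_duplicates(records):
    """Drop exact duplicate observations, keeping first occurrences."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def select(records, features, keep):
    """Keep records that pass `keep`, restricted to the chosen features."""
    return [{f: r[f] for f in features} for r in records if keep(r)]


# Hypothetical sources, for illustration only.
store_sales = [
    {"customer": "a", "amount": 120.0, "channel": "store"},
    {"customer": "b", "amount": 40.0, "channel": "store"},
]
web_sales = [
    {"customer": "a", "amount": 120.0, "channel": "store"},  # duplicate
    {"customer": "c", "amount": 300.0, "channel": "web"},
]
cleaned = select(
    remove_duplicates(fuse(store_sales, web_sales)),
    features=["customer", "amount"],
    keep=lambda r: r["amount"] > 100,
)
```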
"Closing the loop" is the phrase often used to refer to the
process of integrating data mining results into decision support systems. For example, in business applications, the insights offered by data mining results can be integrated with campaign management tools so that effective marketing promotions can be conducted and tested. Such integration requires a …

1-Explanation of how healthcare policy can impact the advanced p.docx1-Explanation of how healthcare policy can impact the advanced p.docx
1-Explanation of how healthcare policy can impact the advanced p.docxaulasnilda
 

1 Readings Week 10 Hô Chi Minh (1880-1969) De.docx

  • 1. Readings Week 10 Hô Chi Minh (1880-1969) Declaration of Independence of the Democratic Republic of Vietnam, September 2, 1945 “All men are created equal. They are endowed by their Creator with certain inalienable rights, among them are Life, Liberty, and the pursuit of Happiness.” This immortal statement was made in the Declaration of Independence of the United States of America in 1776. In a broader sense, this means: All the peoples on the earth are equal from birth, all the peoples have a right to live, to be happy and free. The Declaration of the French Revolution made in 1791 on the Rights of Man and the Citizen also states: “All men are born free and with equal rights, and must always remain free and have equal rights.”
  • 2. Those are undeniable truths. Nevertheless, for more than eighty years, the French imperialists, abusing the standard of Liberty, Equality, and Fraternity, have violated our Fatherland and oppressed our fellow-citizens. They have acted contrary to the ideals of humanity and justice. In the field of politics, they have deprived our people of every democratic liberty. They have enforced inhuman laws; they have set up three distinct political regimes in the North, the Center and the South of Vietnam in order to wreck our national unity and prevent our people from being united. They have built more prisons than schools. They have mercilessly slain our patriots; they have drowned our uprisings in rivers of blood. They have fettered public opinion; they have practiced obscurantism against our people. To weaken our race they have forced us to use opium and alcohol. In the field of economics, they have fleeced us to the backbone, impoverished our people, and devastated our land.
  • 3. They have robbed us of our rice fields, our mines, our forests, and our raw materials. They have monopolized the issuing of bank-notes and the export trade. They have invented numerous unjustifiable taxes and reduced our people, especially our peasantry, to a state of extreme poverty. They have hampered the prospering of our national bourgeoisie; they have mercilessly exploited our workers. In the autumn of 1940, when the Japanese Fascists violated Indochina’s territory to establish new bases in their fight against the Allies, the French imperialists went down on their bended knees and handed over our country to them. Thus, from that date, our people were subjected to the double yoke of the French and the Japanese. Their sufferings and miseries increased. The result was that from the end of last year to the beginning of this year, from Quang Tri province to the
  • 4. North of Vietnam, more than two million of our fellow-citizens died from starvation. On March 9, the French troops were disarmed by the Japanese. The French colonialists either fled or surrendered showing that not only were they incapable of “protecting” us, but that, in the span of five years, they had twice sold our country to the Japanese. On several occasions before March 9, the Vietminh League urged the French to ally themselves with it against the Japanese. Instead of agreeing to this proposal, the French colonialists so intensified their terrorist activities against the Vietminh members that before fleeing they massacred a great number of our political prisoners detained at Yen Bay and Caobang. Notwithstanding all this, our fellow-citizens have always manifested toward the French a tolerant and humane attitude. Even after the Japanese putsch of March 1945, the Vietminh League helped many Frenchmen to cross the frontier, rescued some of them from Japanese jails, and protected French lives and property.
  • 5. From the autumn of 1940, our country had in fact ceased to be a French colony and had become a Japanese possession. After the Japanese had surrendered to the Allies, our whole people rose to regain our national sovereignty and to found the Democratic Republic of Vietnam. The truth is that we have wrested our independence from the Japanese and not from the French. The French have fled, the Japanese have capitulated, Emperor Bao Dai has abdicated. Our people have broken the chains which for nearly a century have fettered them and have won independence for the Fatherland. Our people at the same time have overthrown the monarchic regime that has reigned supreme for dozens of centuries. In its place has been established the present Democratic Republic. For these reasons, we, members of the Provisional Government, representing the whole Vietnamese people, declare that from now on we break off all relations of a colonial character with France; we repeal all the international obligation that France has so far subscribed to on
  • 6. behalf of Vietnam and we abolish all the special rights the French have unlawfully acquired in our Fatherland. The whole Vietnamese people, animated by a common purpose, are determined to fight to the bitter end against any attempt by the French colonialists to reconquer their country. We are convinced that the Allied nations which at Tehran and San Francisco have acknowledged the principles of self-determination and equality of nations, will not refuse to acknowledge the independence of Vietnam. A people who have courageously opposed French domination for more than eighty years, a people who have fought side by side with the Allies against the Fascists during these last years, such a people must be free and independent. For these reasons, we, members of the Provisional Government of the Democratic Republic of Vietnam, solemnly declare to the world that Vietnam has the
  • 7. right to be a free and independent country—and in fact is so already. The entire Vietnamese people are determined to mobilize all their physical and mental strength, to sacrifice their lives and property in order to safeguard their independence and liberty. George Marshall (1880-1959) Speech at Harvard, 5 June 1947 I'm profoundly grateful and touched by the great distinction and honor and great compliment accorded me by the authorities of Harvard this morning. I'm overwhelmed, as a matter of fact, and I'm rather fearful of my inability to maintain such a high rating as you've been generous enough to accord to me. In these historic and lovely surroundings, this perfect day, and this very wonderful assembly, it is a tremendously impressive thing to an individual in my position. But to speak more seriously, I need not tell you, gentlemen, that the world situation is very serious. That must be apparent to all intelligent people. I think one difficulty
  • 8. is that the problem is one of such enormous complexity that the very mass of facts presented to the public by press and radio make it exceedingly difficult for the man in the street to reach a clear appraisement of the situation. Furthermore, the people of this country are distant from the troubled areas of the earth and it is hard for them to comprehend the plight and consequent reactions of the long-suffering peoples, and the effect of those reactions on their governments in connection with our efforts to promote peace in the world. In considering the requirements for the rehabilitation of Europe, the physical loss of life, the visible destruction of cities, factories, mines and railroads was correctly estimated but it has become obvious during recent months that this visible destruction was probably less serious than the dislocation of the entire fabric of European economy. For the past 10 years conditions have been highly abnormal. The feverish preparation for war and the more feverish maintenance of the war effort engulfed all aspects of national economies.
  • 9. Machinery has fallen into disrepair or is entirely obsolete. Under the arbitrary and destructive Nazi rule, virtually every possible enterprise was geared into the German war machine. Long-standing commercial ties, private institutions, banks, insurance companies, and shipping companies disappeared, through loss of capital, absorption through nationalization, or by simple destruction. In many countries, confidence in the local currency has been severely shaken. The breakdown of the business structure of Europe during the war was complete. Recovery has been seriously retarded by the fact that two years after the close of hostilities a peace settlement with Germany and Austria has not been agreed upon. But even given a more prompt solution of these difficult problems the rehabilitation of the economic structure of Europe quite evidently will require a much longer time and greater effort than had been foreseen.
  • 10. There is a phase of this matter which is both interesting and serious. The farmer has always produced the foodstuffs to exchange with the city dweller for the other necessities of life. This division of labor is the basis of modern civilization. At the present time it is threatened with breakdown. The town and city industries are not producing adequate goods to exchange with the food producing farmer. Raw materials and fuel are in short supply. Machinery is lacking or worn out. The farmer or the peasant cannot find the goods for sale which he desires to purchase. So the sale of his farm produce for money which he cannot use seems to him an unprofitable transaction. He, therefore, has withdrawn many fields from crop cultivation and is using them for grazing. He feeds more grain to stock and finds for himself and his family an ample supply of food, however short he may be on clothing and the other ordinary gadgets of civilization. Meanwhile people in the cities are short of food and fuel. So the governments are forced to use their foreign money and credits to procure these necessities abroad. This process exhausts funds
  • 11. which are urgently needed for reconstruction. Thus a very serious situation is rapidly developing which bodes no good for the world. The modern system of the division of labor upon which the exchange of products is based is in danger of breaking down. The truth of the matter is that Europe's requirements for the next three or four years of foreign food and other essential products - principally from America - are so much greater than her present ability to pay that she must have substantial additional help or face economic, social, and political deterioration of a very grave character. The remedy lies in breaking the vicious circle and restoring the confidence of the European people in the economic future of their own countries and of Europe as a whole. The manufacturer and the farmer throughout wide areas must be able and willing to exchange their products for currencies the continuing value of which is not open to question. Aside from the demoralizing effect on the world at large and the
  • 12. possibilities of disturbances arising as a result of the desperation of the people concerned, the consequences to the economy of the United States should be apparent to all. It is logical that the United States should do whatever it is able to do to assist in the return of normal economic health in the world, without which there can be no political stability and no assured peace. Our policy is directed not against any country or doctrine but against hunger, poverty, desperation and chaos. Its purpose should be the revival of a working economy in the world so as to permit the emergence of political and social conditions in which free institutions can exist. Such assistance, I am convinced, must not be on a piecemeal basis as various crises develop. Any assistance that this Government may render in the future should provide a cure rather than a mere palliative. Any government that is willing to assist in the task of recovery will find full co-operation I am sure, on the part of the
  • 13. United States Government. Any government which maneuvers to block the recovery of other countries cannot expect help from us. Furthermore, governments, political parties, or groups which seek to perpetuate human misery in order to profit therefrom politically or otherwise will encounter the opposition of the United States. It is already evident that, before the United States Government can proceed much further in its efforts to alleviate the situation and help start the European world on its way to recovery, there must be some agreement among the countries of Europe as to the requirements of the situation and the part those countries themselves will take in order to give proper effect to whatever action might be undertaken by this Government. It would be neither fitting nor efficacious for this Government to undertake to draw up unilaterally a program designed to place Europe on its feet economically. This is the business of the Europeans. The initiative, I think, must come from Europe. The role of this country should consist of friendly aid in the drafting of a European
  • 14. program and of later support of such a program so far as it may be practical for us to do so. The program should be a joint one, agreed to by a number, if not all European nations. An essential part of any successful action on the part of the United States is an understanding on the part of the people of America of the character of the problem and the remedies to be applied. Political passion and prejudice should have no part. With foresight, and a willingness on the part of our people to face up to the vast responsibility which history has clearly placed upon our country, the difficulties I have outlined can and will be overcome. I am sorry that on each occasion I have said something publicly in regard to our international situation, I've been forced by the necessities of the case to enter into rather technical discussions. But to my mind, it is of vast importance that our people reach some general understanding of what the complications really are, rather than react from a passion or a prejudice or an emotion
  • 15. of the moment. As I said more formally a moment ago, we are remote from the scene of these troubles. It is virtually impossible at this distance merely by reading, or listening, or even seeing photographs or motion pictures, to grasp at all the real significance of the situation. And yet the whole world of the future hangs on a proper judgment. It hangs, I think, to a large extent on the realization of the American people, of just what are the various dominant factors. What are the reactions of the people? What are the justifications of those reactions? What are the sufferings? What is needed? What can best be done? What must be done? Thank you very much.
  • 16. Nikita Khrushchev (1894-1971) Secret Speech to the Closed Session of the Twentieth Party Congress, February 25, 1956 Excerpts We have to consider seriously and analyze correctly [the crimes of the Stalin era] in order that we may preclude any possibility of a repetition in any form whatever of what took place during the life of Stalin, who absolutely did not tolerate collegiality in leadership and in work, and who practiced brutal violence, not only toward everything which opposed him, but also toward that which seemed to his capricious and despotic character, contrary to his concepts. Stalin acted not through persuasion, explanation, and patient cooperation with people, but by imposing his concepts and demanding absolute submission to his opinion. Whoever opposed this concept or tried to prove his viewpoint, and the correctness of his position, was doomed to removal from the leading collective and to subsequent moral and physical annihilation. This was
  • 17. especially true during the period following the XVIIth Party Congress (1934).... Stalin originated the concept enemy of the people. This term automatically rendered it unnecessary that the ideological errors of a man or men engaged in a controversy be proven; t his term made possible the usage of the most cruel repression, violating all norms of revolutionary legality, against anyone who in any way disagreed with Stalin, against those who were only suspected of hostile intent, against those who had bad reputations. This concept, enemy of the people, actually eliminated the possibility of any kind of ideological fight or the making of one's views known on this or that issue, even those of a practical character.... The only proof of guilt used, against all norms of current legal science, was the confession of the accused himself; and, as subsequent probing proved, confessions were acquired through physical pressures against the accused. This led to the glaring violations of revolutionary legality, and to the fact that many entirely innocent persons, who in the past had defended the Party line,
  • 18. became victims.... The Commission [of Inquiry] has become acquainted with a large quantity of materials in the NKVD archives…. It became apparent that many Party, Soviet and economic activists who were branded in 1937-1938 as enemies were actually never enemies, spies, wreckers, etc., but were always honest Communists; they were only so stigmatized, and often, no longer able to bear barbaric tortures, they charged themselves with all kinds of grave and unlikely crimes.... Lenin used severe methods only in the most necessary cases, when the exploiting classes were still in existence and were vigorously opposing the revolution, when the struggle for survival was decidedly assuming the sharpest forms, even including a civil war. Stalin, on the other hand, used extreme methods and mass repression at a time when the revolution was already victorious, when the Soviet state was strengthened, when the exploiting classes were already liquidated and Socialist relations were rooted solidly in all phases of national economy, when our Party was politically consolidated
  • 19. and had strengthened itself both numerically and ideologically. It is clear that here Stalin showed in a whole series of cases his intolerance, his brutality and his abuse of power. Instead of proving his political correctness and mobilizing the masses, he often chose the path of repression and physical annihilation, not only against actual enemies, but also against individuals who had not committed any crimes against the Party and the Soviet government.... Sixteen Political, Economic, and Ideological Points, Budapest, October 22, 1956 RESOLUTION ADOPTED AT PLENARY MEETING OF THE BUILDING INDUSTRY TECHNOLOGY UNIVERSITY Students of Budapest! The following resolution was born on 22 October 1956, at the dawn of a
  • 20. new period in Hungarian history, in the Hall of the Building Industry Technological University as a result of the spontaneous movement of several thousand of the Hungarian youth who love their Fatherland: (1) We demand the immediate withdrawal of all Soviet troops in accordance with the provisions of the Peace Treaty. (2) We demand the election of new leaders in the Hungarian Workers' Party on the low, medium and high levels by secret ballot from the ranks upwards. These leaders should convene the Party Congress within the shortest possible time and should elect a new central body of leaders. (3) The Government should be reconstituted under the leadership of Comrade Imre Nagy; all criminal leaders of the Stalinist-Rákosi era should be relieved of their posts at once. (4) We demand a public trial in the criminal case of Mihály Farkas and his accomplices. Mátyás Rákosi, who is primarily responsible for all the crimes of the recent past and for the ruin of this country, should be brought home and brought before a People's Court of judgment. (5) We demand general elections in this country, with universal
  • 21. suffrage, secret ballot and the participation of several Parties for the purpose of electing a new National Assembly. We demand that the workers should have the right to strike. (6) We demand a re-examination and re-adjustment of Hungarian-Soviet and Hungarian-Yugoslav political, economic and intellectual relations on the basis of complete political and economic equality and of non-intervention in each other's internal affairs. (7) We demand the re-organization of the entire economic life of Hungary, with the assistance of specialists. Our whole economic system based on planned economy should be re-examined with an eye to Hungarian conditions and to the vital interests of the Hungarian people. (8) Our foreign trade agreements and the real figures in respect of reparations that can never be paid should be made public. We demand frank and sincere information concerning the country's uranium deposits, their exploitation and the Russian concession. We demand that Hungary should have the right to sell the uranium ore freely at world market prices in exchange for hard currency. (9) We demand the complete revision of
  • 22. norms in industry and an urgent and radical adjustment of wages to meet the demands of workers and intellectuals. We demand that minimum living wages for workers should be fixed. (10) We demand that the delivery system should be placed on a new basis and that produce should be used rationally. We demand equal treatment of peasants farming individually. (11) We demand the re-examination of all political and economic trials by independent courts and the release and rehabilitation of innocent persons. We demand the immediate repatriation of prisoners-of-war and of civilians deported to the Soviet Union, including prisoners who have been condemned beyond the frontiers of Hungary. (12) We demand complete freedom of opinion and expression, freedom of the Press and a free Radio, as well as a new daily newspaper of large circulation for the MEFESZ [League of Hungarian University and College Student Associations] organization.
  • 23. We demand that the existing 'screening material' should be made public and destroyed. (13) We demand that the Stalin statue, the symbol of Stalinist tyranny and political oppression, should be removed as quickly as possible and that a memorial worthy of the freedom fighters and martyrs of 1848-49 should be erected on its site. (14) In place of the existing coat of arms, which is foreign to the Hungarian people, we wish the re-introduction of the old Hungarian Kossuth arms. We demand for the Hungarian Army new uniforms worthy of our national traditions. We demand that 15 March should be a national holiday and a non-working day and that 6 October should be a day of national mourning and a school holiday. (15) The youth of the Technological University of Budapest unanimously express their complete solidarity with the Polish and Warsaw workers and youth in connection with the Polish national independence movement. (16) The students of the Building Industry Technological University will organize local units of MEFESZ as quickly as possible, and have resolved to convene a Youth Parliament in Budapest
  • 24. for the 27th of this month (Saturday) at which the entire youth of this country will be represented by their delegates. The students of the Technological University and of the various other Universities will gather in the Gorkij Fasor before the Writers' Union Headquarters tomorrow, the 23rd of this month, at 2.30 p.m., whence they will proceed to the Pálffy Tér (Bem Tér) to the Bem statue, on which they will lay wreaths in sign of their sympathy with the Polish freedom movement. The workers of the factories are invited to join in this procession. Treaty of ROME, 1957 Treaty establishing the European Economic Community HIS MAJESTY THE KING OF THE BELGIANS, THE PRESIDENT OF THE FEDERAL REPUBLIC OF GERMANY, THE PRESIDENT OF THE FRENCH REPUBLIC, THE PRESIDENT OF THE ITALIAN REPUBLIC, HER ROYAL HIGHNESS THE GRAND DUCHESS OF LUXEMBOURG, HER MAJESTY THE QUEEN OF THE NETHERLANDS,
  • 25. DETERMINED to establish the foundations of an ever closer union among the European peoples, DECIDED to ensure the economic and social progress of their countries by common action in eliminating the barriers which divide Europe, DIRECTING their efforts to the essential purpose of constantly improving the living and working conditions of their peoples, RECOGNISING that the removal of existing obstacles calls for concerted action in order to guarantee a steady expansion, a balanced trade and fair competition, ANXIOUS to strengthen the unity of their economies and to ensure their harmonious development by reducing the differences existing between the various regions and by mitigating the backwardness of the less favoured, DESIROUS of contributing by means of a common commercial policy to the progressive abolition of restrictions on international trade,
  • 26. INTENDING to confirm the solidarity which binds Europe and overseas countries, and desiring to ensure the development of their prosperity, in accordance with the principles of the Charter of the United Nations, RESOLVED to strengthen the safeguards of peace and liberty by establishing this combination of resources, and calling upon the other peoples of Europe who share their ideal to join in their efforts, HAVE DECIDED to create a European Economic Community and to this end have designated as their plenipotentiaries: HIS MAJESTY THE KING OF THE BELGIANS: Mr. Paul-Henri SPAAK, Minister of Foreign Affairs, Baron J. Ch. SNOY ET D’OPPUERS, Secretary-General of the Ministry of Economic Affairs, Head of the Belgian delegation to the Intergovernmental Conference; THE PRESIDENT OF THE FEDERAL REPUBLIC OF
  • 27. GERMANY: Dr. Konrad ADENAUER, Federal Chancellor, Professor Dr. Walter HALLSTEIN, State Secretary of the Federal Foreign Office; THE PRESIDENT OF THE FRENCH REPUBLIC: Mr. Christian PINEAU, Minister of Foreign Affairs, Mr. Maurice FAURE, Under-Secretary of State for Foreign Affairs; THE PRESIDENT OF THE ITALIAN REPUBLIC: Mr. Antonio SEGNI, President of the Council of Ministers, Professor Gaetano MARTINO, Minister of Foreign Affairs; HER ROYAL HIGHNESS THE GRAND DUCHESS OF LUXEMBOURG: Mr. Joseph BECH, Prime Minister, Minister of Foreign Affairs, Mr. Lambert SCHAUS, Ambassador, Head of the Luxembourg delegation to the Intergovernmental Conference; HER MAJESTY THE QUEEN OF THE NETHERLANDS: Mr. Joseph LUNS, Minister of Foreign Affairs, Mr. J. LINTHORST HOMAN, Head of the Netherlands delegation to the Intergovernmental Conference;
  • 28. WHO, having exchanged their full powers, found in good and due form, have agreed, as follows: PART ONE: Principles Article 1 By the present Treaty, the HIGH CONTRACTING PARTIES establish among themselves a EUROPEAN ECONOMIC COMMUNITY. Article 2 It shall be the aim of the Community, by establishing a Common Market and progressively approximating the economic policies of Member States, to promote throughout the Community a harmonious development of economic activities, a continuous and balanced expansion, an increased stability, an accelerated raising of the standard of living and closer relations between its Member States.
  • 29. Article 3 For the purposes set out in the preceding Article, the activities of the Community shall include, under the conditions and with the timing provided for in this Treaty: (a) the elimination, as between Member States, of customs duties and of quantitative restrictions in regard to the importation and exportation of goods, as well as of all other measures with equivalent effect; (b) the establishment of a common customs tariff and a common commercial policy towards third countries; (c) the abolition, as between Member States, of the obstacles to the free movement of persons, services and capital; (d) the inauguration of a common agricultural policy; (e) the inauguration of a common transport policy; (f) the establishment of a system ensuring that competition shall not be distorted in the Common Market; (g) the application of procedures which shall make it possible to
  • 30. co-ordinate the economic policies of Member States and to remedy disequilibria in their balances of payments; (h) the approximation of their respective municipal law to the extent necessary for the functioning of the Common Market; (i) the creation of a European Social Fund in order to improve the possibilities of employment for workers and to contribute to the raising of their standard of living; (j) the establishment of a European Investment Bank intended to facilitate the economic expansion of the Community through the creation of new resources; and (k) the association of overseas countries and territories with the Community with a view to increasing trade and to pursuing jointly their effort towards economic and social development. Article 4 1. The achievement of the tasks entrusted to the Community shall be ensured by • an ASSEMBLY, • a COUNCIL,
  • 31. • a COMMISSION, and • a COURT OF JUSTICE. Each of these institutions shall act within the limits of the powers conferred upon it by this Treaty… Article 6 1. Member States, acting in close collaboration with the institutions of the Community, shall co-ordinate their respective economic policies to the extent that is necessary to attain the objectives of this Treaty…. Article 7 Within the …
  • 32. PANG-NING TAN, Michigan State University; MICHAEL STEINBACH, University of Minnesota; VIPIN KUMAR, University of Minnesota and Army High Performance Computing Research Center. Boston, San Francisco, New York, London, Toronto, Sydney, Tokyo, Singapore, Madrid, Mexico City, Munich, Paris, Cape Town, Hong Kong, Montreal. Contents: Preface; 1 Introduction; 1.1 What Is Data Mining?; 1.2 Motivating Challenges; 1.3 The Origins of Data Mining; 1.4 Data Mining Tasks; 1.5 Scope and Organization of the Book; 1.6 Bibliographic Notes;
  • 33. 1.7 Exercises; 2 Data; 2.1 Types of Data; 2.1.1 Attributes and Measurement; 2.1.2 Types of Data Sets; 2.2 Data Quality; 2.2.1 Measurement and Data Collection Issues; 2.2.2 Issues Related to Applications; 2.3 Data Preprocessing; 2.3.1 Aggregation; 2.3.2 Sampling; 2.3.3 Dimensionality Reduction; 2.3.4 Feature Subset Selection; 2.3.5 Feature Creation; 2.3.6 Discretization and Binarization; 2.3.7 Variable Transformation; 2.4 Measures of Similarity and Dissimilarity; 2.4.1 Basics; 2.4.2 Similarity and Dissimilarity between Simple Attributes; 2.4.3 Dissimilarities between Data Objects; 2.4.4 Similarities between Data Objects; 2.4.5 Examples of Proximity Measures; 2.4.6 Issues in Proximity Calculation; 2.4.7 Selecting the Right Proximity Measure; 2.5 Bibliographic Notes;
  • 34. 2.6 Exercises; 3 Exploring Data; 3.1 The Iris Data Set; 3.2 Summary Statistics; 3.2.1 Frequencies and the Mode; 3.2.2 Percentiles; 3.2.3 Measures of Location: Mean and Median; 3.2.4 Measures of Spread: Range and Variance; 3.2.5 Multivariate Summary Statistics; 3.2.6 Other Ways to Summarize the Data; 3.3 Visualization; 3.3.1 Motivations for Visualization; 3.3.2 General Concepts; 3.3.3 Techniques; 3.3.4 Visualizing Higher-Dimensional Data; 3.3.5 Do’s and Don’ts; 3.4 OLAP and Multidimensional Data Analysis; 3.4.1 Representing Iris Data as a Multidimensional Array; 3.4.2 Multidimensional Data: The General Case; 3.4.3 Analyzing Multidimensional Data; 3.4.4 Final Comments on Multidimensional Data Analysis; 3.5 Bibliographic Notes; 3.6 Exercises; 4 Classification: Basic Concepts, Decision Trees, and Model Evaluation; 4.1 Preliminaries; 4.2 General Approach to Solving a Classification Problem;
  • 35. 4.3 Decision Tree Induction; 4.3.1 How a Decision Tree Works; 4.3.2 How to Build a Decision Tree; 4.3.3 Methods for Expressing Attribute Test Conditions; 4.3.4 Measures for Selecting the Best Split; 4.3.5 Algorithm for Decision Tree Induction; 4.3.6 An Example: Web Robot Detection; 4.3.7 Characteristics of Decision Tree Induction; 4.4 Model Overfitting; 4.4.1 Overfitting Due to Presence of Noise; 4.4.2 Overfitting Due to Lack of Representative Samples; 4.4.3 Overfitting and the Multiple Comparison Procedure; 4.4.4 Estimation of Generalization Errors; 4.4.5 Handling Overfitting in Decision Tree Induction; 4.5 Evaluating the Performance of a Classifier; 4.5.1 Holdout Method; 4.5.2 Random Subsampling; 4.5.3 Cross-Validation; 4.5.4 Bootstrap; 4.6 Methods for Comparing Classifiers; 4.6.1 Estimating a Confidence Interval for Accuracy; 4.6.2 Comparing the Performance of Two Models; 4.6.3 Comparing the Performance of Two Classifiers;
  • 36. 4.7 Bibliographic Notes; 4.8 Exercises; 5 Classification: Alternative Techniques; 5.1 Rule-Based Classifier; 5.1.1 How a Rule-Based Classifier Works; 5.1.2 Rule-Ordering Schemes; 5.1.3 How to Build a Rule-Based Classifier; 5.1.4 Direct Methods for Rule Extraction; 5.1.5 Indirect Methods for Rule Extraction; 5.1.6 Characteristics of Rule-Based Classifiers; 5.2 Nearest-Neighbor classifiers; 5.2.1 Algorithm; 5.2.2 Characteristics of Nearest-Neighbor Classifiers; 5.3 Bayesian Classifiers; 5.3.1 Bayes Theorem; 5.3.2 Using the Bayes Theorem for Classification; 5.3.3 Naïve Bayes Classifier; 5.3.4 Bayes Error Rate; 5.3.5 Bayesian Belief Networks; 5.4 Artificial Neural Network (ANN); 5.4.1 Perceptron; 5.4.2 Multilayer Artificial Neural Network; 5.4.3 Characteristics of ANN; 5.5 Support Vector Machine (SVM); 5.5.1 Maximum Margin Hyperplanes; 5.5.2 Linear SVM: Separable Case; 5.5.3 Linear SVM: Nonseparable Case; 5.5.4 Nonlinear SVM;
  • 37. 5.5.5 Characteristics of SVM; 5.6 Ensemble Methods; 5.6.1 Rationale for Ensemble Method; 5.6.2 Methods for Constructing an Ensemble Classifier; 5.6.3 Bias-Variance Decomposition; 5.6.4 Bagging; 5.6.5 Boosting; 5.6.6 Random Forests; 5.6.7 Empirical Comparison among Ensemble Methods; 5.7 Class Imbalance Problem; 5.7.1 Alternative Metrics; 5.7.2 The Receiver Operating Characteristic Curve; 5.7.3 Cost-Sensitive Learning; 5.7.4 Sampling-Based Approaches; 5.8 Multiclass Problem; 5.9 Bibliographic Notes; 5.10 Exercises; 6 Association Analysis: Basic Concepts and Algorithms; 6.1 Problem Definition; 6.2 Frequent Itemset Generation; 6.2.1 The Apriori Principle; 6.2.2 Frequent Itemset Generation in the Apriori Algorithm; 6.2.3 Candidate Generation and Pruning; 6.2.4 Support Counting; 6.2.5 Computational Complexity; 6.3 Rule Generation; 6.3.1 Confidence-Based Pruning; 6.3.2 Rule Generation in Apriori Algorithm;
  • 38. 6.3.3 An Example: Congressional Voting Records; 6.4 Compact Representation of Frequent Itemsets; 6.4.1 Maximal Frequent Itemsets; 6.4.2 Closed Frequent Itemsets; 6.5 Alternative Methods for Generating Frequent Itemsets; 6.6 FP-Growth Algorithm; 6.6.1 FP-Tree Representation; 6.6.2 Frequent Itemset Generation in FP-Growth Algorithm; 6.7 Evaluation of Association Patterns; 6.7.1 Objective Measures of Interestingness; 6.7.2 Measures beyond Pairs of Binary Variables; 6.7.3 Simpson’s Paradox; 6.8 Effect of Skewed Support Distribution; 6.9 Bibliographic Notes; 6.10 Exercises; 7 Association Analysis: Advanced Concepts; 7.1 Handling Categorical Attributes; 7.2 Handling Continuous Attributes; 7.2.1 Discretization-Based Methods; 7.2.2 Statistics-Based Methods; 7.2.3 Non-discretization Methods;
  • 39. 7.3 Handling a Concept Hierarchy; 7.4 Sequential Patterns; 7.4.1 Problem Formulation; 7.4.2 Sequential Pattern Discovery; 7.4.3 Timing Constraints; 7.4.4 Alternative Counting Schemes; 7.5 Subgraph Patterns; 7.5.1 Graphs and Subgraphs; 7.5.2 Frequent Subgraph Mining; 7.5.3 Apriori-like Method; 7.5.4 Candidate Generation; 7.5.5 Candidate Pruning; 7.5.6 Support Counting; 7.6 Infrequent Patterns; 7.6.1 Negative Patterns; 7.6.2 Negatively Correlated Patterns; 7.6.3 Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns; 7.6.4 Techniques for Mining Interesting Infrequent Patterns; 7.6.5 Techniques Based on Mining Negative Patterns; 7.6.6 Techniques Based on Support Expectation; 7.7 Bibliographic Notes; 7.8 Exercises; 8 Cluster Analysis: Basic Concepts and Algorithms; 8.1 Overview;
  • 40. 8.1.1 What Is Cluster Analysis?; 8.1.2 Different Types of Clusterings; 8.1.3 Different Types of Clusters; 8.2 K-means; 8.2.1 The Basic K-means Algorithm; 8.2.2 K-means: Additional Issues; 8.2.3 Bisecting K-means; 8.2.4 K-means and Different Types of Clusters; 8.2.5 Strengths and Weaknesses; 8.2.6 K-means as an Optimization Problem; 8.3 Agglomerative Hierarchical Clustering; 8.3.1 Basic Agglomerative Hierarchical Clustering Algorithm; 8.3.2 Specific Techniques; 8.3.3 The Lance-Williams Formula for Cluster Proximity; 8.3.4 Key Issues in Hierarchical Clustering; 8.3.5 Strengths and Weaknesses; 8.4 DBSCAN; 8.4.1 Traditional Density: Center-Based Approach; 8.4.2 The DBSCAN Algorithm; 8.4.3 Strengths and Weaknesses; 8.5 Cluster Evaluation; 8.5.1 Overview; 8.5.2 Unsupervised Cluster Evaluation Using Cohesion and Separation; 8.5.3 Unsupervised Cluster Evaluation Using the Proximity Matrix; 8.5.4 Unsupervised Evaluation of Hierarchical Clustering;
  • 41. 8.5.5 Determining the Correct Number of Clusters; 8.5.6 Clustering Tendency; 8.5.7 Supervised Measures of Cluster Validity; 8.5.8 Assessing the Significance of Cluster Validity Measures; 8.6 Bibliographic Notes; 8.7 Exercises; 9 Cluster Analysis: Additional Issues and Algorithms; 9.1 Characteristics of Data, Clusters, and Clustering Algorithms; 9.1.1 Example: Comparing K-means and DBSCAN; 9.1.2 Data Characteristics; 9.1.3 Cluster Characteristics; 9.1.4 General Characteristics of Clustering Algorithms; 9.2 Prototype-Based Clustering; 9.2.1 Fuzzy Clustering; 9.2.2 Clustering Using Mixture Models; 9.2.3 Self-Organizing Maps (SOM); 9.3 Density-Based Clustering; 9.3.1 Grid-Based Clustering; 9.3.2 Subspace Clustering; 9.3.3 DENCLUE: A Kernel-Based Scheme for Density-Based Clustering; 9.4 Graph-Based Clustering;
  • 42. 9.4.1 Sparsification; 9.4.2 Minimum Spanning Tree (MST) Clustering; 9.4.3 OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS; 9.4.4 Chameleon: Hierarchical Clustering with Dynamic Modeling; 9.4.5 Shared Nearest Neighbor Similarity; 9.4.6 The Jarvis-Patrick Clustering Algorithm; 9.4.7 SNN Density; 9.4.8 SNN Density-Based Clustering; 9.5 Scalable Clustering Algorithms; 9.5.1 Scalability: General Issues and Approaches; 9.5.2 BIRCH; 9.5.3 CURE; 9.6 Which Clustering Algorithm?; 9.7 Bibliographic Notes; 9.8 Exercises; 10 Anomaly Detection; 10.1 Preliminaries; 10.1.1 Causes of Anomalies; 10.1.2 Approaches to Anomaly Detection; 10.1.3 The Use of Class Labels; 10.1.4 Issues; 10.2 Statistical Approaches; 10.2.1 Detecting Outliers in a Univariate Normal Distribution; 10.2.2 Outliers in a Multivariate Normal Distribution; 10.2.3 A Mixture Model Approach for Anomaly Detection;
  • 43. 10.2.4 Strengths and Weaknesses; 10.3 Proximity-Based Outlier Detection; 10.3.1 Strengths and Weaknesses; 10.4 Density-Based Outlier Detection; 10.4.1 Detection of Outliers Using Relative Density; 10.4.2 Strengths and Weaknesses; 10.5 Clustering-Based Techniques; 10.5.1 Assessing the Extent to Which an Object Belongs to a Cluster; 10.5.2 Impact of Outliers on the Initial Clustering; 10.5.3 The Number of Clusters to Use; 10.5.4 Strengths and Weaknesses; 10.6 Bibliographic Notes; 10.7 Exercises; Appendix A Linear Algebra; A.1 Vectors; A.1.1 Definition; A.1.2 Vector Addition and Multiplication by a Scalar; A.1.3 Vector Spaces; A.1.4 The Dot Product, Orthogonality, and Orthogonal Projections; A.1.5 Vectors and Data Analysis;
  • 44. A.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691 A.2.1 Matrices: Definitions . . . . . . . . . . . . . . . . . . . . 691 A.2.2 Matrices: Addition and Multiplication by a Scalar . . . 692 A.2.3 Matrices: Multiplication . . . . . . . . . . . . . . . . . . 693 A.2.4 Linear Transformations and Inverse Matrices . . . . . . 695 A.2.5 Eigenvalue and Singular Value Decomposition . . . . . . 697 A.2.6 Matrices and Data Analysis . . . . . . . . . . . . . . . . 699 A.3 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 700 Appendix B Dimensionality Reduction 701 B.1 PCA and SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . 701 B.1.1 Principal Components Analysis (PCA) . . . . . . . . . . 701 B.1.2 SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706 B.2 Other Dimensionality Reduction Techniques . . . . . . . . . . . 708 B.2.1 Factor Analysis . . . . . . . . . . . . . . . . . . . . . . . 708 B.2.2 Locally Linear Embedding (LLE) . . . . . . . . . . . . . 710 B.2.3 Multidimensional Scaling, FastMap, and ISOMAP . . . 712 Contents xxi B.2.4 Common Issues . . . . . . . . . . . . . . . . . . . . . . . 715 B.3 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 716 Appendix C Probability and Statistics 719 C.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719 C.1.1 Expected Values . . . . . . . . . . . . . . . . . . . . . . 722 C.2 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
  • 45. C.2.1 Point Estimation . . . . . . . . . . . . . . . . . . . . . . 724 C.2.2 Central Limit Theorem . . . . . . . . . . . . . . . . . . 724 C.2.3 Interval Estimation . . . . . . . . . . . . . . . . . . . . . 725 C.3 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . 726 Appendix D Regression 729 D.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729 D.2 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . 730 D.2.1 Least Square Method . . . . . . . . . . . . . . . . . . . 731 D.2.2 Analyzing Regression Errors . . . . . . . . . . . . . . . 733 D.2.3 Analyzing Goodness of Fit . . . . . . . . . . . . . . . . 735 D.3 Multivariate Linear Regression . . . . . . . . . . . . . . . . . . 736 D.4 Alternative Least-Square Regression Methods . . . . . . . . . . 737 Appendix E Optimization 739 E.1 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . 739 E.1.1 Numerical Methods . . . . . . . . . . . . . . . . . . . . 742 E.2 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . 746 E.2.1 Equality Constraints . . . . . . . . . . . . . . . . . . . . 746 E.2.2 Inequality Constraints . . . . . . . . . . . . . . . . . . . 747 Author Index 750 Subject Index 758 Copyright Permissions 769
  • 46. 1 Introduction
      Rapid advances in data collection and storage technology have enabled organizations to accumulate vast amounts of data. However, extracting useful information has proven extremely challenging. Often, traditional data analysis tools and techniques cannot be used because of the massive size of a data set. Sometimes, the non-traditional nature of the data means that traditional approaches cannot be applied even if the data set is relatively small. In other situations, the questions that need to be answered cannot be addressed using existing data analysis techniques, and thus, new methods need to be developed.
      Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. It has also opened up exciting opportunities for exploring and analyzing new types of data and for analyzing old types of data in new ways.
      In this introductory chapter, we present an overview of data mining and outline the key topics to be covered in this book. We start with a description of some well-known applications that require new techniques for data analysis.
  • 47. Business
      Point-of-sale data collection (bar code scanners, radio frequency identification (RFID), and smart card technology) has allowed retailers to collect up-to-the-minute data about customer purchases at the checkout counters of their stores. Retailers can utilize this information, along with other business-critical data such as Web logs from e-commerce Web sites and customer service records from call centers, to help them better understand the needs of their customers and make more informed business decisions.
      Data mining techniques can be used to support a wide range of business intelligence applications such as customer profiling, targeted marketing, workflow management, store layout, and fraud detection. They can also help retailers answer important business questions such as "Who are the most profitable customers?" "What products can be cross-sold or up-sold?" and "What is the revenue outlook of the company for next year?" Some of these questions motivated the creation of association analysis (Chapters 6 and 7), a new data
  • 48. analysis technique.
      Medicine, Science, and Engineering
      Researchers in medicine, science, and engineering are rapidly accumulating data that is key to important new discoveries. For example, as an important step toward improving our understanding of the Earth's climate system, NASA has deployed a series of Earth-orbiting satellites that continuously generate global observations of the land surface, oceans, and atmosphere. However, because of the size and spatio-temporal nature of the data, traditional methods are often not suitable for analyzing these data sets. Techniques developed in data mining can aid Earth scientists in answering questions such as "What is the relationship of the frequency and intensity of ecosystem disturbances, such as droughts and hurricanes, to global warming?" "How are land surface precipitation and temperature affected by ocean surface temperature?" and "How well can we predict the beginning and end of the growing season for a region?"
      As another example, researchers in molecular biology hope to use the large amounts of genomic data currently being gathered to better understand the structure and function of genes. In the past, traditional methods in molecular biology allowed scientists to study only a few genes at a time in a given
  • 49. experiment. Recent breakthroughs in microarray technology have enabled scientists to compare the behavior of thousands of genes under various situations. Such comparisons can help determine the function of each gene and perhaps isolate the genes responsible for certain diseases. However, the noisy and high-dimensional nature of data requires new types of data analysis. In addition to analyzing gene array data, data mining can also be used to address other important biological challenges such as protein structure prediction, multiple sequence alignment, the modeling of biochemical pathways, and phylogenetics.
      1.1 What Is Data Mining?
      Data mining is the process of automatically discovering useful information in large data repositories. Data mining techniques are deployed to scour large databases in order to find novel and useful patterns that might otherwise remain unknown. They also provide capabilities to predict the outcome of a future observation, such as predicting whether a newly arrived customer will spend more than $100 at a department store.
      Not all information discovery tasks are considered to be data mining. For
  • 50. example, looking up individual records using a database management system or finding particular Web pages via a query to an Internet search engine are tasks related to the area of information retrieval. Although such tasks are important and may involve the use of sophisticated algorithms and data structures, they rely on traditional computer science techniques and obvious features of the data to create index structures for efficiently organizing and retrieving information. Nonetheless, data mining techniques have been used to enhance information retrieval systems.
      Data Mining and Knowledge Discovery
      Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information, as shown in Figure 1.1. This process consists of a series of transformation steps, from data preprocessing to postprocessing of data mining results.
  • 51. [Figure 1.1. The process of knowledge discovery in databases (KDD). Diagram labels: Input Data; Feature Selection, Dimensionality Reduction, Normalization, Data Subsetting; Information; Filtering Patterns, Visualization, Pattern Interpretation.]
      The input data can be stored in a variety of formats (flat files, spreadsheets, or relational tables) and may reside in a centralized data repository or be distributed across multiple sites. The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis. The steps involved in data preprocessing include fusing data from multiple sources, cleaning data to remove noise and duplicate observations, and selecting records and features that are relevant to the data mining task at hand. Because of the many ways data can be collected and stored, data preprocessing is perhaps the most laborious and time-consuming step in the overall knowledge discovery process.
      "Closing the loop" is the phrase often used to refer to the process of
  • 52. integrating data mining results into decision support systems. For example, in business applications, the insights offered by data mining results can be integrated with campaign management tools so that effective marketing promotions can be conducted and tested. Such integration requires a …