• Machine learning (ML) is a field of artificial intelligence that uses
statistical techniques to give computer systems the ability to
"learn" (e.g., progressively improve performance on a specific
task) from data, without being explicitly programmed.
Machine learning tasks
Machine learning tasks are typically classified into several broad categories:
• Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs (a minimal sketch follows this list). As special cases, the input signal can be only partially available, or restricted to special feedback.
• Semi-supervised learning: The computer is given only an incomplete training signal: a training set with
some (often many) of the target outputs missing.
• Active learning: The computer can only obtain training labels for a limited set of instances (based on a
budget), and also has to optimize its choice of objects to acquire labels for. When used interactively, these
can be presented to the user for labeling.
• Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find
structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or
a means towards an end (feature learning).
• Reinforcement learning: Data (in the form of rewards and punishments) are given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.
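As a minimal illustration of the supervised setting above, here is a hedged Python sketch (scikit-learn is assumed to be available; the dataset and model are illustrative choices, not part of the slides):

```python
# Minimal supervised-learning sketch: learn a general rule mapping inputs to
# outputs from labeled examples provided by a "teacher" (the training set).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # example inputs and desired outputs
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)    # the "general rule" to be learned
model.fit(X_train, y_train)                  # learn from teacher-labeled data
print("accuracy on unseen inputs:", model.score(X_test, y_test))
```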
Machine learning applications
Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system:
• In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam" (see the sketch after this list).
• In regression, also a supervised problem, the outputs are continuous rather than discrete.
• In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not
known beforehand, making this typically an unsupervised task.
• Density estimation finds the distribution of inputs in some space.
• Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic
modeling is a related problem, where a program is given a list of human language documents and is
tasked to find out which documents cover similar topics.
• Among other categories of machine learning problems, learning to learn learns its own inductive bias
based on previous experience. Developmental learning, elaborated for robot learning, generates its own
sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills
through autonomous self-exploration and social interaction with human teachers and using guidance
mechanisms such as active learning, maturation, motor synergies, and imitation.
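To make these output-based categories concrete, here is a hedged Python sketch contrasting classification, clustering, and dimensionality reduction on the same data (the digits dataset and the specific algorithms are illustrative assumptions, not prescribed by the slides):

```python
# Supervised classification (classes known beforehand) vs. unsupervised
# clustering (groups discovered) vs. dimensionality reduction (mapping
# inputs into a lower-dimensional space).
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)

clf = SVC().fit(X, y)                          # classification: labels given
print("predicted class:", clf.predict(X[:1]))

km = KMeans(n_clusters=10, n_init=10).fit(X)   # clustering: no labels used
print("discovered group of first input:", km.labels_[0])

Z = PCA(n_components=2).fit_transform(X)       # reduction: 64 dims -> 2 dims
print("reduced shape:", Z.shape)
```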
Relation to Data Mining
Machine learning and data mining often employ the same methods and overlap significantly,
but while machine learning focuses on prediction, based on known properties learned
from the training data, data mining focuses on the discovery of (previously) unknown
properties in the data (this is the analysis step of knowledge discovery in databases).
Data mining uses many machine learning methods, but with different goals; on the other
hand, machine learning also employs data mining methods as "unsupervised learning" or as
a preprocessing step to improve learner accuracy. Much of the confusion between these
two research communities (which do often have separate conferences and separate
journals, ECML PKDD being a major exception) comes from the basic assumptions they
work with: in machine learning, performance is usually evaluated with respect to the ability
to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the
key task is the discovery of previously unknown knowledge. Evaluated with respect to
known knowledge, an uninformed (unsupervised) method will easily be outperformed by
other supervised methods, while in a typical KDD task, supervised methods cannot be used
due to the unavailability of training data.
Deep Learning
• Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised.
• Deep learning architectures such as deep neural networks, deep belief networks and
recurrent neural networks have been applied to fields including computer vision,
speech recognition, natural language processing, audio recognition, social network
filtering, machine translation, bioinformatics, drug design, medical image analysis,
material inspection and board game programs, where they have produced results
comparable to and in some cases superior to human experts.
• Deep learning models are vaguely inspired by information processing and communication
patterns in biological nervous systems yet have various differences from the structural
and functional properties of biological brains (especially human brains), which make them
incompatible with neuroscientific evidence.
Deep Learning (contd)
• Most modern deep learning models are based on an artificial neural network, although they can also
include propositional formulas or latent variables organized layer-wise in deep generative models such as
the nodes in deep belief networks and deep Boltzmann machines.
• In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, the raw input may be a matrix of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn which features to optimally place in which level on its own. (Of course, this does not completely obviate the need for hand-tuning; for example, varying numbers of layers and layer sizes can provide different degrees of abstraction.) A structural sketch follows.
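A structural sketch of this layer-wise transformation in plain NumPy; the weights here are random and untrained, so the "edges"/"face" semantics are only indicative of what a trained network would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))                # raw input: a matrix of pixels
x = image.reshape(-1)                     # flatten to a 64-dim feature vector

# Each layer maps its input to a new, more compact representation; in a
# trained network early layers tend to encode edges and later layers
# increasingly composite structure (arrangements, parts, whole objects).
layer_sizes = [64, 32, 16, 4]
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, W in enumerate(weights, start=1):
    x = np.maximum(0.0, W @ x)            # linear map + ReLU nonlinearity
    print(f"layer {i} representation shape: {x.shape}")
```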
• The "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a feedforward neural network, the depth of the CAPs is that of the network and is the number of hidden layers plus one (as the output layer is also parameterized). For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited. No universally agreed-upon threshold of depth divides shallow learning from deep learning, but most researchers agree that deep learning involves CAP depth > 2. A CAP of depth 2 has been shown to be a universal approximator in the sense that it can emulate any function; beyond that, more layers do not add to the function-approximation ability of the network. Deep models (CAP > 2) are nevertheless able to extract better features than shallow models, and hence the extra layers help in learning the features effectively (a sketch of a depth-2 network follows).
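A hedged NumPy sketch of the universal-approximation point: a single hidden layer (CAP depth = 1 hidden layer + 1 = 2) fit to a simple function by plain gradient descent. The target function, architecture, and hyperparameters are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)                             # target function to emulate

# One hidden layer => CAP depth 2: already a universal approximator,
# though deeper models may learn better features.
H = 32
W1 = rng.standard_normal((1, H)); b1 = np.zeros(H)
W2 = rng.standard_normal((H, 1)) * 0.1; b2 = np.zeros(1)

lr = 0.01
for step in range(5000):
    h = np.tanh(X @ W1 + b1)              # hidden representation
    pred = h @ W2 + b2
    err = pred - y                        # squared-error gradient terms
    # Backpropagate (constant factors folded into the learning rate).
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h**2)        # tanh derivative
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("final MSE:", float((err**2).mean()))
```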
• Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by
machines, in contrast to the natural intelligence displayed by humans and other animals. In computer
science AI research is defined as the study of "intelligent agents": any device that perceives its
environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially,
the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans
associate with other human minds, such as "learning" and "problem solving".
• Expert Systems
• Fuzzy Logic
• Robotics (e.g. humanoid, arm, chatbot, etc.)
• Natural Language Processing
• Neural Networks (general)
• Some subfields of AI have been absorbed into or transformed by the Machine Learning/Deep Learning field, driven by advances in computational technology and huge amounts of data (especially unstructured data, with no row/column structure).
• Data science is an interdisciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge and insights from data in various forms,
both structured and unstructured, similar to data mining.
• Data science is a "concept to unify statistics, data analysis, machine learning and their
related methods" in order to "understand and analyze actual phenomena" with data. It
employs techniques and theories drawn from many fields within the context of
mathematics, statistics, information science, and computer science.
• Turing award winner Jim Gray imagined data science as a "fourth paradigm" of science
(empirical, theoretical, computational and now data-driven) and asserted that
"everything about science is changing because of the impact of information technology"
and the data deluge.
• Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal
of discovering useful information, informing conclusions, and supporting decision-making. Data analysis
has multiple facets and approaches, encompassing diverse techniques under a variety of names, while
being used in different business, science, and social science domains.
• Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery
for predictive rather than purely descriptive purposes, while business intelligence covers data analysis
that relies heavily on aggregation, focusing mainly on business information. In statistical applications,
data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory
data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on
confirming or falsifying existing hypotheses. Predictive analytics focuses on application of statistical
models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and
structural techniques to extract and classify information from textual sources, a species of unstructured
data. All of the above are varieties of data analysis (a toy sketch of these statistical divisions follows).
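A toy sketch of those divisions (descriptive statistics, EDA, CDA) on synthetic data; NumPy/SciPy are assumed available and the data are fabricated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(10.0, 2.0, 500)            # e.g., a metric for group A
b = rng.normal(10.5, 2.0, 500)            # e.g., the same metric for group B

# Descriptive statistics: summarize what the data look like.
print("mean A/B:", a.mean(), b.mean(), "std A/B:", a.std(), b.std())

# EDA: explore for unexpected structure, e.g., correlation with another field.
other = a + rng.normal(0, 1.0, 500)
print("exploratory correlation:", np.corrcoef(a, other)[0, 1])

# CDA: confirm or falsify a pre-stated hypothesis (here: equal group means).
t, p = stats.ttest_ind(a, b)
print("t-test p-value:", p)
```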
• Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The term data analysis is sometimes used as a synonym for data modeling.
• Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data
analysis of business information. BI technologies provide historical, current and predictive views of
business operations. Common functions of business intelligence technologies include reporting, online
analytical processing, analytics, data mining, process mining, complex event processing, business
performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. BI
technologies can handle large amounts of structured and sometimes unstructured data to help identify, develop and otherwise create
new strategic business opportunities. They aim to allow for the easy interpretation of these data.
Identifying new opportunities and implementing an effective strategy based on insights can provide
businesses with a competitive market advantage and long-term stability.
• Business intelligence can be used by enterprises to support a wide range of business decisions
ranging from operational to strategic. Basic operating decisions include product positioning or pricing.
Strategic business decisions involve priorities, goals and directions at the broadest level. In all cases,
BI is most effective when it combines data derived from the market in which a company operates (external
data) with data from company sources internal to the business such as financial and operations data
(internal data). When combined, external and internal data can provide a complete picture that neither source yields on its own.
• Business Intelligence (structured data sources; enterprise-goal intensive; traditional process flow; result: insight from historical data, for executives) vs. Big Data Analytics (unstructured/semi-structured data; wide-ranging goals; modern approach/flow; result: predictive output, for executives and/or machines to decide the next step)
• Big data is a term used to refer to data sets that are too large or complex for traditional data-
processing application software to adequately deal with. Data with many cases (rows) offer greater
statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false
discovery rate. Big data challenges include capturing data, data storage, data analysis, search,
sharing, transfer, visualization, querying, updating, information privacy and data source. Big data
was originally associated with three key concepts: volume, variety, and velocity. Other concepts later
attributed with big data are veracity (i.e., how much noise is in the data) and value.
• Current usage of the term "big data" tends to refer to the use of predictive analytics, user behavior
analytics, or certain other advanced data analytics methods that extract value from data, and seldom
to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem." Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on." Scientists, business executives, practitioners of medicine, advertising and governments alike regularly
meet difficulties with large data-sets in areas including Internet search, fintech, urban informatics, and
business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,
connectomics, complex physics simulations, biology and environmental research.
Big Data Definition Shifting
The term has been in use since the 1990s, with some giving credit to John Mashey for popularizing the term. Big data usually
includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data
within a tolerable elapsed time. Big data philosophy encompasses unstructured, semi-structured and structured data, however
the main focus is on unstructured data.
Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many exabytes of data. Big data
requires a set of techniques and technologies with new forms of integration to reveal insights from datasets that are diverse,
complex, and of a massive scale.
A 2016 definition states that "Big data represents the information assets characterized by such a high volume, velocity and variety
to require specific technology and analytical methods for its transformation into value". Additionally, a new V, veracity, is added by some organizations to describe it, a revisionism challenged by some industry authorities. The three Vs (volume, variety and velocity)
have been further expanded to other complementary characteristics of big data:
• Machine learning: big data often doesn't ask why and simply detects patterns
• Digital footprint: big data is often a cost-free byproduct of digital interaction (e.g. people interacting on Facebook)
A 2018 definition states "Big data is where parallel computing tools are needed to handle data", and notes, "This represents a
distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the
guarantees and capabilities made by Codd’s relational model". The growing maturity of the concept more starkly delineates the
difference between "big data" and "Business Intelligence":
• Business Intelligence uses descriptive statistics with data with high information density to measure things, detect trends, etc.
• Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density to reveal relationships and dependencies, or to perform predictions of outcomes and behaviors (a toy contrast follows this list).
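A toy contrast between the two styles, assuming scikit-learn is available; the synthetic data and the random-forest choice are illustrative, not prescribed by the text:

```python
# Illustrative contrast: BI-style descriptive aggregation vs. a big-data-
# style inductive model that infers a nonlinear relationship for prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 5000)
y = np.sin(x) * x + rng.normal(0, 1.0, 5000)   # noisy, low information density

# Descriptive (BI): aggregate high-density data to measure and spot trends.
print("overall mean of y:", y.mean())

# Inductive (big data): fit a nonlinear model to predict unseen outcomes.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(x.reshape(-1, 1), y)
print("prediction at x=7.5:", model.predict([[7.5]])[0])
```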
• Has been immersed in programming and computer/electronics engineering from 1991 to the present (-/+
• Prefers programming languages, logic and algorithms (problem solving)
• Has worked professionally at several companies across industries (ISP, manufacturing, broadcasting, telecommunications, finance/finserv, books/libraries)
• Has worked as a lecturer/teacher and as a consultant across industries (agriculture, aquafarming/cold storage, telco, retail, manufacturing, human resources/capital, etc.)
• Since 2013, has been diving into IoT, Drone/UAV, infosec, big data, machine learning,
• Built a first AI application in 1992 using Turbo Prolog; voice recognition (1992/93); polymorphic/heuristic code (1993); a Fuzzy Logic application (1996/97); lectured on AI in 2008/10; began studying MapReduce/Hadoop in 2014.
• Has actively participated in several Big Data-related conferences and projects, including: Data For Public Policy (international conference, UN/Pulse Lab), Big Data Week (1st, 2nd), Precision Agriculture projects (with CIagri), Jakarta Smart City (literacy index, BPAD DKI Jakarta),