Proposal for Google Summer of Code 2016
PROPOSAL 6: [ML] [CEP] PREDICTIVE ANALYTICS WITH ONLINE DATA FOR WSO2 MACHINE LEARNER (ML)
(dananjayamahesh@gmail.com)
I am Mahesh Dananjaya, a student at the University of Moratuwa, the leading technical university in the country. I am majoring in Electronic and Telecommunication Engineering, and I am enthusiastic about machine learning and big data analysis; I have also worked on optimized, high-performance platforms for networking, telecommunication, signal processing and information processing.
I have completed many projects related to machine intelligence, information processing and data analysis. These include the development of an SoPC (System on Programmable Chip) for real-time speech recognition on reconfigurable hardware, an artificial neural network implementation for pattern recognition in speech signals, an intrusion detection application based on cluster analysis, and several other projects involving SVMs, regression, discriminant analysis (DA) with Fisher's discriminant and maximum likelihood, factor analysis for real-world applications, and principal component analysis (PCA) for dimensionality reduction of network data in information security analysis. I have also been engaged in big data analysis research with Dr. Prathapasinghe Dharmawansa of Stanford University (currently at the University of Moratuwa). My interest is in combining IoT, information processing and machine learning, and I specialize in multivariate analysis on the mathematics side.
1 INTRODUCTION
Machine learning and big data analysis are becoming the next breakthrough in information processing and data analysis. Many organizations have carried out research to make big data analytics a reality and to build commercial-level applications. Today there are many machine learning algorithms that bring intelligent analysis to ordinary data flows. Most network data passes through high-speed network links as streams, so the current trend is to extend existing data analysis algorithms to support streaming and to provide the same level of predictive analytics for stream data as well. Companies like WSO2 are bringing these big data technologies and machine learning algorithms to their current products, such as the Data Analytics Server (DAS) and the CEP (Complex Event Processor) with the Siddhi processor, by supporting streaming data. Using machine learning algorithms for data analytics with stream data support is an emerging technology.
2 PROBLEM STATEMENT
Although WSO2 ML (Machine Learner) can be used to analyze clusters of data as well as standalone data, it still does not support streams of data. Therefore it cannot be used directly with other products such as the CEP (Complex Event Processor). WSO2 ML also relies on Apache Spark MLlib for its machine learning algorithms. Spark MLlib supports streaming k-means and other generalized linear models via Spark Streaming, by breaking the streaming data into mini-batches, but that functionality is currently available only through the Scala API. Therefore it cannot be used directly from the WSO2 machine learning package.
3 PROPOSAL
The proposed solution is to develop a Java API for WSO2 ML with streaming data support, using the Spark MLlib k-means and generalized linear model (GLM) algorithms. This can be done with the Spark MLlib k-means clustering and GLM algorithms, by periodically retraining the model with newly arriving sequential data. It will be implemented as a Siddhi extension that operates directly on incoming streams, so it can be used to perform predictive analysis of data/event streams coming from the WSO2 CEP (Complex Event Processor). Thus we can use this for real-time streaming data analysis with better predictions.
4 METHODOLOGY
The basic idea is to use the same Apache Spark MLlib machine learning algorithms that we use for standalone data, and to periodically retrain the model with new data arriving as streams from applications such as the WSO2 CEP (Complex Event Processor) event streams. In this API, I therefore have to reproduce the behavior inside Apache Spark of periodically collecting mini-batches from the streaming data and retraining the model. We use mini-batch learning in this incremental learning approach because these algorithms operate as stochastic gradient descent, so learning from new data can build on top of the previously learned models.
We then save the previous model parameters and retrain the model with the next mini-batch of data. The process follows a few basic steps. First, we break the stream into several small data sets, called mini-batches, and train a model repeatedly on those mini-batches. After each training run, we save the model information (such as weights and intercepts for regression, and cluster centers for k-means clustering). As the WSO2 proposal page suggests, we use the first approach and implement incremental learning algorithms based on the mini-batches of data taken from the data streams.
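The steps above can be sketched in plain Python. This is a minimal illustration of warm-started mini-batch training, not the actual Spark MLlib or WSO2 API; the model (one-feature linear regression), the simulated stream and the learning rate are invented for the example:

```python
import random

def train_on_batch(w, b, batch, lr=0.05):
    """One step of mini-batch SGD for y ~ w*x + b under squared loss."""
    n = len(batch)
    grad_w = sum((w * x + b - y) * x for x, y in batch) / n
    grad_b = sum((w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b

random.seed(0)

def mini_batches(num_batches, batch_size):
    """Simulated stream: y = 3x + 1 plus a little noise, in mini-batches."""
    for _ in range(num_batches):
        yield [(x, 3 * x + 1 + random.gauss(0, 0.01))
               for x in (random.uniform(-1, 1) for _ in range(batch_size))]

w, b = 0.0, 0.0                          # the saved model parameters
for batch in mini_batches(num_batches=400, batch_size=20):
    w, b = train_on_batch(w, b, batch)   # retrain on top of the saved model

print(w, b)  # close to the true slope 3 and intercept 1
```

The key point of the sketch is that only the saved parameters `(w, b)` carry over between mini-batches, which is exactly what makes the approach incremental.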
4.1 Mini-Batches from Streaming Data
The first task is to take bundles of data out of the streaming data, known as mini-batches, to retrain the model. We first looked at how Spark Streaming takes mini-batches from streaming data. Spark Streaming uses a modified version of the Lambda Architecture, known as the Kappa Architecture. In this architecture the output of a fast, and often imprecise, streaming layer is joined with the output of a slow, but precise, batch layer. Typically this takes the form of a series of scheduled batch processes that replace the output of a continually running stream processor with a more accurate equivalent. This guarantees that any inaccurate data introduced by the stream processor will be replaced by accurate data within a small window. For its mini-batch process, Spark defines several other important parameters to coordinate the mini-batches: the zero time and the batch interval. The batch interval is the interval at which the events that have been collected are processed. The zero time is defined as the same time throughout the entire distributed system, allowing Spark Streaming to use it to short-circuit computation of outputs from before inputs were available.
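As an illustration of the batch-interval idea (a simplified sketch, not Spark Streaming's actual implementation; the event format and the interval values are invented for the example), timestamped events can be grouped into mini-batches relative to a zero time:

```python
def batch_by_interval(events, zero_time, batch_interval):
    """Group (timestamp, value) events into mini-batches.

    Events with zero_time <= t < zero_time + batch_interval go into
    batch 0, the next interval into batch 1, and so on. Events from
    before the zero time are dropped, mirroring the idea that no
    output is computed for times before inputs were available.
    """
    batches = {}
    for t, value in events:
        if t < zero_time:
            continue
        index = int((t - zero_time) // batch_interval)
        batches.setdefault(index, []).append(value)
    return [batches[i] for i in sorted(batches)]

events = [(0.5, 'a'), (1.2, 'b'), (1.9, 'c'), (3.4, 'd'), (4.1, 'e')]
print(batch_by_interval(events, zero_time=1.0, batch_interval=1.0))
# -> [['b', 'c'], ['d'], ['e']]   ('a' arrived before the zero time)
```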
Spark MLlib can then apply its machine learning algorithms to those mini-batches; this is how Spark's mini-batch sampling works. With the help of these methods, we can retrain models with newly arriving data while keeping the characteristics learned from the previous data.
Research paper [1] in the references shows how mini-batches and the mini-batch size b, together with the convergence time T, can be used for the Stochastic Gradient Descent (SGD) technique, which can be effectively applied to large-scale optimization problems including predictive data analysis. Mini-batch training needs to be employed to reduce the communication cost. However, an increase in mini-batch size typically decreases the rate of convergence. Since we have to use Spark MLlib to retrain the model, and since we keep the previous models saved in the program, we can use SGD-based optimization techniques to find the incremental model for any learning algorithm, initially k-means clustering and Generalized Linear Regression models (GLM).
Therefore, what we initially have to do is separate the streaming data into a set of data bundles called mini-batches, with an appropriate mini-batch size b, which will be a parameter of paramount importance for avoiding a slowdown in convergence.
4.2 Running the Spark ML Algorithms and Save the Model parameters with Streaming
Data.
For each mini-batch we can use Spark MLlib to run the relevant machine learning algorithm; in our case these are k-means clustering and the generalized linear regression model (GLM). We have to run the algorithm and save the model parameters; when the next batch arrives, we run the algorithm again, obtain the new parameters, and then combine the models to build the new model with this incremental algorithm.
The k-means clustering algorithm as implemented on Spark Streaming is elaborated below.
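The train–save–combine loop described above can be sketched in a simplified form. This is only an illustration of the control flow, not the MLlib API: `train_on_batch` stands in for an MLlib fit (here the "model" is just a running mean), and `combine_models` shows one possible way to merge a saved model with the newest one by weighting each by the number of points it has seen.

```python
def train_on_batch(batch):
    """Stand-in for an MLlib fit: here the 'model' is just the batch mean."""
    return {"mean": sum(batch) / len(batch), "count": len(batch)}

def combine_models(old, new):
    """Merge the saved model with the newest mini-batch's model,
    weighting each by the number of points it has seen."""
    if old is None:
        return new
    total = old["count"] + new["count"]
    mean = (old["mean"] * old["count"] + new["mean"] * new["count"]) / total
    return {"mean": mean, "count": total}

saved_model = None
for mini_batch in [[1.0, 3.0], [5.0, 7.0]]:
    new_model = train_on_batch(mini_batch)
    saved_model = combine_models(saved_model, new_model)
print(saved_model)  # {'mean': 4.0, 'count': 4}
```

The key point the sketch demonstrates is that only the saved model parameters, not the raw past data, are needed when the next batch arrives.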
4.3 Periodically Train the Models with the Previous Models.
Periodic retraining of the model can be achieved in several ways. As research paper [1] suggests, and as most practitioners do, we can use mini-batch training with stochastic gradient descent optimization techniques. So we can train the model periodically with mini-batches.
Mini Batch Generalized Linear Regression
For Generalized Linear Regression in particular, we can use stochastic gradient descent techniques to find the new model based on the past models as well. As the Spark Streaming algorithms do (see reference [5]), we can use stochastic gradient descent methods for an incremental learning model with GLM. We need to look keenly into regularization of the parameters, along with the various parameters associated with stochastic gradient descent, such as the step size, the number of iterations, and the mini-batch fraction, to control issues regarding the data horizon (how quickly the most recent data point becomes part of the model) and data obsolescence (how long it takes a past data point to become irrelevant to the model).
References [8], [9], [10] and [11] describe how the algorithm works and how it can be used with mini-batch learning algorithms. Reference [8] shows clearly and simply how the mini-batch size can be selected and how it can be parameterized as b (the batch size).
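A minimal sketch of mini-batch SGD for linear regression may help make the three knobs concrete. The parameter names `step_size`, `num_iterations` and `mini_batch_fraction` are chosen to mirror the knobs discussed above (as exposed by Spark MLlib's SGD-based regression); the implementation itself is an illustrative 1-D version without intercept or regularization, not Spark's.

```python
import random

def minibatch_sgd_linear(data, step_size, num_iterations, mini_batch_fraction, seed=0):
    """Mini-batch SGD for 1-D linear regression without intercept:
    minimise (1/2) * sum_i (w * x_i - y_i)^2 over the weight w.
    """
    rng = random.Random(seed)
    w = 0.0
    batch_size = max(1, int(len(data) * mini_batch_fraction))
    for _ in range(num_iterations):
        # Sample a mini-batch, then take the average gradient of the
        # squared loss over it.
        batch = rng.sample(data, batch_size)
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        w -= step_size * grad
    return w

# Points on the line y = 3x; SGD should recover a weight close to 3.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = minibatch_sgd_linear(data, step_size=0.05, num_iterations=200,
                         mini_batch_fraction=0.5)
print(round(w, 2))
```

A larger `mini_batch_fraction` makes each gradient estimate less noisy but, as noted above, tends to slow the per-example rate of convergence, which is exactly the trade-off reference [8] discusses.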
Mini-Batch K-Means Clustering
When it comes to mini-batch k-means clustering, we can use model parameters such as the cluster centers to periodically retrain the model. Referring to the Spark Streaming algorithm [4] and the Scala API, we can proceed as follows.
Since we may want to estimate clusters dynamically, updating them as new data arrive, we need to develop the API to support streaming k-means clustering with mini-batches, with parameters to control the decay (or “forgetfulness”) of the estimates. The algorithm uses a generalization of the mini-batch k-means update rule. For each batch of data, we assign all points to their nearest cluster, compute new cluster centers, then update each cluster using:
c_{t+1} = (c_t · n_t · α + x_t · m_t) / (n_t · α + m_t)
n_{t+1} = n_t + m_t
where c_t is the previous center for the cluster, n_t is the number of points assigned to the cluster thus far, x_t is the new cluster center from the current batch, and m_t is the number of points added to the cluster in the current batch. The decay factor α can be used to ignore the past: with α = 1 all data will be used from the beginning; with α = 0 only the most recent data will be used. This is analogous to an exponentially weighted moving average. These decay factors give more control over the data horizon and data obsolescence of the mini-batch process.
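The update rule above can be written directly as a small function for a single one-dimensional cluster center (the per-cluster assignment step is omitted; `update_center` is an illustrative name, not Spark's API):

```python
def update_center(c_t, n_t, x_t, m_t, alpha):
    """Apply the decayed mini-batch k-means update for one cluster.

    c_t:   previous center for the cluster
    n_t:   number of points assigned to the cluster so far
    x_t:   center of the points assigned to it in the current batch
    m_t:   number of points assigned to it in the current batch
    alpha: decay factor (1.0 = keep all history, 0.0 = only newest batch)
    """
    c_next = (c_t * n_t * alpha + x_t * m_t) / (n_t * alpha + m_t)
    n_next = n_t + m_t
    return c_next, n_next

# With alpha = 0 the past is forgotten entirely: the center jumps to x_t.
print(update_center(c_t=1.0, n_t=10, x_t=5.0, m_t=2, alpha=0.0))  # (5.0, 12)
# With alpha = 1 every point counts equally, old and new alike.
print(update_center(c_t=1.0, n_t=10, x_t=5.0, m_t=2, alpha=1.0))
```

The two calls show the extremes of the data horizon: α = 0 makes the newest batch dominate immediately, while α = 1 weights the new batch by only m_t out of n_t + m_t points.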
5 ARCHITECTURE OVERVIEW
Streaming data is accepted inside the core. Those streams may be coming from the WSO2 CEP (Complex Event Processor), data streams coming out of the Siddhi processor, or another stream of data. The stream of data is then broken into mini-batches and passed into the API core, where the algorithms run and store the derived models together with the past models. Apache Spark is used to run the machine learning algorithms that we need to run for each and every mini-batch of data.
[Architecture diagram: STREAMS → Receiver → MINI-BATCH → API Core → Apache Spark MLlib, with Past Models feeding back into the API Core]
6 Time Line of the Project
April 22 – May 22: Community bonding period – getting familiar with the platform and discussing implementation methods.
May 23 – June 18: Implementing streaming generalized linear regression and testing with streams.
June 19 – June 21: Implementing a part of streaming k-means.
June 21 – June 28: Midterm evaluation.
June 28 – July 17: Implementing the other part of streaming k-means clustering and testing streaming GLR further with streams.
July 18 – August 12: Integrating the Java API with the CEP Siddhi processor streams and testing with the CEP event streams and other streams.
August 13 – August 16: Final documentation of the project.
August 16 – August 24: Submitting code, documentation, and evaluations.
7 References
[1] https://www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf
[2] https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
[3] http://crediblecode.net/z-featureditems/featured-1/sparking-insights
[4] http://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means
[5] http://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression
[6] https://papers.nips.cc/paper/4432-better-mini-batch-algorithms-via-accelerated-gradient-methods.pdf
[7] https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
[8] http://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent
[9] http://www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf
[10] http://arxiv.org/pdf/1106.4574.pdf
[11] http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf