Date: Monday, 3 January 2022
The #تواصل_تطوير initiative
Lecture No. 143 of the initiative
Eng. Mohamed El-Rafei Tarabay
Head of the Programmers' Syndicate in Dakahlia
Titled
"IT INDUSTRY"
How to Get Into IT With Zero Experience
On Monday, 3 January 2022
7:00 PM Cairo time
8:00 PM Makkah time
Attendance via the Zoom app:
https://us02web.zoom.us/meeting/register/tZUpf-GsrD4jH9N9AxO39J013c1D4bqJNTcu
Note that the lecture will also be streamed live on the Egyptian Engineers Association channels.
We hope to succeed in presenting what benefits engineers and the engineering profession in the Arab world.
May God grant success.
To reach the initiative's administration, use the Telegram channel:
https://t.me/EEAKSA
Follow the initiative and the live stream through our various channels:
LinkedIn and e-library:
https://www.linkedin.com/company/eeaksa-egyptian-engineers-association/
Twitter:
https://twitter.com/eeaksa
Facebook:
https://www.facebook.com/EEAKSA
YouTube:
https://www.youtube.com/user/EEAchannal
General lecture registration link:
https://forms.gle/vVmw7L187tiATRPw9
Note: Free attendance certificates are available to those who fill in the evaluation form at the end of the lecture.
Recently, in the fields of Business Intelligence and Data Management, everybody has been talking about data science, machine learning, predictive analytics, and many other "clever" terms promising to turn your data into gold. In these slides, we present the big picture of data science and machine learning. First, we define the context for data mining from a BI perspective and try to clarify the various buzzwords in this field. Then we give an overview of the machine learning paradigms. After that, we discuss, at a high level, the various data mining tasks, techniques, and applications. Next, we take a quick tour through the Knowledge Discovery Process. Screenshots from demos are shown, and finally we conclude with some takeaway points.
Building High Available and Scalable Machine Learning Applications - Yalçın Yenigün
The slides contain some high-level information about machine learning algorithms, cross-validation, and feature extraction techniques, as well as high-level techniques for building highly available and scalable ML products.
A short presentation for beginners introducing Machine Learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
High time to add machine learning to your information security stack - Minhaz A V
Machine learning may never be the silver bullet for cybersecurity that it is in the areas where it is thriving. There will always be someone trying to find issues in our systems and bypass them; attackers may even use machine learning to assist their attacks.
But adding it to our general information security stack can surely help us be better prepared when defending. Categories such as regression, classification, clustering, recommendation, and reinforcement learning can be leveraged to build more efficient and faster monitoring, threat response, network traffic analysis, and more.
Along with an introduction to the different aspects and how they can be leveraged, I'd like to present a case study, with a live demo, on how ML/AI can distinguish between benign and malicious traffic data by means of anomaly detection techniques with a 100% true positive rate.
https://www.infosectrain.com/courses/data-science-with-python-and-r-certification-training/
What is cluster analysis in data science?
Cluster analysis is a statistical method used to group similar objects into respective categories; it is also known as taxonomy analysis, segmentation analysis, or simply clustering. It groups or categorizes the data points in a given dataset, classifying them into distinct groups, called clusters, based on shared characteristics.
You can watch: https://www.youtube.com/watch?v=TAnOlBQLTqc
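The grouping-by-shared-characteristics idea above can be sketched in a few lines with k-means, a classic clustering algorithm. The two synthetic point clouds below are invented for illustration; any real dataset with numeric features would work the same way.

```python
# A minimal cluster-analysis sketch using k-means (scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated groups of 2-D points (synthetic data)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# Partition the points into two clusters based on shared characteristics
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

# Points from the same blob should land in the same cluster
same_a = len(set(labels[:50])) == 1
same_b = len(set(labels[50:])) == 1
```

The algorithm receives no labels at all; it recovers the two groups purely from the geometry of the data, which is exactly what distinguishes clustering from classification.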
Workshop given to staff for PhD and Master's topic selection in the areas of Big Data, Data Science, and Machine Learning. It includes many interactive online demos to aid understanding of NLP-based social media analysis, such as sentiment analysis, topic modeling, language detection, and intent detection; some basic concepts of classification, regression, and clustering with interactive worksheets; and finally, hands-on machine learning models and comparisons in the WEKA toolkit with case studies of car and diabetic-patient data.
A (vintage) presentation about a database system for the study of gene expression data, including distributed metadata annotation and some interactive analytics. Some of the ideas are still relevant today.
Recent papers from the NIPS 2015 workshop on feature extraction suggest that representation learning with "supervised coupled" methods (such as the training of supervised deep neural networks) can significantly improve classification accuracy vis-à-vis unsupervised and/or uncoupled methods. Such methods jointly learn a representation function and a labeling function. If you are a machine learning practitioner in a field whose applications impose strict interpretability constraints, a major drawback of deep neural networks is that they are notoriously difficult to interpret. In this talk, Alex will discuss "distilled learning" (training a classifier and extracting its outputs for use as training labels for another model) and "dark knowledge" (implicit knowledge of the underlying data representation learned by a classifier). Alex will show how, together, they improve classification accuracy in more readily interpretable models such as single decision trees and logistic regression learners. Finally, Alex will discuss applications in the health sciences, credit decisions, and fraud detection.
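The distilled-learning loop described above (train a classifier, reuse its outputs as training labels for an interpretable student) can be sketched as follows. To keep the example light, a random forest stands in for the deep teacher network, and the dataset is synthetic; both are assumptions, not the talk's actual setup.

```python
# Hedged sketch of "distilled learning": the student (a single decision
# tree) is trained on the teacher's predicted labels, not the raw labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stand-in teacher (a deep network in the talk; a forest here for brevity)
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
soft_labels = teacher.predict(X)          # teacher outputs become targets

# Interpretable student learns to mimic the teacher
student = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, soft_labels)

# Fraction of inputs on which the student reproduces the teacher's answer
agreement = (student.predict(X) == soft_labels).mean()
```

The point is that the student's structure (a shallow tree) is human-readable, while its labels carry some of the teacher's learned representation.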
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta... - Maninda Edirisooriya
Introduction to Statistical and Machine Learning. Explains the basics of ML, fundamental concepts of ML, Statistical Learning, and Deep Learning, and recommends learning sources and techniques for Machine Learning. This was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
BIG DATA AND MACHINE LEARNING
Big Data is a collection of data that is huge in volume and growing exponentially with time: data of such size and complexity that no traditional data management tool can store or process it efficiently.
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning - Joaquin Delgado, PhD.
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical ones. Therefore, in domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, and relevance ranking is performed as a two-phase approach: 1) regular search, and 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to the items served. The selection rates predicted in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve for a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (SOLR/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and loaded as a plugin used at query time to compute custom scores.
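The two-phase approach the abstract describes can be sketched in a few lines. This is not the ML-Scoring plugin itself; the documents and per-item click-through rates below are invented, and a simple term-overlap match plays the role of the phase-1 IR score.

```python
# Phase 1: regular search (term filter); Phase 2: external model re-ranks.
docs = {
    "d1": "machine learning for search ranking",
    "d2": "search engine internals",
    "d3": "cooking recipes",
}
ctr_model = {"d1": 0.12, "d2": 0.30, "d3": 0.05}  # hypothetical per-item CTRs

def search(query):
    terms = set(query.split())
    # Phase 1: keep only documents that match at least one query term
    hits = [d for d, text in docs.items() if terms & set(text.split())]
    # Phase 2: re-rank the filtered items by the external model's score
    return sorted(hits, key=lambda d: ctr_model[d], reverse=True)

ranked = search("search ranking")
```

In a real system, phase 2 would call a trained model (the abstract mentions Weka or Spark) inside the engine at query time instead of a lookup table; the control flow is the same.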
Transfer learning (TL) is a research problem in machine learning (ML) that focuses on applying knowledge gained while solving one task to a related task.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group ("MCG") expects demand to grow and supply to keep evolving, driven by institutional investment rotating out of offices and into work-from-home ("WFH") plays, and by the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next four years.
While competitive headwinds remain, as illustrated by the recent second bankruptcy filing of Sungard, which blames "COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services", the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Adjusting primitives for graphs: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, such as PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
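The storage-type experiment in the list above (float vs. bfloat16 for a vector element sum) can be mimicked in NumPy. Stock NumPy has no bfloat16, so float32 plays the low-precision role against float64 here; the effect being illustrated (lower-precision storage loses accuracy in a large reduction) is the same.

```python
# NumPy analog of the storage-type comparison for a vector element sum.
import numpy as np

n = 1_000_000
x64 = np.full(n, 0.1, dtype=np.float64)   # high-precision storage
x32 = x64.astype(np.float32)              # low-precision storage (bfloat16 stand-in)

exact = n * 0.1
err64 = abs(float(x64.sum()) - exact)     # tiny rounding error
err32 = abs(float(x32.sum()) - exact)     # representation error accumulates
```

Each float32 element already misrepresents 0.1 by about 1.5e-9, and a million of those representation errors add up systematically, so the float32 sum is measurably further from the true value than the float64 sum.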
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
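For reference, here is the monolithic baseline the abstract compares against: a plain PageRank power iteration, with dead ends handled by the usual spread-to-all-vertices fix. The tiny graph and damping factor are illustrative only; the report's levelwise variant would instead process strongly connected components level by level.

```python
# Compact monolithic PageRank power iteration in NumPy.
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    n = adj.shape[0]
    out = adj.sum(axis=1)                  # out-degree of each vertex
    r = np.full(n, 1.0 / n)                # start from the uniform vector
    while True:
        contrib = np.zeros(n)
        for u in range(n):
            if out[u]:
                contrib += r[u] * adj[u] / out[u]   # spread along edges
            else:
                contrib += r[u] / n                 # dead end: spread to all
        r_new = (1 - d) / n + d * contrib
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Tiny 3-vertex chain: 0 -> 1 -> 2, where 2 is a dead end
A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
ranks = pagerank(A)
```

Every iteration touches every vertex, which is exactly the per-iteration cost the levelwise decomposition tries to avoid by processing components in topological order.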
2. About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
7. Supervised learning vs. Unsupervised learning
• Supervised learning: discover patterns in the data that relate data
attributes with a target (class labeled) attribute.
• These patterns are then utilized to predict the values of the target attribute in
future data instances.
• Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
• Classic supervised learning algorithms
• Classification
• Regression
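The contrast on this slide can be shown on one synthetic dataset: the supervised learner uses the target attribute, the unsupervised learner sees only the data attributes. The blobs and models below are illustrative choices, not part of the slide.

```python
# Supervised vs. unsupervised on the same data (scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: learn patterns relating the attributes to the class label y
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)

# Unsupervised: same X, but no target attribute; find intrinsic structure
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
```

Note that `fit(X, y)` versus `fit(X)` is the whole difference in the API: the clustering call never sees `y`.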
9. What is Classification in Supervised Learning?
• Classification is where an algorithm is trained to classify input data into
discrete classes.
• During training, algorithms are given training input data with a class
label. For example, training data might consist of the last credit card
bills of a set of customers, labeled with whether they made a future
purchase or not.
• When a new customer's credit balance is presented to the algorithm,
it classifies the customer into either the "will purchase" or the
"will not purchase" group.
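The credit-card example on this slide can be made concrete with a toy classifier. All the numbers below are invented: a handful of hypothetical credit balances, labeled 1 if the customer purchased again.

```python
# Toy version of the slide's credit-card classification example.
from sklearn.linear_model import LogisticRegression

bills = [[100], [150], [200], [800], [900], [1000]]   # hypothetical balances
purchased = [1, 1, 1, 0, 0, 0]                        # hypothetical labels

clf = LogisticRegression().fit(bills, purchased)

# Classify a new customer's credit balance into one of the two groups
new_customer = int(clf.predict([[120]])[0])
```

With this data, a low balance lands in the "will purchase" group (1) and a high balance in the "will not purchase" group (0): the output is a discrete class, not a number on a continuous scale.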
10. What is Regression in Supervised Learning?
• Regression is a supervised learning method where an algorithm is
trained to predict an output from a continuous range of possible
values. For example, real estate training data would take note of the
location, area, and other relevant parameters. The output is the price
of the specific real estate.
• In regression, an algorithm needs to identify a functional relationship
between the input parameters and the output.
• The output value is not discrete as in classification; instead, it comes
from a continuous range of values.
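The real-estate example on this slide maps directly to a linear regression fit. The listings below are made up (area in square meters and a price that is exactly twice the area), just to show the functional relationship being learned.

```python
# Toy version of the slide's real-estate regression example.
from sklearn.linear_model import LinearRegression

area = [[50], [80], [100], [120], [150]]      # hypothetical areas (m^2)
price = [100, 160, 200, 240, 300]             # hypothetical prices (price = 2 * area)

reg = LinearRegression().fit(area, price)

# The output is a value from a continuous range, not a class
predicted = float(reg.predict([[110]])[0])
```

Because the data lie exactly on a line, the model recovers the relationship price = 2 × area and predicts 220 for an area of 110.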
11. Real-life Applications of Classification
• Binary classification (Most companies use)
• Spam detection
• Churn prediction
• Conversion prediction
• Imbalanced Classification
• Fraud detection: In the labeled data set used for training, only a small number of
inputs are labeled as fraud.
• Medical diagnostics: In a large pool of samples, those with a positive case of a disease
may be far fewer.
• Multi-class Classification
• Face classification: Based on the training data, a model categorizes a photo and maps
it to a specific person.
• Email classification: Multi-class classification is used to segregate emails into various
categories – social, education, work, and family.
12. Real-life Applications of Regression
• Linear regression
• It can be used to predict values within a continuous
range (e.g. sales, price forecasting); classifying
inputs into categories (e.g. cat vs. dog) is instead
the job of logistic regression.
• Polynomial regression
• It is used for a more complex data set that will not
fit neatly into a linear regression. An algorithm is
trained with a complex, labeled data set that may
not fit well under a straight-line regression.
15. Image Classification using Logistic Regression
• Embedder:
• Inception V3: Google’s Inception v3 model trained on ImageNet.
• SqueezeNet: Deep model for image recognition that achieves AlexNet-level
accuracy on ImageNet with 50x fewer parameters.
• VGG-16: 16-layer image recognition model trained on ImageNet.
• Vgg-19: 19-layer image recognition model trained on ImageNet.
• Painter: A model trained to predict painters from artwork images.
• DeepLoc: A model trained to analyze yeast cell images.
• openface: Face recognition model trained on FaceScrub and CASIA-WebFace
dataset.
• http://vintage.winklerbros.net/facescrub.html
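The embed-then-classify pattern on this slide can be sketched without loading any of the heavy pretrained networks. Here, random 2048-dimensional vectors stand in for Inception V3 embeddings, and the labels are synthetic; only the pipeline shape (embedder output feeding logistic regression) matches the slide.

```python
# Embeddings -> logistic regression, with stand-in features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, dim = 200, 2048
embeddings = rng.normal(size=(n, dim))       # pretend Inception V3 features
labels = (embeddings[:, 0] > 0).astype(int)  # toy labels for the sketch

# The classifier never sees pixels, only the embedder's feature vectors
clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
train_acc = clf.score(embeddings, labels)
```

In a real setup, `embeddings` would come from running images through one of the listed models (Inception V3, SqueezeNet, VGG-16, etc.) with its classification head removed.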
16. ImageNet/ Inception V3
• ImageNet is an image database. The images in the database are
organized into a hierarchy, with each node of the hierarchy depicted
by hundreds and thousands of images.
• Sample size of Training data: 1 million
• Sample size of validation data: 50000
• Number of classes: 1000
19. Outlier Detection
• Many applications require being
able to decide whether a new
observation belongs to the same
distribution as existing
observations (it is an inlier), or
should be considered as different
(it is an outlier). Often, this ability
is used to clean real data sets.
Ref: https://scikit-learn.org/stable/modules/outlier_detection.html#outlier-detection
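The inlier/outlier decision described on this slide follows the scikit-learn convention of +1 for inliers and -1 for outliers. The sketch below uses IsolationForest, one of the estimators covered by the referenced page; the data are synthetic.

```python
# Deciding whether new observations belong to the training distribution.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "clean" samples

detector = IsolationForest(random_state=0).fit(inliers)

# +1 = same distribution (inlier), -1 = different (outlier)
verdicts = detector.predict(np.array([[0.1, -0.2], [8.0, 8.0]]))
```

A point near the training cloud is accepted as an inlier, while one far outside it is flagged as an outlier; cleaning a real dataset amounts to dropping the -1 rows.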
26. Open model_build_on_preduction.ows
• Main concepts:
• Data exploration
• Feature Statistics
• Rank
• Data Preprocess
• Preprocess
• Data Split
• Data Sampler
• Model
• Tree/ Tree Viewer
• Save Model/ Load Model
• Test and Score
• Confusion Matrix
• Prediction
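The Orange workflow steps listed above (split the data, fit a tree model, then test, score, and inspect the confusion matrix) map one-to-one onto a few scikit-learn calls. The iris dataset and split ratio below are illustrative choices.

```python
# The workflow's split -> model -> test/score -> confusion matrix, in code.
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Data Sampler: hold out 30% for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # Model (Tree)
acc = tree.score(X_te, y_te)                                    # Test and Score
cm = confusion_matrix(y_te, tree.predict(X_te))                 # Confusion Matrix
```

Each row of `cm` is a true class and each column a predicted class, which is exactly what the Confusion Matrix widget visualizes in the Orange canvas.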
27. Homework
• Please apply the models listed on the next page to your own binary
dataset, and try to identify the model with the best performance as
well as the most significant variables.
• Furthermore, you are advised to explain the model's underlying factors;
include the subsections below:
• Introduction to your dataset
• Data exploration
• Model evaluation
• Conclusion
• Use a PPT with text and illustrations to present your observations.