2. Inspired by
201503 Email Classifier using Mahout on Hadoop
● Dataset from Apache SpamAssassin
o One file per email, with mail headers and HTML tags
o #spam = 501, #ham = 2501
● Output confusion matrix:

                 Predicted spam   Predicted ham
  Actual spam    69 (TP)          1 (FN)
  Actual ham     1 (FP)           382 (TN)

  Recall = 98.5714%, Precision = 98.5714%, Accuracy = 99.5585%
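For reference, a minimal Scala sketch of the standard metric formulas behind these figures, plugged with the counts above:

// Standard definitions, using the confusion-matrix counts from this slide
val (tp, fn, fp, tn) = (69.0, 1.0, 1.0, 382.0)
val recall = tp / (tp + fn) // 69/70 = 98.5714%
val precision = tp / (tp + fp) // 69/70 = 98.5714%
val accuracy = (tp + tn) / (tp + fn + fp + tn) // 451/453 = 99.5585%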
3. Spam Sample
From 12a1mailbot1@web.de Thu Aug 22 13:17:22 2002
Return-Path: <12a1mailbot1@web.de>
Delivered-To: zzzz@localhost.spamassassin.taint.org
Received: from localhost (localhost [127.0.0.1])
by phobos.labs.spamassassin.taint.org (Postfix) with ESMTP id 136B943C32
for <zzzz@localhost>; Thu, 22 Aug 2002 08:17:21 -0400 (EDT)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content=3D"text/html; charset=3Dwindows-1252" http-equiv=3DContent-T=ype>
<META content=3D"MSHTML 5.00.2314.1000" name=3DGENERATOR></HEAD>
<BODY><!-- Inserted by Calypso -->
<TABLE border=3D0 cellPadding=3D0 cellSpacing=3D2 id=3D_CalyPrintHeader_ r=
<CENTER>Save up to 70% on Life Insurance.</CENTER></FONT><FONT color=3D#ff=
0000
face=3D"Copperplate Gothic Bold" size=3D5 PTSIZE=3D"10">
<CENTER>Why Spend More Than You Have To?
<CENTER><FONT color=3D#ff0000 face=3D"Copperplate Gothic Bold" size=3D5 PT=
SIZE=3D"10">
<CENTER>Life Quote Savings
(Slide callouts label the email headers, the HTML tags, and the email body.)
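Note that the classifier built next tokenizes this entire raw text, headers and HTML tags included, simply by splitting on spaces. A small sketch of what that featurization does to one line of the sample (using HashingTF with numFeatures = 100, as on the next slide):

import org.apache.spark.mllib.feature.HashingTF

// Hash whitespace-separated tokens into a 100-dimensional term-frequency vector
val tf = new HashingTF(numFeatures = 100)
val tokens = "<CENTER>Save up to 70% on Life Insurance.</CENTER>".split(" ")
val vector = tf.transform(tokens) // sparse vector of token counts, indexed by hash mod 100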
4. Using Spark MLlib (1)
Tokenization → Featurization → Training

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

val tf = new HashingTF(numFeatures = 100)

// Load one file per email; wholeTextFiles yields (path, content) pairs
val spam = sc.wholeTextFiles("file:///home/leo/spam/20030228.spam")
val ham = sc.wholeTextFiles("file:///home/leo/spam/20030228.easyham")

// Tokenization + featurization: split on spaces, hash into term-frequency vectors
val spamTrain = spam.map { case (file, text) => tf.transform(text.split(" ")) }
  .map(features => LabeledPoint(1, features))
val hamTrain = ham.map { case (file, text) => tf.transform(text.split(" ")) }
  .map(features => LabeledPoint(0, features))

val sampleData = spamTrain ++ hamTrain
sampleData.cache()

// Caveat: two independent sample() calls can draw overlapping rows;
// randomSplit(Array(0.85, 0.15)) would give disjoint train/test sets
val trainData = sampleData.sample(false, 0.85, 707L)
val testData = sampleData.sample(false, 0.15, 707L)

// Training
val lrLearner = new LogisticRegressionWithSGD()
val model = lrLearner.run(trainData)

#samples = 3002
#trainData = 2549 (#spam = 431, #ham = 2118)
#testData = 431 (#spam = 73, #ham = 358)
Baseline accuracy = 83.0626% ( (0+358)/431, i.e. always predicting ham )
5. Using Spark MLlib (2)

// Validation: pair each test email's true label with the model's prediction
val validation = testData.map { lpoint => (lpoint.label, model.predict(lpoint.features)) }

// Tally the confusion matrix as (TP, FP, TN, FN) counts
val matrix = validation.map {
  ret => ret match {
    case (1.0, 1.0) => Array(1, 0, 0, 0) // TP
    case (0.0, 1.0) => Array(0, 1, 0, 0) // FP
    case (0.0, 0.0) => Array(0, 0, 1, 0) // TN
    case (1.0, 0.0) => Array(0, 0, 0, 1) // FN
  }
}.reduce {
  (ary1, ary2) => Array(ary1(0)+ary2(0), ary1(1)+ary2(1), ary1(2)+ary2(2), ary1(3)+ary2(3))
}

matrix: Array[Int] = Array(37, 11, 347, 36) // TP, FP, TN, FN

Accuracy = 89.0951% ( (37+347)/431 ) vs. 99.5585% using Mahout
Precision = 77.0833% ( 37/(37+11) ) vs. 98.5714% using Mahout
Recall = 50.6849% ( 37/(37+36) ) vs. 98.5714% using Mahout
6. Model Parameters
class org.apache.spark.mllib.feature.HashingTF
● val numFeatures: Int
number of features (default: 2^20)
class LogisticRegressionWithSGD
● val optimizer: GradientDescent
The optimizer to solve the problem.
class GradientDescent
● def setNumIterations(iters: Int): GradientDescent.this.type
Set the number of iterations for SGD.
● def setRegParam(regParam: Double): GradientDescent.this.type
Set the regularization parameter.
How to find the best combination of each parameter?
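Tuning by hand would mean mutating these setters and retraining for every candidate, roughly as in this sketch (the 50 and 0.01 values are arbitrary illustrations, not from the slides):

// One train/validate cycle per hyperparameter combination, repeated manually
val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(50) // candidate iteration count (illustrative)
  .setRegParam(0.01) // candidate regularization (illustrative)
val model = lr.run(trainData)
// ...recompute the confusion matrix, then repeat for the next combination

The ML Pipeline API on the following slides automates exactly this search.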
7. ML Pipeline Concepts
Transformer
A feature transformer might take a dataset, read a column (e.g., text), and convert it into a new column (e.g., feature vectors). A learning model might take a dataset, read the column containing feature vectors, predict the label for each feature vector, and append the labels as a new column.

Estimator
An Estimator abstracts the concept of a learning algorithm or any algorithm which fits or trains on data.

Pipeline
A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input dataset is modified as it passes through each stage.

(Source: Spark 1.3.0 ML Programming Guide)
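In code, the distinction is simply transform() versus fit(). A minimal sketch, assuming the tokenizer, hashingTF, and lr stages defined on the next slide and a DataFrame emailsDF with "text" and "label" columns:

// Transformer: DataFrame in, DataFrame out (appends a column)
val withWords = tokenizer.transform(emailsDF) // adds "words"
val withFeatures = hashingTF.transform(withWords) // adds "features"

// Estimator: fit() trains on data and returns a Model, which is
// itself a Transformer that appends a "prediction" column
val lrModel = lr.fit(withFeatures)
val predictions = lrModel.transform(withFeatures)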
9. Using ML Pipeline (1)

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import sqlContext.implicits._ // for .toDF() (spark-shell's sqlContext)

case class Email(text: String)
case class EmailLabeled(text: String, label: Double)

// Load and label the raw emails, then convert to a DataFrame
val spamTrain = sc.wholeTextFiles("file:///home/leo/spam/20030228.spam").map {
  case (file, content) => EmailLabeled(content, 1.0) }
val hamTrain = sc.wholeTextFiles("file:///home/leo/spam/20030228.easyham").map {
  case (file, content) => EmailLabeled(content, 0.0) }
val sampleSet = (spamTrain ++ hamTrain).toDF()
sampleSet.cache()

val trainSet = sampleSet.sample(false, 0.85, 100L)
val testSet = sampleSet.sample(false, 0.15, 100L)

// Pipeline stages: tokenize text -> hash into feature vectors -> logistic regression
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages( Array(tokenizer, hashingTF, lr) )

// Cross-validate over a grid of 3 x 2 x 4 = 24 parameter combinations
val crossval = new CrossValidator().setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator)
val paramGrid = new ParamGridBuilder()
  .addGrid( hashingTF.numFeatures, Array(10, 100, 1000) )
  .addGrid( lr.regParam, Array(0.1, 0.01) )
  .addGrid( lr.maxIter, Array(10, 20, 30, 50) )
  .build()

#samples = 3002
#trainData = 2528 (#spam = 421, #ham = 2107)
#testData = 437 (#spam = 84, #ham = 353)
Baseline accuracy = 80.7780% ( (0+353)/437, i.e. always predicting ham )
10. Using ML Pipeline (2)
All in one: tokenization, featurization, model training, model validation, and prediction

import org.apache.spark.sql.Row

crossval.setEstimatorParamMaps(paramGrid).setNumFolds(3)

// fit() evaluates every parameter combination with 3-fold cross-validation
// and returns a model refit with the best parameters found
val cvModel = crossval.fit(trainSet)

// transform() runs the whole fitted pipeline and appends a "prediction" column
val validation = cvModel.transform(testSet)

val matrix = validation.select("label", "prediction").map {
  case Row(label: Double, prediction: Double) => (label, prediction) match {
    case (1.0, 1.0) => Array(1, 0, 0, 0) // TP
    case (0.0, 1.0) => Array(0, 1, 0, 0) // FP
    case (0.0, 0.0) => Array(0, 0, 1, 0) // TN
    case (1.0, 0.0) => Array(0, 0, 0, 1) // FN
  }
}.reduce {
  (ary1, ary2) => Array(ary1(0)+ary2(0), ary1(1)+ary2(1), ary1(2)+ary2(2), ary1(3)+ary2(3))
}

matrix: Array[Int] = Array(84, 1, 352, 0) // TP, FP, TN, FN

Accuracy = 99.7712% ( (84+352)/437 ) vs. 99.5585% using Mahout
Precision = 98.8235% ( 84/(84+1) ) vs. 98.5714% using Mahout
Recall = 100% ( 84/(84+0) ) vs. 98.5714% using Mahout

Best parameters found by cross-validation:
cvModel.bestModel.fittingParamMap = {
  LogisticRegression-3cb51fc7-maxIter: 20,
  HashingTF-cb518e45-numFeatures: 1000,
  LogisticRegression-3cb51fc7-regParam: 0.1 }