This document summarizes a study of mining email social networks in open source software projects. It finds that the developers who are most active in communication and coordination on mailing lists also tend to be the most active contributors to the source code, where mailing-list activity is measured by social network metrics such as in-degree and betweenness centrality. Communication and coordination activity is strongly correlated with software development work. To reach these results, the study resolved developers' email aliases by clustering their identities, then related each developer's email activity to their code contributions.
Mining Email Social Networks
1. Mining Email Social Networks
Christian Bird, Alex Gourley,
Prem Devanbu, Michael Gertz, Anand Swaminathan
University of California, Davis
Presented By:
Arnamoy Bhattacharyya
4. Communication & Co-ordination (C&C) activities are central to large software projects
Difficult to observe and study in traditional (closed-source, commercial) settings
The email archives of OSS projects provide a useful trace of the communication and co-ordination activities of the participants
8. CHATTERERS & CHANGERS
A mailing list in an OSS project is a public forum
Anyone can post messages to the list.
Posted messages are visible to all the mailing list subscribers.
Posters include developers, bug reporters, contributors (who submit patches but don't have commit privileges) and ordinary users.
11. A response b to a message a indicates that the sender of b (Sb) found that the sender of a (Sa) had something interesting to say.
It is also an indication of Sa's status, i.e., Sb indicates that s/he found Sa's email worth reading and worthy of a response.
However, the vast majority of individuals participating on the email list sent very few messages and received very few replies to their messages.
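The reply relation on this slide can be recovered from archive headers. This is a minimal sketch. It assumes that each reply carries an In-Reply-To header naming its parent's Message-ID, and it does not thread by References or subject lines. The function name `reply_pairs` is hypothetical. Each returned pair (Sa, Sb) records that Sb replied to a message from Sa.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_pairs(raw_messages):
    """Return (Sa, Sb) pairs, meaning Sb replied to a message sent by Sa.

    Assumes replies carry an In-Reply-To header naming the parent's
    Message-ID; replies whose parent is not in the archive are skipped.
    """
    parsed = [message_from_string(raw) for raw in raw_messages]

    # Index every message by its Message-ID, recording the sender address.
    sender_of = {}
    for msg in parsed:
        msg_id = msg.get("Message-ID")
        if msg_id:
            sender_of[msg_id.strip()] = parseaddr(msg.get("From", ""))[1].lower()

    # Each reply whose parent we know yields one (original sender, replier) pair.
    pairs = []
    for msg in parsed:
        parent = msg.get("In-Reply-To")
        if parent and parent.strip() in sender_of:
            replier = parseaddr(msg.get("From", ""))[1].lower()
            pairs.append((sender_of[parent.strip()], replier))
    return pairs


archive = [
    "From: Alice <alice@example.org>\nMessage-ID: <1@example.org>\n\nA patch.\n",
    "From: Bob <bob@example.org>\nMessage-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n\nLooks good.\n",
]
print(reply_pairs(archive))  # [('alice@example.org', 'bob@example.org')]
```

The addresses are illustrative. In a real archive, the pairs would first be mapped through the alias clusters described later, so that one person's several addresses collapse to a single node.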
15. OF DOGS AND DEVELOPERS
“On the Internet, nobody knows you're a dog.” - Peter Steiner
The same individual can use different email aliases
Developer Ian Holsman uses 7 different email aliases
Ignoring these aliases would confound later steps of data analysis
18. Unmasking Aliases
Most emails include a header that identifies the sender, of this form:
From: "Bill Stoddard" <reddrum@attglobal.net>
Crawl the messages and extract all sender headers to produce a list of <Name,email> identifiers (IDs)
Execute a clustering algorithm that measures the similarity between every pair of IDs
Manually post-process the resulting clusters to remove further aliases
Set the cluster similarity threshold quite low: it is easier to split big clusters than to unify two disparate clusters from a very large set.
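The first step, extracting <Name,email> IDs from sender headers, can be sketched with the standard library's `email.utils.parseaddr`. The helper name `extract_ids` is hypothetical. Lower-casing addresses is an assumption the slides do not state.

```python
from email.utils import parseaddr


def extract_ids(from_headers):
    """Collect the distinct <Name, email> IDs found in From: header values."""
    ids = set()
    for header in from_headers:
        name, addr = parseaddr(header)
        if addr:  # skip values that yield no address at all
            # Lower-casing is an assumption: local parts are technically
            # case-sensitive, though mail systems rarely treat them so.
            ids.add((name.strip(), addr.lower()))
    return ids


headers = ['"Bill Stoddard" <reddrum@attglobal.net>',
           '"Bill Stoddard" <REDDRUM@attglobal.net>', ""]
print(extract_ids(headers))  # {('Bill Stoddard', 'reddrum@attglobal.net')}
```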
19. THE CLUSTERING ALGORITHM
1. Normalize the name:
Remove all punctuation and suffixes (e.g., “jr”)
Collapse all whitespace into a single space
Remove generic terms such as “admin” and “support” from the name
Split the name into a first name and a last name (using whitespace and commas as cues)
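The normalization steps above can be sketched as follows. This is an assumption-laden sketch, not the authors' code. The slide names only "jr", "admin" and "support"; the other suffixes are hypothetical additions. The sketch also assumes that the first and last tokens are the first and last names.

```python
import re

# Example lists only: the slide names "jr", "admin" and "support";
# the other entries are hypothetical additions.
SUFFIXES = {"jr", "sr", "ii", "iii"}
GENERIC_TERMS = {"admin", "support"}


def normalize_name(raw):
    """Return (full, first, last) after the normalization steps."""
    text = raw.lower()
    # Comma as a cue: treat "Smith, Andy" as last name, first name.
    if "," in text:
        before, _, after = text.partition(",")
        text = after + " " + before
    # Remove punctuation, then split on any run of whitespace.
    tokens = re.sub(r"[^\w\s]", "", text).split()
    tokens = [t for t in tokens if t not in SUFFIXES and t not in GENERIC_TERMS]
    if not tokens:
        return "", "", ""
    first = tokens[0]
    last = tokens[-1] if len(tokens) > 1 else ""
    return " ".join(tokens), first, last


print(normalize_name("Smith, Andy"))         # ('andy smith', 'andy', 'smith')
print(normalize_name("Bill  Stoddard, Jr."))  # ('bill stoddard', 'bill', 'stoddard')
```

The comma heuristic is crude. In "Bill Stoddard, Jr." the text after the comma is a suffix, not a first name, and the sketch gets the right answer only because suffixes are dropped afterwards.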
20. THE CLUSTERING ALGORITHM
2. Name Similarity:
Compute a similarity score between:
The full names
The first names and the last names, separately
Consider names similar if the full names are similar, or if both the first and last names are similar
e.g. Andy Smith <-> Andrew Smith
Deepa Patel !<-> Deepa Ratnaswamy
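The slide does not name the scoring algorithm. The sketch below assumes a Levenshtein edit distance scaled to [0, 1]. Its two thresholds are hypothetical, chosen only so that the slide's two examples come out as stated.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance scaled to [0, 1], where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


FULL_THRESHOLD = 0.7  # hypothetical
PART_THRESHOLD = 0.5  # hypothetical


def names_similar(first1, last1, first2, last2):
    """Similar if the full names match, or both first and last names match."""
    if similarity(f"{first1} {last1}", f"{first2} {last2}") >= FULL_THRESHOLD:
        return True
    return (similarity(first1, first2) >= PART_THRESHOLD
            and similarity(last1, last2) >= PART_THRESHOLD)


print(names_similar("andy", "smith", "andrew", "smith"))      # True
print(names_similar("deepa", "patel", "deepa", "ratnaswamy"))  # False
```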
21. THE CLUSTERING ALGORITHM
3. Name-Email Similarity:
If the email address contains both the first and last names – match
Arnamoy Bhattacharyya <-> ar.bhat@yahoo.com
If the email address contains the initial of one part of the name and the entirety of the other part – match
Erin Bird <-> ebird
Erin Bird <-> erinb
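The two name-email rules can be sketched as substring checks on the address base, the part before the "@". Stripping separators such as "." and "_" is an assumption. The sketch applies the rules literally, so an abbreviated base such as ar.bhat, which contains neither full name, would need an additional prefix rule to match.

```python
import re


def name_matches_email(first, last, email):
    """Apply the two name-email rules to the address base (before the "@")."""
    # Drop separators so "erin.bird" and "erin_bird" compare like "erinbird".
    base = re.sub(r"[^a-z0-9]", "", email.split("@", 1)[0].lower())
    first, last = first.lower(), last.lower()
    if not first or not last:
        return False
    # Rule 1: the address contains both the first and the last name.
    if first in base and last in base:
        return True
    # Rule 2: one part's initial plus the whole other part, in either order.
    patterns = {first[0] + last, last + first[0], first + last[0], last[0] + first}
    return any(p in base for p in patterns)


print(name_matches_email("Erin", "Bird", "ebird@example.org"))  # True
print(name_matches_email("Erin", "Bird", "erinb@example.org"))  # True
```

Plain substring tests can over-match very short names. That is tolerable here, because the threshold is deliberately low and the clusters are post-processed by hand.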
22. THE CLUSTERING ALGORITHM
4. Email Similarity:
If the Levenshtein edit distance between two email address bases (excluding the domain after the "@") is small – match
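Email similarity compares only the address bases. In this sketch, the cut-off for a "small" distance (`MAX_DISTANCE = 2`) is a hypothetical value, and the addresses are illustrative.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


MAX_DISTANCE = 2  # hypothetical cut-off for a "small" edit distance


def email_base(addr):
    """The part of the address before the "@", lower-cased."""
    return addr.split("@", 1)[0].lower()


def emails_similar(addr1, addr2, max_distance=MAX_DISTANCE):
    return levenshtein(email_base(addr1), email_base(addr2)) <= max_distance


print(emails_similar("jsmith@example.org", "j.smith@example.com"))  # True
print(emails_similar("alice@example.org", "bob@example.org"))       # False
```

A fixed distance is loose for very short bases ("bob" and "rob" differ by one edit). Scaling the cut-off by base length is one alternative.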
23. THE CLUSTERING ALGORITHM
5. Cumulative ID Similarity:
The similarity between two IDs is the maximum of all the scores above
E.g.
Name similarity – 3
Name-email similarity – 5
Email similarity – 2
With a threshold of 4, the maximum score (5) clears it, so the pair is considered a match
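The slide defines the pairwise score, but not how matching pairs become clusters. The sketch below assumes single-link grouping with a union-find structure, so any chain of matching pairs ends up in one cluster. This fits the advice to keep the threshold low and split clusters by hand afterwards. `score_fn` is a hypothetical callback that returns the per-measure scores for a pair.

```python
def id_similarity(scores):
    """Cumulative ID similarity: the maximum of the per-measure scores."""
    return max(scores)


def cluster_ids(ids, score_fn, threshold):
    """Single-link clustering: IDs joined by any matching pair share a cluster.

    score_fn(a, b) returns the list of per-measure scores for one pair
    (name, name-email, email), e.g. [3, 5, 2].
    """
    ids = list(ids)
    parent = {x: x for x in ids}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if id_similarity(score_fn(a, b)) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for x in ids:
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())


print(id_similarity([3, 5, 2]) >= 4)  # True: the slide's example pair matches
```

Comparing every pair is quadratic in the number of IDs. That matches the slide's "every pair of IDs", but it may be slow on very large archives.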
25. The vast majority of people send only one message, while a few send a great many
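A distribution like this one can be tabulated by counting messages per sender and then counting how many senders share each total. This is a minimal sketch: `senders` stands for a hypothetical list holding one de-aliased sender ID per archived message.

```python
from collections import Counter


def message_count_distribution(senders):
    """Map k -> number of people who sent exactly k messages."""
    per_person = Counter(senders)
    return Counter(per_person.values())


# Illustrative input only, not data from the study.
print(message_count_distribution(["a", "b", "c", "a", "a"]))
```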
27. Out-degree: the number of different people from whom an individual has received responses
Higher out-degree <-> higher status
29. In-degree: the number of different people to whom an individual has replied
Indicates the individual's level of engagement in the mailing list and the breadth of his/her interests
The distributions show a small-world character
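Following the definitions on these slides, both degrees can be computed from (Sa, Sb) reply pairs, where each pair means Sb replied to a message from Sa. Out-degree counts distinct repliers, and in-degree counts distinct people replied to. Ignoring self-replies is an assumption the slides do not state.

```python
from collections import defaultdict


def degrees(pairs):
    """Out-degree and in-degree as defined on the slides.

    Each (original, replier) pair means replier answered a message
    sent by original.
    """
    responders = defaultdict(set)  # person -> people who replied to them
    replied_to = defaultdict(set)  # person -> people they replied to
    for original, replier in pairs:
        if original == replier:
            continue  # ignore self-replies (assumption)
        responders[original].add(replier)
        replied_to[replier].add(original)
    people = set(responders) | set(replied_to)
    out_degree = {p: len(responders[p]) for p in people}
    in_degree = {p: len(replied_to[p]) for p in people}
    return out_degree, in_degree


pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "alice"), ("alice", "bob")]
out_deg, in_deg = degrees(pairs)
print(out_deg["alice"], in_deg["alice"])  # 2 1
```

Counting distinct people rather than messages keeps one prolific correspondent from inflating a person's degree. The repeated ("alice", "bob") pair above counts once.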
31. The correlation may be misleading:
1. People who post only relevant messages get many responses to their messages
2. Only people who receive replies from several people keep sending messages (survival effect)
34. C&C ACTIVITY AND DEVELOPMENT ACTIVITY
How does email activity relate to software development activity?
For 73 committers:
1. A correlation of 0.80 between the number of messages sent by an individual and the number of source changes they make:
more software development work <-> more C&C activity
2. A correlation of 0.57 between the number of messages sent by an individual and the number of document changes they make:
source code activities require much more co-ordination effort than documentation activities
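This slide does not name the correlation coefficient. Assuming Pearson's product-moment coefficient, it can be computed per committer as sketched below. The counts are hypothetical placeholders, not the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined if either list is constant


# Hypothetical per-committer counts, NOT the study's data.
messages = [120, 45, 300, 12, 80]
source_changes = [200, 60, 410, 5, 150]
print(round(pearson(messages, source_changes), 2))
```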
37. Are developers more likely to play the role of gatekeepers or brokers in the complete email social network?
Betweenness (BW):
High betweenness <-> the person acts as a kind of broker or gatekeeper
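Betweenness counts how often a person lies on shortest paths between other people. The following is a sketch of Brandes' algorithm for an unweighted graph. Treating the reply network as directed, with edges given as a dict of successor sets, is an assumption; the slides do not specify the graph's form.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality via Brandes' algorithm.

    graph maps each node to the set of nodes it has an edge to; edges are
    unweighted and directed.
    """
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, queue = [], deque([s])
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
print(betweenness(diamond))  # b and c each lie on half of the a->d shortest paths
```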
42. Relative Status of Developers
Do the most active developers have the highest status among developers?
Source changes are not as highly correlated with document changes <-> not all developers are engaged in both to the same degree
Source changes show the strongest rank correlation with social network status <-> the most active developers play the strongest roles as communicators, brokers, and gatekeepers
45. Conclusion
The level of activity on the mailing list is strongly correlated with source code change activity, and to a lesser extent with document change activity.
Social network measures such as in-degree, out-degree and betweenness indicate that developers who actually commit changes play much more significant roles in the email community than non-developers.
Even within the select group of developers, there is a strong correlation between social network importance and the level of source code change activity.