This document discusses using the open-source statistical software R to analyze security-related information from Twitter. It provides instructions for installing the R packages needed to access the Twitter API and for searching tweets that contain specific keywords or hashtags. As an example, it searches for tweets with the hashtag "#exploit", visualizes when during the day the tweets were posted using a histogram and a heatmap, parses out retweets with regular expressions, and sketches a simple classifier for filtering out off-topic tweets.
1. Mining Tweets for Security Information with "R"
Jeff Stanton, School of Information Studies
Syracuse University
2. @highfours: "I just watched a plane crash into the hudson rive [sic] in manhattan"
@ReallyVirtual: "Helicopter hovering above Abbotabad at 1AM (is a rare event)."
Twitter: Early Warning System?
4. 140 characters max
@petridishes – Screen name
#blackberry – User-created hashtag
@crozzledhearts – "Retweeter" who sent this tweet after receiving it from @petridishes
"30 minutes ago via web" – Each tweet encoded with a UTC timecode
No URLs here, but they are auto-shortened
Anatomy of a Tweet
6. A GNU open source project
An implementation of the "S" statistical language developed at Bell Labs
Largely an interpreted, command-line interface with some GUI add-ons
More than 4,300 add-on packages developed by the user community
Full-featured data management and matrix manipulation with performance comparable to Octave and MATLAB
Extensive graphics for visualization
Starting in 2010, used by more data miners (43%) than any other single tool
"R" Facts
7. Developed by Jeff Gentry (Fidelity)
Five classes and 11 functions to:
◦ Authenticate to Twitter with OAuth and check the current rate limit (see the sketch below)
◦ Manipulate, send, and receive direct messages
◦ Update user status
◦ Search for tweets containing particular keywords or hashtags
◦ Examine topic trends
◦ Examine timelines
The "twitteR" package
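As a hedged illustration of the authentication bullet above, the OAuth handshake in that era of twitteR/ROAuth looked roughly like this; the consumer key and secret are placeholders you would obtain from Twitter's developer site, and this is a sketch rather than the deck's own code:

library(ROAuth)
library(twitteR)
# Placeholders only -- substitute your own application credentials
cred <- OAuthFactory$new(consumerKey = "YOUR_CONSUMER_KEY",
                         consumerSecret = "YOUR_CONSUMER_SECRET",
                         requestURL = "https://api.twitter.com/oauth/request_token",
                         accessURL = "https://api.twitter.com/oauth/access_token",
                         authURL = "https://api.twitter.com/oauth/authorize")
cred$handshake()            # Opens a browser/PIN authorization step
registerTwitterOAuth(cred)  # Point twitteR at the resulting credential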
8. Use the R "Packages" menu to install the necessary packages: bitops, RJSONIO, RCurl, and twitteR
Depending upon Mac/Win/Linux, you may need to retrieve a zipped file of RCurl from:
◦ http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.14/
Then ready the packages for use in R with the library() command:
> library(bitops)
> library(RCurl)
> library(RJSONIO)
> library(twitteR)
Getting Ready – Load Packages
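If you prefer the console to the Packages menu, the same four packages can be installed from CRAN in a single command:

# Console alternative to the Packages menu
install.packages(c("bitops", "RJSONIO", "RCurl", "twitteR"))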
9. > expTweets <- searchTwitter('#exploit', n=500)
> expDF <- do.call("rbind", lapply(expTweets, as.data.frame))
The second command above takes the raw tweet data in expTweets – which starts as a list/collection of separate data objects (frames) – and binds it into a single data frame for ease of analysis:
lapply() applies a command to each element of a list
as.data.frame is a type coercion
rbind is the function that joins separate objects to become rows in a dataframe
do.call() repeats the rbind over all elements of the list
Search Twitter for "#exploit"
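As a side note, twitteR also ships a helper that wraps this exact idiom; assuming a version of the package that includes it, the second command collapses to one call:

# twListToDF() wraps the do.call("rbind", lapply(...)) pattern above
expDF <- twListToDF(expTweets)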
10. > head(expDF,1)
text: RT @hacktalkblog: New Exploit [webapps] - Wordpress Age Verification Plugin http://t.co/O8wVjKca #Exploit
favorited: FALSE
replyToSN: NA
created: 2012-01-10 18:19:11
truncated: FALSE
replyToSID: <NA>
id: 156802281747124224
replyToUID: NA
statusSource: <a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a>
screenName: NotaThreat2u
A Preview of the Data
11. > head(expDF$created,1)
[1] "2012-01-10 18:19:11 UTC"
The created variable is conveniently coded as a POSIX time variable calibrated to UTC.
> hist(expDF$created, breaks=15, freq=TRUE)
Shows a frequency histogram (with about 15 break points).
[Figure: "Histogram of expDF$created" – tweet frequency by time of day, x axis running 13:50 through 10:40 UTC]
Nice spike at 18:00 UTC (about 1pm EST)
Visualizing the Data: When Tweeted?
12. # Total time between 1st and last tweet
elapsedTime = max(expDF$created) - min(expDF$created)
timeBin = floor(elapsedTime/11) # Make 11 bins
# Add a new variable with the bin designators
expDF$slice = floor((expDF$created - min(expDF$created))/(as.integer(timeBin)*3600))
expSlices <- expDF[,c("screenName","slice")] # Subset the data
expTable <- table(expSlices) # Count tweets in each slice
# Convert table data to the matrix that heatmap() expects
expMatrix <- matrix(expTable, ncol=length(colnames(expTable)))
rownames(expMatrix) <- rownames(expTable)
colnames(expMatrix) <- paste('Slice', 1:12)
heatmap(expMatrix, Rowv=NA, Colv=NA,
        col=rainbow(max(expMatrix)+1, start=0.5, end=.7))
Prepare a Heatmap
14. library(stringr) # Provides easy string functions
str_match(expDF$text, "^RT @") # Find RT @ at beginning of each line
Regular expression matching any number of alphanumeric characters or underscore: [[:alnum:]_]*
str_match(expDF$text, "^RT @[[:alnum:]_]*") # Matches the whole retweet screen name
expDF$rtSN = str_match(expDF$text, "^RT @[[:alnum:]_]*") # Adds a new variable
Do Some Parsing with Regex
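A natural follow-up, sketched here rather than taken from the slides, is to tally which accounts are retweeted most in the sample (rtSN is NA for tweets that are not retweets):

# Count retweeted screen names, most frequent first
rtCounts <- sort(table(na.omit(expDF$rtSN)), decreasing = TRUE)
head(rtCounts, 5)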
17. Most common keywords
[Figure: bar chart (counts 0–30) of the most frequent terms in the #exploit tweets: #security, rt, alert, exploit, injection, sql, cross, scripting, site, #ccureit, new, remote, vulnerability, #cyber, #cyberwar, #hacker, buffer, cms, file, disclosure, execution, vulnerabilities, wordpress, [webapps], analysis, multiple, overflow, advanced, code, command, information, phpmydirectory, plus assorted stopwords and version numbers (1.4, 1.3.3)]
18. #security – Another good hashtag to search on
(SQL) injection – Apparently one of the most common attacks
cross (site) scripting – Another popular attack
#cyber #cyberwar #ccureit #hacker – More hashtags?
remote vulnerability, buffer (overflow), cms, wordpress, phpmydirectory
Each of these keywords could provide a basis for a new tweet search term, for keyword detection within a set of tweets obtained from another search, or for an alert dashboard with periodic updates (see the sketch below).
Common keywords to explore
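Here is a minimal sketch of that feedback loop, reusing the deck's own search-and-bind idiom; the term list and tweet count are illustrative, not from the slides:

# Feed discovered hashtags back in as new search terms
terms <- c("#security", "#cyberwar", "#ccureit", "#hacker")
moreDF <- do.call("rbind", lapply(terms, function(t)
  do.call("rbind", lapply(searchTwitter(t, n = 100), as.data.frame))))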
19. @shitaesy: Je me couche à 20h30 en ce moment.. J'ai même lu ce soir :3 #exploit
(French: "I'm going to bed at 8:30pm these days.. I even read tonight :3" – here #exploit means a personal feat, not a security exploit)
Scanning across a sample of the tweets, some are spam and should be filtered out.
Can we create a classifier that will get rid of the non-exploit tweets?
Must Remove the Non-Tweets
20. Initial model developed with training data
Model accuracy checked on training data, but later cross-validated on new data
Attributes (Attribute 1, Attribute 2, Attribute 3):
• An attribute can be boolean or numeric
• Most useful if independent of other attributes
• The fewer attributes the better
Anatomy of a Classifier
21. write.table(expDF, sep=",", file="exploitData.csv")
# I looked at the tweets and added
# training data, using my judgment to code
# the non-tweets
truExp = read.table("truExp.txt")
# Add to the existing data
expDFtrue = cbind(expDF, truExp)
# Note: new variable name defaults to "V1"
Create and Add Training Data
22. # Create true/false values for each row,
# based on whether the string exists
expDFtrue$hassec = grepl("security", tolower(expDFtrue$text))
# Also count some punctuation to see if
# there are clues there
expDFtrue$numhash = sapply(strsplit(as.character(expDFtrue$text), "#"), length) - 1
Easy Predictors with Grepl()
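The logit output on the next slide includes several more predictors built the same way; this sketch reconstructs them from their names (only the names appear on the slides, the construction here is assumed):

# Keyword flags matching the coefficient names on the next slide
txt <- tolower(expDFtrue$text)
expDFtrue$hassql <- grepl("sql", txt)
expDFtrue$hasbuf <- grepl("buffer", txt)
expDFtrue$hasscr <- grepl("scripting", txt)
expDFtrue$hasrem <- grepl("remote", txt)
expDFtrue$hascyb <- grepl("cyber", txt)
# Punctuation counts and tweet length, built like numhash above
countChar <- function(s, ch)
  sapply(strsplit(as.character(s), ch, fixed = TRUE), length) - 1
expDFtrue$numast <- countChar(expDFtrue$text, "*")
expDFtrue$numdot <- countChar(expDFtrue$text, ".")
expDFtrue$twtlen <- nchar(as.character(expDFtrue$text))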
23. Coefficients (output from the logit analysis):
             Estimate    Std. Error  z value  Pr(>|z|)
(Intercept)  2.946e+00   1.904e+00    1.547   0.12187
hassecTRUE   2.302e+00   1.186e+00    1.941   0.05222 .
hassqlTRUE   1.770e+01   4.076e+03    0.004   0.99653
hasbufTRUE  -2.476e+00   1.124e+00   -2.203   0.02757 *
hasscrTRUE   1.832e+01   4.135e+03    0.004   0.99647
hasremTRUE  -3.075e-01   1.109e+00   -0.277   0.78164
hascybTRUE   1.202e+00   1.958e+00    0.614   0.53937
numhash     -2.046e+00   7.218e-01   -2.835   0.00458 **
numast      -2.182e+01   5.554e+03   -0.004   0.99687
numdot       6.306e-01   4.167e-01    1.513   0.13017
twtlen       6.548e-03   2.329e-02    0.281   0.77854
# The security and buffer keywords are promising, as well as the
# number of hash marks and the number of dots/periods
Choose Best Attributes
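For completeness, a model of the shape that would produce this table; the formula is inferred from the coefficient names, and V1 is the hand-coded label from the training-data slide:

# Logistic regression over the hand-built attributes
fit <- glm(V1 ~ hassec + hassql + hasbuf + hasscr + hasrem + hascyb +
                numhash + numast + numdot + twtlen,
           data = expDFtrue, family = binomial)
summary(fit)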
25. Conclusion 1: R is pretty handy for grabbing and manipulating tweet data
Conclusion 2: Tweet data are messy and require a good deal of clean-up, parsing, and filtering
Conclusion 3: As these two examples suggest, tweets can provide breaking news about vulnerabilities and exploits:
◦ WordPress Age Verification plugin versions 0.4 and below open redirect vulnerability
  Exploit availability tweeted at 12:19 PM
  Blogged at SecurityBlog 10:24 PM
  Added to SiloBreaker two days later
◦ Pragyan CMS v3.0 Remote File Disclosure
  Exploit availability tweeted at 11:07 AM
  Appeared on PacketStorm the next day
  On RealHacker three days later
  On WebCriminal.ru eight days later
Twitter: Early Warning System?
This graph shows the Michael Jackson Effect: a strong uptick in tweet volume in the half hour following the announcement of his death on Hollywood celebrity site TMZ.com (doctors at UCLA Hospital had announced the death 18 minutes before that). Twitter crashed temporarily under the load at 3:15 PM. Twitter has about 200 million users and is the 9th most popular site on the web; on a typical day it handles 200 million tweets and 1.6 billion search queries.
Could also make a fancier PostScript plot with: post(fit, file = "tree.ps", title = "Classification Tree for Exploit Tweets")
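A sketch of the kind of rpart tree that note refers to; the formula and predictor choice are assumed, and only the post() call is quoted from the note itself:

library(rpart)
# Classification tree over a few of the promising attributes
fit <- rpart(V1 ~ hassec + hasbuf + numhash + numdot,
             data = expDFtrue, method = "class")
post(fit, file = "tree.ps", title = "Classification Tree for Exploit Tweets")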