Week 2 of the Data Science course covers discussions on big data and statistical sampling, required reading from introductory data science and machine learning texts, and an assignment to visually describe the Haberman dataset in R. The roadmap provides guidance on discussions, recommended reading, optional activities, and instructions for submitting the visual assignment by the deadline. It also previews upcoming weeks focusing on machine learning, data visualization, and data ethics.
UNIT 1: INTRODUCTION
Introduction to Web - Limitations of current Web - Development of Semantic Web - Emergence of the Social Web - Statistical Properties of Social Networks - Network analysis - Development of Social Network Analysis - Key concepts and measures in network analysis - Discussion networks - Blogs and online communities - Web-based networks
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU... - Denis Parra Santander
- First version was a guest lecture about Network Visualization in the class "Data Visualization" taught by Dr. Sharon Hsiao in the QMSS program at Columbia University http://www.columbia.edu/~ih2240/dataviz/index.htm
- This updated version was delivered in our class on SNA at PUC Chile in the MPGI master program.
Social Network Analysis Introduction including Data Structure Graph overview. Doug Needham
Social Network Analysis Introduction including Data Structure Graph overview. Given in Cincinnati August 18th 2015 as part of the DataSeed Meetup group.
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear... - Xiaohan Zeng
The advent of social networks has completely changed our daily lives. The deluge of data collected on Social Network Services (SNS) and recent developments in complex network theory have enabled many remarkable predictive analyses, which tell us many amazing stories.
Why do we often feel that "the world is so small"? Is six degrees of separation purely imagination or based on mathematical insights? Why are there just a few rockstars who enjoy extreme popularity while most of us stay unknown to the world? When science meets coffee shop knowledge, things are bound to be intriguing.
I will first briefly describe what social networks are, in the mathematical sense. Then I will introduce some ways to extract characteristics of networks, and how these analyses can explain many anecdotes in our life. Finally, I'll show an example of what we can learn from social network analysis, based on data from Groupon.
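As a rough illustration of the questions raised above, the "degrees of separation" between two people can be computed as a shortest-path length, and popularity as a node's degree. The sketch below is only an assumption-laden toy: the `FRIENDS` network, the names in it, and both function names are hypothetical and do not come from the talk.

```python
from collections import deque

# Hypothetical toy friendship network (undirected: each tie is listed both ways).
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
    "zed": set(),
}


def degrees_of_separation(graph, source, target):
    """Return the hop count of a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    # Breadth-first search visits nodes in order of increasing distance.
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def degree(graph):
    """Map each node to its number of direct ties (a simple popularity measure)."""
    return {node: len(neighbors) for node, neighbors in graph.items()}
```

In this toy network "ann" reaches "eve" in three hops, while "zed" has no ties and is unreachable from everyone.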
An overview of the Network Overview Discovery and Exploration add-in for Excel 2007 (NodeXL), a social network analysis add-in for the familiar spreadsheet application. Visualize Twitter, Flickr, Facebook, and email networks with just a few mouse clicks.
2009 Node XL Overview: Social Network Analysis in Excel 2007 - Marc Smith
A quick overview of the features of NodeXL, the network overview, discovery, and exploration add-in for Excel 2007. This tool allows for visualizing directed graphs and social networks within Excel. It provides several network metrics and manipulation tools. Networks can be imported from Twitter and personal email.
How to conduct a social network analysis: A tool for empowering teams and wor... - Jeromy Anglim
Slides and details available at: http://jeromyanglim.blogspot.com/2009/10/how-to-conduct-social-network-analysis.html
A talk on using social network analysis as a team development tool.
2013 NodeXL Social Media Network Analysis - Marc Smith
Social media network analysis and visualization with NodeXL - the network overview discovery and exploration add-in for Excel. Map Twitter, Facebook, email, blogs, and the web with a point and click interface within the familiar spreadsheet.
This material covers descriptions and different techniques of Internet-based research. It also lists helpful web sites and search engines.
Understanding Continuous Design in F/OSS Projects - Betsey Merkel
By authors Les Gasser1,2
gasser@uiuc.edu
Gabriel Ripoche1, 3
gripoche@uiuc.edu
Walt Scacchi2
wscacchi@ics.uci.edu
Bryan Penne1
bpenne@uiuc.edu
Abstract
Open Source Software (OSS) is in regular widespread use supporting critical applications and infrastructure, including the Internet and World Wide Web themselves. The communities of OSS users and developers are often interwoven. The deep engagement of users and developers, coupled with the openness of systems, leads to community-based system design and re-design activities that are continuous. Continuous redesign is facilitated by communication and knowledge-sharing infrastructures such as persistent chat rooms, newsgroups, issue-reporting/tracking repositories, sharable design representations and many kinds of "software informalisms." These tools are arenas for managing the extensive, varied, multimedia community knowledge that forms the foundation and the substance of system requirements. Active community-based design processes and knowledge repositories create new ways of learning about, representing, and defining systems that challenge current models of representation and design. This paper presents several aspects of our research into continuous, open, community-based design practices. We discuss several new insights into how communities represent knowledge and capture requirements that derive from our qualitative empirical studies of large (ca. 2GB+) repositories of problem-report data, primarily from the Mozilla project.
Creating a Learning Technology Roadmap: Maximizing Efficiency While Boosting ... - Cognizant
A centralized, learner-centric architecture -- based on a strategically-driven technology roadmap -- encompasses the functions, processes, methodologies, systems and tools necessary to provide knowledge when and where needed.
european open science cloud (EOSC). visions and impact on DARIAH roadmap - eveline wandl-vogt
lightning talk @ open science retreat @ NIKHEF, science park campus, amsterdam (22.2.2016); european open science cloud visions from DARIAH point of view.
When you're starting or running a company, how do you choose technology? The prevailing advice du jour is something along the lines of "use the best tool for the job." This is obviously right, but it is also devoid of meaning in an unfortunate way that lets people define "best" and "job" as myopically as they like.
Whether it's for your company or your own professional development (or ideally both), everyone should have a technology roadmap. Unfortunately there is no easy path to pre-made wisdom here, but this talk opines on some ideas and approaches to help formulate a roadmap that is relevant, pragmatic and importantly, able to be communicated to others.
Presented at Mastering SAP Technologies 2016
Assignment 3 Presenting With PowerPoint Jane R. Doe .docx - rock73
Assignment 3: Presenting With PowerPoint
Jane R. Doe 9/3/17
CIS105 Intro to Information Systems
Hello, my name is Jane Doe and today I will be presenting my Assignment 3: Presenting with PowerPoint presentation, all about everything I learned in CIS105 Intro to Information Systems in my first term here at Strayer University.
Information Systems Terms and Concepts
Digital Literacy
Knowledge, skills, and behaviors needed for the effective use of digital devices and effective participation in an information-based society
Hardware
The physical components of a computer that you can touch
Software
The coded instructions that tell the computer’s hardware what to do
We started the course by learning some basic information systems terms and concepts. This was important to get us started learning all about the world of computers. I work in the security field so I didn’t have very much experience with computers at all coming into this course, other than knowing how to download an app on my phone.
One thing I learned about was digital literacy—this means the knowledge, skills, and behaviors needed for the effective use of digital devices and effective participation in an information-based society. Being digitally literate is an essential skill for being an online student, working in today’s information-based job market, and just being a citizen of the modern world and being able to do what you need to do as far as taking care of your family, paying your bills, ordering things online, and everyday actions like that.
I also learned the difference between your computer’s hardware and software. The hardware is the physical parts of a computer that you can see and touch, like the keyboard, the monitor, the tower, the mouse, the webcam. The software is the programs that run it – the coded instructions that tell the computer’s hardware what to do. The Microsoft Office Suite that we learn later in the class is an example of computer software.
Internet
Internet Service Provider (ISP)
A company that provides Internet access for a fee
Protocol
Standard set of rules, requirements, and criteria for all devices and networks to follow
Search Engine
Software system that relies on algorithms to process data and search for content on the Web
NOTE: In this sample only the first two (2) slides contain sample audio, remember to add audio to each slide.
Next, in Weeks 2 and 3, we learned all about the Internet: what it is, what it does, and how to use it most effectively and efficiently for your needs.
One of the main things I learned here was that an Internet Service Provider, or ISP, is a company that provides Internet access for a fee. Service Electric and Verizon in my home area of Pennsylvania are our main providers.
I learned that protocol refers to the standard set of rules, requirements, and criteria that all devices (computers, iPads, phones, etc.) and networks follow. This allows the devices and networks t ...
COM 106 help Making Decisions/Snaptutorial - pinck2324
Choose two of the following scenarios. For each scenario you choose, answer each component as clearly and completely as possible.
You are writing up your weekly responses for your COM106 course and want to respond to a classmate who discusses the need to use social media in a job search. You are not familiar with this phrase.
Describe what sort of search you would conduct to help you in responding to this student and why. What search engine(s) or database(s) might you use and why? What search terms? How would you go about evaluating the credibility of the information you found?
ITS835 enterprise risk management Chapter 3 ERM at Mars, Incor.docx - vrickens
ITS835 enterprise risk management
Chapter 3
ERM at Mars, Incorporated: ERM for Strategy and Operations
Introduction
Mars’ ERM history
Phase 1 - Crash and Burn
Phase 2 - Success
Global rollout
Reporting
Operating workshops
Technology
Aggregation
Template evolution
Conclusion
Mars’ ERM history
Mars, Incorporated
Privately held -> migration to non-family management
Decentralized management
Leadership had legacy commitment to risk management
ERM was viewed as an evolution
COSO versus bespoke approach
COSO - Committee of Sponsoring Organizations structure
Bespoke approach won
Phase 1
Failed due to being impractical and overly complex
Phase 2
Simpler and targeted
Planning workshops
Desire to align senior management goals with ERM
Started with simple template
Operating plan initiative sheet
Objective
Score
Risk column
Risk treatment column
Management team met to define and rank
Risks
Risk treatments
Changed label from “mitigations”
Global rollout
Used lessons learned from pilot
Each unit has specific nuances
Interviewing GM and CFO together saved subsequent interview time
Workshops helped to identify
Gaps in risk management readiness
High-risk initiatives
Ongoing activities with unexpected high risk
Reporting
Color-coding adds
Urgency
Clarity
Groups are defined
Clusters
Score represents
Confidence of meeting goals
Reporting [cont’d]
Operating workshops
Several ongoing changes
Technology
Early on, process was technology agnostic
Word -> Excel
Excel -> purpose-built software
ERM supports aggregation
More complete view of organizational impact of risk
Continual template evolution
Added risk treatment owners and due dates
Summary
Mars received an award for their ERM
Corporate Executive Board’s “Force of Ideas Award” for ERM
Key factors for ERM success
Alignment with Mars’ principles
Focus on meeting objectives
Operational
Strategic
Flexible
Realistic
WEEK 2 ASSIGNMENTS
BUS 3022: 600-800 WORDS
Dell's Supply Chain
Many companies have achieved excellent success as a result of emphasizing online sales and an associated distribution network. Dell has been one of the most successful at that, by offering significant customization direct to end users. As noted in a prior edition of our textbook:
“Dell is able to exploit most of the responsiveness-enhancing opportunities offered by the Internet for customized servers. The company uses the Internet to offer a wide variety of customized server configurations with the desired chassis, processor, memory, and operating system. Customization allows Dell to satisfy customers by giving them a product that is close to their specific requirements. The customization options are easy to display over the Internet, allowing Dell to attract customers who value this choice.”
“The Internet allows companies such as Dell and Apple to bring new products to market quickly. This is particularly important in the compute ...
AdvanceStorage.zip
yyy.docx
MOVIE VIEWS SYSTEM
Proposal, Technical Project
by
REEMA ALZAKI
This proposal is submitted to
Professor/Dr. Farnoush Banaei-Kashani
Advanced Data Stores, 5800-002
September 16, 2015
1. Introduction
Movies are important as they present new experiences, cultures, places, ideas, and so forth. They are similar to the “campfire” that people used to gather around to tell and hear stories in the past. They are a way to step out of our world and into another world for a brief time, thus allowing us to relieve stress, bond with our families and loved ones, and so forth.
The drawback to this type of entertainment is that there are so many different types of movies being produced so quickly by Hollywood, it can be difficult to choose one to watch. In addition, parents also find it difficult choosing the right movie for their children. This is because even though the Motion Picture Association of America (MPAA) rates movies on violence, language and drug use, these ratings are often confusing. Furthermore, with lifestyles being so busy, no one has time to sit and look through all of the reviews of the new movies that are coming out and choose the best one to see.
2. OBJECTIVE
The objective of this project is to have a system which assists customers in choosing and borrowing movies, which they will return within a certain timeframe. The movies will be divided based on genre, MPAA ratings, as well as additional information, such as current reviews listed from 1-3 stars. The system will offer three package subscriptions: Basic, Deluxe, and Supreme. The Supreme package will offer service without advertisements. It will also offer these services on different platforms: PCs and video game consoles like Wii U, Xbox, or PlayStation.
3. Benefits
Benefits of providing this service are that customers will be given five movie choices from which to pick the one that best suits their needs at the time. They will not have to waste money buying movies or going to movie theaters. They will not have to waste time looking through movie reviews and they will be given movie choices that are appropriate for their children’s ages. As the younger generations become older, average spending for these consumers is expected to rise with a potential increase in their wages, so the overall market will demonstrate an exponential increase in growth in the near future. Additional plans to include smartphone and tablet movies for the upcoming generations are in the works as well, though not for this project at this time.
4. PROPOSED TECHNICAL APPROACH
We intend to build a new generation data management system based on NoSQL’s BigTable, as it was designed for an organization that works with a variety of data (structured, semi-structured, and unstructured) that is ever increasing in volume and needs to be stored, processed, and analyzed. It can also be compressed. We believe that the use of a data storage system of this type is more effi.
Information Extraction from Text, presented @ Deloitte - Deep Kayal
Useful unstructured text occurs in plentiful amounts, and often is central to the success of a business. The benefits of being able to successfully decipher unstructured text can be direct or derived. Companies which offer products for medical differential diagnosis are directly benefitted by the ability to correctly extract drug-disease interactions from publications, for example. As for derived benefits of text processing, we need to look no further than cases of improving process flows by analyzing the sentiment of the emails a company receives from its customers.
Being at the frontier of natural language processing, information representation and retrieval, information extraction has been the subject of extensive research for several decades and there are plenty of existing techniques to help with the understanding of unstructured textual content. This presentation will introduce and summarize useful techniques that are helpful in tackling sub-domains of information extraction, such as named entity recognition, keyword extraction and document summarization for efficient retrieval. Additionally, the talk will also emphasize low-resource cases, when not much useful labelled information is available.
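To make one of the sub-domains named above concrete, keyword extraction can be approximated, in its simplest form, by counting non-trivial words. The sketch below is a frequency baseline under stated assumptions, not the presenter's method: the `STOPWORDS` set, the `extract_keywords` name, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = frozenset(
    {"a", "an", "and", "are", "for", "in", "is", "of", "that", "the", "to", "with"}
)


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


SAMPLE = (
    "The drug interacts with the disease. "
    "Drug trials show the drug reduces disease symptoms."
)
```

Calling `extract_keywords(SAMPLE, top_n=2)` ranks "drug" ahead of "disease". Raw counts ignore word importance across documents, which is why weighted schemes such as TF-IDF are usually preferred once a corpus is available.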
Dr. Sam Musa 01-01-2017 Network LAN Design with VoIP and Wireles.docx - kanepbyrne80830
Dr. Sam Musa 01-01-2017
Network LAN Design with VoIP and Wireless Services
This section provides a detailed LAN design of the network, covering VoIP services, wireless services, protocols, devices, and interconnectivity with the WAN.
This section includes, but is not limited to:
· Equipment List
· Hierarchical IP scheme and VLAN
· Link IP addresses
· High Level Diagram
· Voice and Wireless Design
Equipment List:
Select all networking hardware
Suggested Template
Device | Cisco Model# | Quantity | Comments
Distribution Switches | Cat 3850 Series Switches | 2 | 48 port model, wireless capacity/PoE
Branch office router | RV220W Wireless | 2 | 4 port switch, built in firewall
Access Switches | Cat 2600 | 10 | 48 ports
IDS | | |
Network Monitoring System | | |
Firewall | | |
Cisco Unified Call Manager | | 2 | Voice services
Step 2:
Now name each device according to a naming convention (since this is a new company, create one for them).
Suggested template
Device
Device Configured Name
Placement
Connection
Comments
Cat 3850 Series switch
Data center Switch 1
Data center
Distribution Switch
VLAN1, 2, 8
The switch has Inventory server, Payroll server, and VLAN 1, 2, 8
Core Switch 7600 Series
CoreRouter1
Room#1014
DSW1 and DSW2
ISP1
Hierarchical IP scheme and VLAN
Create an IP Scheme and VLANS.
I suggest using the table below to create your hierarchical IP addressing scheme.
Location | Number of IP Addresses Required | Future Growth | Rounded Power of 2 | Number of Host Bits | Subnet Address Assigned
Floor1 | 1500 | 200 | 2048 | 11 | 172.20.0.0-172.20.7.254/21
Floor2 | 200 | 100 | 512 | 9 | 172.20.8.0-172.20.9.254/23
Floor3 | 45 | 20 | 128 | 7 | 172.20.16.0-172.20.16.126/25
Floor4 | | | | |
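The sizing rule behind the table (add required addresses and future growth, round up to a power of 2, and derive the host bits and prefix length) can be sketched with Python's standard `ipaddress` module. This is a minimal sketch under assumptions: the function name `plan_subnet` is hypothetical, and reserving two extra addresses for the network and broadcast addresses is an assumption not stated in the table.

```python
import ipaddress


def plan_subnet(base_address, required, growth):
    """Size one subnet starting at base_address for required + growth hosts."""
    # Assumption: reserve 2 extra addresses (network and broadcast).
    needed = required + growth + 2
    # Smallest number of host bits whose block size (2 ** bits) covers `needed`.
    host_bits = max((needed - 1).bit_length(), 2)
    # Raises ValueError if base_address is not aligned to the block size.
    network = ipaddress.ip_network(f"{base_address}/{32 - host_bits}")
    return {
        "host_bits": host_bits,
        "block_size": 2 ** host_bits,
        "network": str(network),
        "last_host": str(network.broadcast_address - 1),
    }


# Inputs taken from the Floor1, Floor2, and Floor3 rows of the table.
FLOOR_PLANS = [
    plan_subnet("172.20.0.0", 1500, 200),
    plan_subnet("172.20.8.0", 200, 100),
    plan_subnet("172.20.16.0", 45, 20),
]
```

Working from the requirement rather than typing prefixes by hand keeps the block size, host bits, and address range consistent with each other for every floor.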
Create VLAN: In creating VLANs, I suggest using the organizational structure model for simplicity.
Examples: VPOPRVLAN1, VPOPRVLAN2
High Level Diagram:
Drawing a network topology diagram is the most challenging task. To overcome this challenge, we need to use Cisco modular technology in upgrading the network, in other words a top-down design approach. The top-down design approach starts with applications, devices, and infrastructure. You will use the same approach in designing the WWTC network. Select all the applications for the network. Then select the devices needed to run these applications. Now you are ready to create the network topology diagram. Since the WWTC network has one floor, all of our devices, applications, and infrastructure will reside on one floor.
Create subnets. Generally, subnets match the organizational structure. In a large network, subnets are also created to increase performance or for security reasons. Furthermore, to accommodate the needs of a department, the subnets can be subnetted further, VLANs can be created, or both. Every organization has subnets and VLANs. Let us say we need 20 VLANs, which will serve the client’s requirements, performance, and security of the network. Assign these VLANs to switches. For example, you need 3 switches to host 3 VLANs for VPOPR. The diagram below depicts the scenario.
Sample Network Diagram
.
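Subnetting a department block further, one child subnet per VLAN, might look like the sketch below. It is illustrative only: the function name `assign_vlan_subnets`, the choice of 172.20.8.0/23 as the department block, and the name VPOPRVLAN3 (extending the VPOPRVLAN1/VPOPRVLAN2 pattern) are assumptions.

```python
import ipaddress


def assign_vlan_subnets(block, vlan_names):
    """Split block into equal child subnets and give one to each VLAN name."""
    if not vlan_names:
        return {}
    parent = ipaddress.ip_network(block)
    # Extra prefix bits needed so that 2 ** extra_bits >= number of VLANs.
    extra_bits = (len(vlan_names) - 1).bit_length()
    children = parent.subnets(prefixlen_diff=extra_bits)
    # zip stops at the last VLAN name; unused child subnets remain spare.
    return {name: str(net) for name, net in zip(vlan_names, children)}


# Hypothetical block and VLAN names for the VPOPR department.
VPOPR_VLANS = assign_vlan_subnets(
    "172.20.8.0/23", ["VPOPRVLAN1", "VPOPRVLAN2", "VPOPRVLAN3"]
)
```

Three VLANs need two extra prefix bits, so the /23 is cut into four /25 subnets and the fourth is left unassigned for later growth.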
IDS 403 Final Project Part Two Guidelines and Rubric
Overview
This course explores technology and its impact on the world around us. Technology influences society, and society influences technology, creating a feedback loop between them. We will critically analyze this feedback loop in this course through social, historical, and theoretical approaches to technology as well as the four general education lenses: history, humanities, natural and applied sciences, and social sciences. Each of these four perspectives allows us to better understand the construction of technology and its interrelation with society. From this enhanced understanding, you will be equipped to draw connections between technology, society, and your personal and professional lives, helping you to become a better-informed citizen who can make a positive difference in the world.
Issues and events in technology have a pervading influence on many aspects of society, and how they are dealt with requires diverse knowledge and perspectives to investigate and change. The purpose of this project is to examine a specific issue or event in technology and how it impacts individuals and society through the development of a critical analysis portfolio and a presentation in which you will demonstrate your ability to think critically, investigate, and communicate clearly. These skills are often necessary to achieve personal and professional goals across many disciplines.
In this assignment, you will demonstrate your mastery of the following course outcomes:
Analyze the evolving role of technology in one’s discipline of study or chosen profession by investigating the influence of technology on modern culture [IDS-403-01]
Integrate interdisciplinary approaches for determining how technology affects modern identity in personal and professional contexts [IDS-403-02]
Explain how technology influences modern society by employing appropriate research strategies [IDS-403-03]
Recommend strategies for utilizing current technology to meet personal and professional goals [IDS-403-04]
Articulate informed viewpoints on how technology shapes the world and can influence change using effective communication skills [IDS-403-05]
Assess the impact of emerging technologies on societal issues for incorporating diverse perspectives and viewpoints informed by relevant literature and interpersonal experiences [IDS-403-06]
Prompt
For the second part of this project, you will develop a multimedia presentation in which you will have a chance to reflect on what you have learned about your issue or event, yourself, and society through analyzing its impact on technology. You will also be able to apply your communication skills and integrate multimedia elements to communicate your message to an audience.
In developing this presentation, you will be able to use your analyses from the first part of this project as a starting point. The reflective nature of this ...
1. Introduction and how to get into Data
2. Data Engineering and skills needed
3. Comparison of Data Analytics for static and real-time streaming data
4. Bayesian Reasoning for Data
Data scientist enablement dse 400 week 2 roadmap
1. Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 2 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0
2. Agenda
You can always find the latest version of this document at http://bit.ly/1dVHJwO
Week 1 Recap
Week 2 At a Glance
Discussions
Required Reading
Practice
Assignments and Submission
Looking ahead
References
Citation
Acknowledgement
A strong will, a settled purpose, and an invincible determination can accomplish almost anything. - Thomas Fuller
3. During week 1 you were able to
Understand what Data Science is and articulate what Data
Scientists do on a day-to-day basis
Install R and R-Studio
Explore the UCI Machine Learning Repository
Import the Housing Dataset into R
Explore SONO and participate in Discussions
DSE 400 - Week 1 Recap
4. Discussions:
Fuss about Big Data. Statistical sampling etc. Optional Q&A
Reading plan:
Read Chapters 4-7 from An Introduction to Data Science
R for Machine Learning by Allison Chung
Activities:
Play with spreadsheets, continue research on Data Viz. tools, connect with local groups etc.
Assignment 2:
Download the Haberman dataset from the UCI Machine Learning Repository into your R-Studio
environment and visually describe this dataset.
DSE 400 - Week 2 at a glance
5. Discussion 1: What’s all this fuss about Big Data? How would you go beyond
talking about the 3 or 4 Vs of Big Data: Volume, Variety, Velocity, and Veracity
(by the way, veracity means the trustworthiness of the data)? How about Value?
Do people talk about it in the context of Big Data? Share your thoughts.
Discussion 2: “Statistics is defined as the discipline of using data samples to
support claims about populations.” Comments?
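As a starting point for Discussion 2, here is a minimal Python sketch of what "using data samples to support claims about populations" can look like in practice. The population (the integers 1 to 10000), the sample size, and the seed are all assumptions made up for illustration; they do not come from the course material.

```python
import random
import statistics

# Hypothetical population: the integers 1..10000 (an assumption for illustration only).
population = list(range(1, 10001))

# Fixed seed so the sketch is repeatable; the seed value itself is arbitrary.
rng = random.Random(0)

# Draw a simple random sample of 1,000 values without replacement.
sample = rng.sample(population, 1000)

# In real work the population mean is usually unknown; here we can compute it
# only because we constructed the population ourselves.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean:  {population_mean}")
print(f"sample mean:      {sample_mean:.1f}")
print(f"estimation error: {abs(sample_mean - population_mean):.1f}")
```

The sample covers only 10% of the population, yet its mean should land close to the population mean. Try changing the sample size and watch how the estimation error tends to behave.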
These discussions are required. These will be posted sequentially. If you have
access to SONO you are encouraged to participate in these discussions.
There will also be an Optional Q&A.
For the sake of simplicity and ease of navigation, please do not create additional threads.
Social Engagement on SONO - Week 2
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
6. SONO or SOKNO (Social Knowledge platform) was chosen for the DSE program to enable
Social Engagement, Collaboration, and Knowledge Dissemination, all of which are
important to an Open initiative like this.
We understand that many of you may initially have trouble navigating the platform. To ease things,
here are some tweaks the SONO team and the DSE community are currently developing.
To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For week 2 you
would use the following link:
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
The weekly KCs DSE 400 Week 1, Week 2, and so on map to knocell numbers 1001, 1002, and so on in
these URLs. Once you are in a KC, click on threads to go to the current discussions. We
appreciate your patience during this transitional phase.
SONO Tweaks
7. Read Chapters 4-7 from An Introduction to Data Science
Read R for Machine Learning by Allison Chung (Sections 1 to 3.4, pages 1-5)
<Optional> Introduction to Probability and Statistics Using R (Chapters 1-3)
<Optional> Read Chapters 2-7 from Think Stats: Probability and Statistics for
Programmers
If you are unfamiliar with basic statistical concepts or need a quick refresher
on this topic, please refer to the Statistics Playlist by Khan Academy
Week 2 - Recommended Reading Plan
Data Scientist (n.): Person who is better at statistics than any software engineer and
better at software engineering than any statistician. - Josh Wills
8. <Practice> Given the following dataset, manually find the mean, median, mode, variance and standard
deviation for this population.
{ 3, 15, 17, 18, 20, 20, 12, 20, 20, 16, 17, 12, 4, 7, 15, 20, 12, 6, 1, 20 }
Also try using a spreadsheet (such as Excel or Google Spreadsheet) to find the above measures
for the same dataset.
<Practice> Math is Fun. Learn what a Relative Frequency Distribution is. Try the example at the bottom
of this page.
<Community Outreach> <Optional> Explore and connect with your local R Group (or Data
Scientist/Big Data groups) and check out their projects, talks and seminars that might interest you. Also
discuss with them how you can engage with them and help them out in their endeavors.
Activities
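Once you have worked through the practice dataset by hand and in a spreadsheet, a short script can serve as a third check. The following is a minimal Python sketch using only the standard library; it treats the 20 values as a full population, as the exercise states, and also builds a relative frequency table. The variable names are illustrative choices, not part of the course material.

```python
import statistics
from collections import Counter

# The 20 values from the practice exercise, treated as a full population.
data = [3, 15, 17, 18, 20, 20, 12, 20, 20, 16,
        17, 12, 4, 7, 15, 20, 12, 6, 1, 20]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance (divides by N)
std_dev = statistics.pstdev(data)      # population standard deviation

# Relative frequency: count of each value divided by the number of observations.
counts = Counter(data)
relative_frequency = {value: counts[value] / len(data) for value in sorted(counts)}

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("population variance:", variance)
print("population standard deviation:", std_dev)
for value, freq in relative_frequency.items():
    print(f"value {value:>2}: relative frequency {freq:.2f}")
```

If your spreadsheet gives a different variance, check whether it used the sample formula (dividing by N - 1) instead of the population formula (dividing by N). In Python the sample versions are `statistics.variance` and `statistics.stdev`.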
9. <Optional> If you are not fully happy with the statistical functionality of your familiar
spreadsheet package, download PSPP, a free statistical analysis tool, from SourceForge and
play with it.
<Optional> <Advanced> Import Housing Data into R-Studio and describe it statistically.
You may need packages like pastecs, which lets you use the stat.desc function.
<Optional> Register for Big Data in Motion. This is a free online webinar scheduled for Jan
30, 2014, 1 PM EST. Attendance is optional but recommended.
Need more? Reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> and ask for more activities and challenges.
Activities - contd ...
10. Assignment 2 - Submission Required
Assignment: Download the Haberman Survival dataset from the UCI Machine Learning Repository. Import this
dataset into your R-Studio. Generate three graphic representations: Histogram, Scatter Plot and Box
Plot, as depicted above. Refer to R for Machine Learning by Allison Chung before you attempt this
assignment.
Image credit: R for Machine Learning by Allison Chung
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> if you have any difficulties with this assignment.
11. Submissions
Deadline: Saturday, 11:59 PM your local time.
Mail Assignment 2 to <datascience400@gmail.com>.
Submit a PDF document of the screenshots of your R-Studio workspace showing the three
visualizations discussed. Use the naming convention DSE 400 > Assignment 2 > Your Full Name
for your document.
Do not send document links. Please add DSE 400 > Assignment 2 in the subject line.
12. Week 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes,
Recommendations and Boosting algorithms. Refer to R for Machine Learning by Allison Chung.
Watch Caltech Machine Learning Videos on YouTube.
Week 5: Visualizations. Present your research: Data Visualization Tools - A Comparative Study.
Week 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.
Week 8: Ethics, Privacy and Building Data Products.
DSE 400 - Weeks 3-8 ahead
13. References, Resources and Additional Reading
An Introduction to Data Science
Think Stats: Probability and Statistics for Programmers
Statistics Playlist by Khan Academy
R for Machine Learning by Allison Chung
Introduction to Hypothesis Testing
Single Sample Hypothesis Testing Part 1 and Part 2
R for Beginners by Emmanuel Paradis
R - Reference Cards
Introduction to R Playlist (Video Collection) on Youtube
Caltech Machine Learning Playlist on Youtube
[MIT OCW] Prediction: Machine Learning and Statistics from MIT Sloan School of Management
14. Citation
The dataset titled Haberman's Survival Data used here for Assignment 2 comes from the UCI
Machine Learning Repository.
Donor for Haberman's Survival Data: Tjen-Sien Lim (limt@stat.wisc.edu). It was added to the UCI
Machine Learning Repository on March 4, 1999.
R for Machine Learning by Allison Chung is recommended by the MIT course Prediction:
Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE
400 as per OCW guidelines.
Content that appears as is in this document only is under Creative Commons License CC
BY 4.0. This license may not necessarily apply to other material referenced in this
document.
15. For More Information
Presentation deck for DSE 400 > Week 1 Roadmap can be found at http://bit.ly/1hC5wAV
Week 2 discussions take place during this week on SONO DSE 400 Week 2
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming
<rachel@emodern.biz> if you have any difficulties with the assignments.
We welcome questions, thoughts and suggestions. Post these on SONO in the right
forum/discussion or write to us at <datascience400@gmail.com>
You can always find the latest version of this document at http://bit.ly/1dVHJwO
18. We thank our community of committed and passionate volunteers, experts,
educators, innovators, benefactors, advisers, advocates, mentors and
supporters.
We are also grateful for the outstanding support and encouragement from
the SONO team as well as other organizations such as MIT Sloan School of Management,
IBM, HortonWorks, R-Project, Creative Commons, Open Courseware
Consortium, Stanford University, Caltech, O’Reilly Publications and Data
Science Central.
Acknowledgement