Sarah Guido gave a presentation on analyzing data with Python. She discussed several Python tools for preprocessing, analysis, and visualization including Pandas for data wrangling, scikit-learn for machine learning, NLTK for natural language processing, MRjob for processing large datasets in parallel, and ggplot for visualization. For each tool, she provided examples and use cases. She emphasized that the best tools depend on the type of data and analysis needs.
The document discusses computational social science and three common approaches: macroscopes, virtual labs, and empirical modeling. It provides examples of each approach, including using large-scale Facebook and Twitter data to study language patterns and personality (macroscope), manipulating Facebook data to study emotional contagion and voter turnout (virtual lab), and using Twitter data to predict county-level health outcomes (empirical modeling). Overall, the document outlines how new sources of big data allow computational social science to address challenges of studying social phenomena at large scales.
This document provides an overview of Python for data analysis using the pandas library. It discusses key pandas concepts like Series and DataFrames for working with one-dimensional and multi-dimensional labeled data structures. It also covers common data analysis tasks in pandas such as data loading, aggregation, grouping, pivoting, filtering, handling time series data, and plotting.
This document summarizes a presentation on data science in education given by Dr. Liu Ming-chi from National Cheng Kung University. The presentation outline includes topics such as data science, data scientists, the PISA assessment, future skills needed in education, how AI could impact education, and case studies. It also includes several links to videos and articles on using data in education.
Data Science and Machine Learning Using Python and Scikit-learn by Asim Jalis
Workshop at DataEngConf 2016, on April 7-8 2016, at Galvanize, 44 Tehama Street, San Francisco, CA.
Demo and labs for workshop are at https://github.com/asimjalis/data-science-workshop
This document provides an overview and introduction to analyzing spatial data using Python. It discusses what spatial data is, popular Python libraries for working with spatial data like Fiona, Shapely, GeoPy, and Mapnik, and how to perform spatial analysis tasks in Python such as geocoding, data conversion and visualization. Jupyter notebooks are presented as an interactive environment for exploring spatial data and libraries like Geopandas and PySAL are introduced for performing spatial analysis. Examples analyze Colombian location and point of interest data.
The document discusses Python and its suitability for data science. It describes Python's Zen-like approach of focusing on simplicity and empowering users. It promotes Python's data science stack, including NumPy, Pandas, scikit-learn and others, and how they allow for rapid data analysis and model building. It also describes the Anaconda distribution and conda package manager for easily managing Python environments and packages.
Introduction to Machine Learning with Python and scikit-learn by Matt Hagy
PyATL talk about machine learning. Provides both an intro to machine learning and how to do it with Python. Includes simple examples with code and results.
pandas: Powerful data analysis tools for Python by Wes McKinney
Wes McKinney introduced pandas, a Python data analysis library built on NumPy. Pandas provides data structures and tools for cleaning, manipulating, and working with relational and time-series data. Key features include DataFrame for 2D data, hierarchical indexing, merging and joining data, and grouping and aggregating data. Pandas is used heavily in financial applications and has over 1500 unit tests, ensuring stability and reliability. Future goals include better time series handling and integration with other Python data science packages.
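In an era when data science is all the rage, an engineer curious about new technology will naturally want to expand their arsenal and learn new data analysis tools. R, a scripting language developed by statisticians specifically for data exploration and analysis, is backed by a large open-source community and tens of thousands of packages, which makes it a top choice for learning data science tools today.
However, R's design logic differs from that of general-purpose programming languages, and engineers' prior experience with other languages often becomes an obstacle to learning R. This course starts from the basics of R and, through lectures and interactive hands-on sessions, helps students thoroughly understand R's core concepts and essentials, learn how to use R to ask questions of data, and write R code that is both efficient and highly readable from a data analysis perspective.
In an era when data science is becoming a mainstream discipline, no matter how many V's of data you face, the most important one is always Value, and data mining is a technique for systematically clarifying the context of data and finding the valuable features and correlations within it. This six-hour course takes the most practical angle: it shares how to turn pressing real-world problems into problems that data mining techniques can handle, and how to use R's powerful tools for association analysis, regression analysis, and cluster analysis, with the ultimate goal of uncovering the information hidden in the data.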
Pandas are black and white bear-like animals that live in the misty, rainy forests of southwestern China. They weigh around 15 kg and eat bamboo shoots and leaves for 12 hours per day. Mother pandas give birth to tiny pink newborn babies that need milk and fur develops later. Their main threat is humans.
2016 Digital predictions for marketing, tech, pop culture and everything in b... by Soap Creative
Another light-hearted look at what we think the zeitgeist of 2016 will be for marketing, tech, pop culture and everything in-between.
Many of our previous predictions are still in play and while we like to be right we'd rather make you smile with these less predictable trends.
Follow us for more updates.
Ever see great presentations on this site and wonder "How can I make slides like those?"
This quick, insight-packed course will distill many of the major lessons I've learned designing presentations (20 or so of which have been featured on the Slideshare homepage for clients like Honigman Media and Group 8A) over the past half decade.
The major areas of discussion include
STORYTELLING | RHETORIC | DESIGN
Each of these is rigorously examined using easy-to-understand examples and practical, actionable takeaways.
Click through these slides and come out the other side a better presentation designer, guaranteed!
I currently teach Digital Marketing at General Assembly and have given this lecture to nearly unanimous positive feedback.
If you'd like to get access to this PDF or pick my brain about presentation design, marketing, etc... shoot me a line!
EMAIL: Jig813@gmail.com
TWITTER: twitter.com/JoeandTell
LINKEDIN: linkedin.com/in/josephgelman
To help the curious class stay relevant, we’ve assembled an A-Z glossary of what we predict to be the 100 must-know terms and concepts for 2017.
We hope this cultural crib sheet will help prepare you for the year ahead.
Enjoy!
8. List vs Series
import numpy as np
import pandas as pd
sample_series = pd.Series(np.random.sample(1000000))
sample_list = list(np.random.sample(1000000))
%timeit sample_series + sample_series
1000 loops, best of 3: 1.04 ms per loop
%timeit [i + i for i in sample_list]
10 loops, best of 3: 136 ms per loop