A set of slides from my closing keynote at DamnData. I go over the concept of the Data Scientific Method and the skills required for a data scientist, from hacking, to maths and stats, to domain expertise and business knowledge. I also talk about some ideas we worked on, some tools and technologies we use, and the most important part: the questioning of the data.
Presented an abridged version of my "What is data science" talk at #websummit 2013.
This talk goes over the required skill set as defined by Drew Conway in his famous Venn diagram, and also outlines the Data Scientific Method introduced by Dr. Patil. The talk is in two parts; the second part goes over some of the packages and technologies we use, minus the storage part.
Clare Corthell: Learning Data Science Online (sfdatascience)
Clare Corthell, Data Scientist and Designer at Mattermark, and author of the Open Source Data Science Masters, shares her experience teaching herself data science with online resources. http://datasciencemasters.org/
Claudia Gold: Learning Data Science Online (sfdatascience)
Claudia Gold, author of the Data Analysis Learning path on SlideRule, talks about why she wrote it and how to approach learning data science on your own. https://www.mysliderule.com/learning-paths/data-analysis/
Python has been the fastest-growing language globally over the last 5 years. In India, the average salary for an entry-level Python developer is 4.8 lakhs per annum. This deck gets you started on the fundamentals of Python and Python constructs, and guides you through setting up the environment and writing your first Python program. It introduces the Spotle Certificate Masterclass In Python, which covers the complete Python fundamentals through interactive videos, live classes, integrated hands-on exercises and projects.
Data Science is one of the hottest career options globally right now, with data scientists earning an average of 15 to 18 lakhs annually. This deck explains the fundamentals of Data Science and the role of a Data Scientist.
The deck also introduces the Certificate Masterclass in Data Science with Python by Spotle Learn. This course is designed by experts for people who want to build a career in data science. It will equip you with the fundamental knowledge and practical expertise required for data science careers through a rigorous pedagogy based on videos, live projects, interactive classes and integrated internships.
Developing in R - the contextual Multi-Armed Bandit edition (Robin van Emden)
The document discusses R package development. It notes that R is dominant in statistics research, is an interpreted language, and supports multiple programming paradigms: imperative, functional and object-oriented. It discusses the different class systems in R, such as S3, S4 and the newer R6 class, and argues that the R6 class provides a better approach. The document also highlights the importance of semantic development skills, syntactic development skills and domain knowledge for R development.
Search as Communication: Lessons from a Personal Journey (Daniel Tunkelang)
The document discusses lessons learned from the author's personal journey in search engineering. It covers insights from library science about treating search as an information-seeking context and communicating with users. It also discusses the importance of entity detection and how to leverage corpus features to improve extraction. The author realized that queries vary in difficulty and systems need to recognize this and adapt accordingly. The key takeaway is that search should be treated as a communication problem rather than just a ranking task.
This document provides an introduction and overview of resources for learning Python for data science. It introduces the presenter, Karlijn Willems, a data science journalist who has worked as a big data developer. It then lists several useful links for learning Python, statistics, machine learning, databases, and data science tools like Apache Spark. Finally, it recommends people to follow in data science and analytics fields.
Data Science Tutorial | Introduction To Data Science | Data Science Training ... (Edureka!)
This Edureka Data Science tutorial will help you understand the ins and outs of Data Science with examples. This tutorial is ideal for beginners as well as professionals who want to learn or brush up on their Data Science concepts. Below are the topics covered in this tutorial:
1. Why Data Science?
2. What is Data Science?
3. Who is a Data Scientist?
4. How a Problem is Solved in Data Science?
5. Data Science Components
The talk is on how to become a data scientist. It was given at the 2nd Annual event of the Pune Developer's Community. It focuses on the skill set required to become a data scientist, and on what you can become based on who you are.
This document outlines the steps of the research process. It begins by defining research and the process. The research process consists of 7 steps: 1) defining the research problem, 2) conducting a literature review, 3) generating hypotheses, 4) designing the research, 5) collecting data, 6) analyzing the data, and 7) reaching conclusions and recommendations and reporting the findings. Each step is then further explained in 1-2 sentences.
The document discusses open data science research topics presented at a conference, including opportunities and challenges with learning analytics and adaptive learning using open data. It describes how learning analytics can help achieve large improvements in student outcomes through targeted feedback and personalized learning paths. An open analytics architecture is proposed to integrate different data sources and applications using common data standards.
Siddhant Thakur is a data scientist with over 1 year of experience in machine learning, statistics, and programming projects focused on sports analytics and prediction modeling. His skills include Python, C/C++, SQL, Java, R, and machine learning algorithms. He has worked on projects predicting NFL game winners using random forest classification and clustering medical patients based on lab reports. Currently he is building models to predict the NCAA March Madness bracket as an ongoing Kaggle competition.
This presentation gives an introduction to Data Science, the Data Scientist's role and features, and how the Python ecosystem provides great tools for the Data Science process (Obtain, Scrub, Explore, Model, Interpret).
To illustrate, an attached IPython Notebook ( http://bit.ly/python4datascience_nb ) walks through the full process of a corporate network analysis, using Pandas, Matplotlib, Scikit-learn, NumPy and SciPy.
BioIT Webinar on AI and data methods for drug discovery (Fernanda Foertter)
Using AI/ML in drug discovery to repurpose drugs. General cautions about the use of artificial intelligence, common pitfalls, and best practices for generating data.
In the machine learning community, we're trained to think of size as inversely proportional to bias, driving us to ever larger datasets, increasingly complex model architectures, and ever better accuracy scores. But bigger doesn't always mean better.
What data quality issues emerge in large datasets? What complications surface as features become more geodistributed (e.g., diurnal patterns, seasonal variations, datetime formatting, multilingual text, etc.)? What happens as models attempt to extrapolate bigger and bigger patterns? Why is it that the pursuit of megamodels has driven a wedge between the ML definition of “bias” and the more colloquial sense of the word?
Perhaps the time has come to move away from monolithic models that reduce rich variations and complexities to a simple argmax on the output layer and instead embrace a new generation of model architectures that are just as organic and diverse as the data they seek to encode.
The document discusses putting "magic" into data science. It provides several tricks or techniques for data science, including collecting novel data sources, dimensionality reduction, Bayesian methods, bootstrapping statistics, and matrix factorizations. It also emphasizes the importance of reliability, latency/interactivity, simplicity/modularity, and unexpectedness to solve the "last mile" problem of getting people to actually use data science tools and models. Specific Facebook tools like Planout, Deltoid, ClustR, Prophet, and Hive/Presto/Scuba are presented as examples.
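Of the techniques the deck lists, bootstrapping statistics is easy to illustrate in a few lines. The sketch below is not from the deck itself; it is a minimal, self-contained percentile bootstrap for a confidence interval on the mean, with the sample data invented for the example:

```python
import random

def bootstrap_ci(sample, stat=lambda xs: sum(xs) / len(xs),
                 n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement many times,
    compute the statistic on each resample, and take the empirical
    (alpha/2, 1 - alpha/2) quantiles as the interval."""
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(sample)
    stats = sorted(stat([rng.choice(sample) for _ in range(n)])
                   for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical measurements; a 95% CI for their mean.
lo, hi = bootstrap_ci([4.1, 5.0, 3.8, 6.2, 4.7, 5.5, 4.9, 5.1])
```

The appeal in a "last mile" setting is that the same few lines work for any statistic (median, ratio, quantile), not just the mean, with no distributional assumptions.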
Enterprise Search: How do we get there from here? (Daniel Tunkelang)
Enterprise Search: How Do We Get There From Here?
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Keynote at 2013 Enterprise Search Summit
We've been tackling the challenges of enterprise and site search for at least 3 decades. We've succeeded to the point that search is the gateway to many of our information repositories. Nonetheless, users of enterprise search systems are frustrated with these systems' shortcomings. We see this frustration in surveys, but, more importantly, most of us experience it personally in our daily work life. We all dream of a world where searching any information repository is as effective as searching the web—perhaps even more so. A world where we find what we're looking for, or quickly determine that it doesn't exist. Is this Utopia possible? If so, how do we get there from here? Or at least somewhere close? In this talk, Tunkelang reviews the track record of enterprise search. He talks about what's worked and what hasn't, especially as compared to web search. Finally, he proposes some paths to bring us closer to our dream.
--
Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has spent his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Prior to LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.
This document provides an introduction to machine learning. It begins with an agenda that lists topics such as introduction, theory, top 10 algorithms, recommendations, classification with naive Bayes, linear regression, clustering, principal component analysis, MapReduce, and conclusion. It then discusses what big data is and how data is accumulating at tremendous rates from various sources. It explains the volume, variety, and velocity aspects of big data. The document also provides examples of machine learning applications and discusses extracting insights from data using various algorithms. It discusses issues in machine learning like overfitting and underfitting data and the importance of testing algorithms. The document concludes that machine learning has vast potential but is very difficult to realize that potential as it requires strong mathematics skills.
Two hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
Probabilistic programming is a new approach to machine learning and data science that is currently the focus of intense academic research, including an ongoing DARPA program. If successful, probabilistic programming systems will allow sophisticated predictive models to be written by a wide range of domain experts. Before we get to the promised land, though, some basic challenges need to be addressed, including performance on real-world datasets, programming tools support, and education.
The document discusses the "black box problem" in artificial intelligence and neural networks. Specifically, it notes that while these systems can perform complex tasks, the inner workings and decision-making processes are not fully understood. It argues that developing theoretical frameworks grounded in other domains, like physics, could help increase transparency and interpretability of these technologies. More work is needed to better understand and explain how artificial intelligence systems learn and operate.
Data driven portfolio management - Agile2017 (Adam Yuret)
This document discusses tools and techniques for data-driven portfolio management in uncertain environments. It addresses how to decide which products to build first using quantitative forecasting methods like Monte Carlo simulation. Various quantitative forecasting techniques are presented, including sampling to determine prediction intervals and using historical data to forecast. The benefits of frequent forecasting using multiple measures are also discussed.
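The Monte Carlo approach described above can be sketched briefly. This is not Yuret's code; it is an illustrative Python sketch, assuming a made-up weekly throughput history, that samples from historical data to forecast how long a backlog will take and reads off a percentile as the forecast:

```python
import random

def forecast_completion(throughput_history, backlog_size,
                        trials=10_000, seed=42):
    """Monte Carlo forecast: in each trial, sample weekly throughput
    from the historical record (with replacement) until the backlog
    is exhausted, then record how many weeks it took."""
    rng = random.Random(seed)  # seeded for reproducibility
    results = []
    for _ in range(trials):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            weeks += 1
        results.append(weeks)
    results.sort()
    # Quote the 85th percentile: "85% of simulated futures finish
    # within this many weeks" - a prediction interval, not a promise.
    return results[int(0.85 * trials)]

# Hypothetical history: items completed in each of the last 7 weeks.
weeks_85 = forecast_completion([3, 5, 2, 4, 6, 1, 4], backlog_size=40)
```

Because the forecast is cheap to rerun, it supports the frequent re-forecasting the document recommends: as new weeks of real throughput arrive, append them to the history and regenerate the interval.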
This document summarizes Katherine Lee's midterm project proposal for a customizable plush toy design tool. The tool would allow users to design their own soft creatures or objects by selecting from base shapes, facial features, appendages, and skins. Customization that incorporates a user's feelings or persona can foster a sense of ownership. The document discusses target users, considerations for the design, a development schedule, and plans to launch the tool online and enable ordering of custom creations.
This document discusses exploring biological structures and shapes as building blocks for novel wearable designs. It outlines a concept to deconstruct and abstract biological forms for wearables. It discusses precedents like origami and biology-inspired designs. Prototypes were created and tested with users, who provided feedback on customizability but needing defined constants. The document outlines goals to design a line of 6 wearables using the patterns and create making toolkits, as well as interfaces for building 2D patterns into 3D objects.
This document summarizes Katherine Lee's midterm project proposal for a customizable plush toy design tool. The tool would allow users to design their own soft creatures or objects by selecting from base shapes, facial features, appendages, and skins. Customization that incorporates a user's feelings or persona can foster a sense of ownership. The goal is to create a joyful interactive experience where every choice a user makes feels right. The document discusses target users, considerations for the design, a development schedule, and plans for launching an online tool and community after the initial symposium presentation.
This document presents a reading guide for the Finanzas II course. It includes book chapters, articles, and news items to read for each class, on topics such as early-stage company valuation, exchange rates, risk and return, futures and options markets, and company valuation. The guide covers 27 classes with relevant reading material for each.
1. A computer system consists of hardware and software: the hardware comprises the physical components such as processors and input, storage, and output devices, while the software controls the computer's operation and allows tasks to be executed.
2. When a user enters data through an input peripheral such as the mouse, the data is processed in the CPU and displayed on the screen through the video card.
3. When a program is executed, the CPU locates the necessary files on the disk d
This document summarizes the financial performance of a company for the third quarter and first six months of 2007 compared to the same periods in 2006. It shows that net sales increased 8% in the third quarter and 7% for the first six months. Earnings from continuing operations were $92 million in the third quarter and $203 million for the first six months. On a per share basis, diluted earnings from continuing operations were $0.65 per share for the third quarter and $1.41 per share for the first six months. The company's North America segment grew net sales 6% in the third quarter while the International segment grew 17%.
The document discusses the North Carolina State Fair's use of social media including their website, blog, Facebook, Twitter, MySpace, and YouTube channels. It provides statistics on visits, followers, and engagement for each platform for the past year. It also mentions challenges faced and future plans to expand their social media presence and departmental usage.
This document introduces smartWTP, a web-to-print solution that offers several benefits to customers. SmartWTP is a scalable, customizable, and interoperable SaaS solution that eliminates high upfront license fees and the need for specialized IT staff. It allows customers to easily change their business over time. The document describes the storefront module which provides an easy online ordering process for print buyers and visibility and order management for print providers. Additional modules can be added for job creation, prepress workflow automation, and integrating with other systems.
When we ask ourselves why God makes us go through difficult times, we don't realize where these events may lead us. Only He knows, and He will not let us fall. We don't need to settle for the raw ingredients; trust in Him... and see something fantastic come about!
Liz Claiborne Inc. designs and markets fashion apparel and accessories. It offers products through department stores, specialty stores, and other retail channels in North America, Europe, Asia, Australia, and South America. In 2006, net sales were $4.99 billion and operating income was $436 million. The CEO discusses plans to invest in power brands like Juicy Couture, Kate Spade, and Liz Claiborne through specialty store expansion, marketing initiatives, and advertising. He outlines priorities around irresistible product, building brand loyalty, optimizing the supply chain, and focusing on talent.
The document provides information about registering for and scheduling appointments with the University Center for Excellence in Writing (UCEW) at Florida Atlantic University. It details the services provided, appointment policies including time limits and availability for different groups, and how to register for an account, book in-person and online appointments, modify or cancel appointments, and policies regarding missed appointments.
This document presents the themes and questions for a National Day of Reflection on public education in Chile, organized by the Colegio de Profesores de Chile. The themes include the purpose of education, a new institutional framework for public education, and a professional teaching career. Instructions for the group discussions are provided, along with sample answers to the questions posed on these topics.
This document is Molson Coors Brewing Company's annual report on Form 10-K for the fiscal year ended December 30, 2007. It provides a summary of Molson Coors' business operations and financial results. Key details include:
- Molson Coors is a leading global brewer formed in 2005 by the merger of Molson Inc. and Adolph Coors Company.
- In 2006, Molson Coors sold a majority stake in its Brazilian subsidiary Kaiser and retained a minority interest, which it later divested.
- Molson Coors has several joint ventures for activities like can manufacturing, bottle manufacturing, and transportation and distribution in various markets.
- In 2007, Mol
The document provides an overview of how to use Twitter, including the basics of tweeting, following/unfollowing others, direct messaging, hashtags, and reading your Twitter stream. It recommends tools for reading Twitter like Tweetdeck and mobile apps. It also discusses following others on Twitter like friends, colleagues, celebrities, and suggests using hashtags to aggregate topics of interest.
Module 1 introduction to machine learning – Sara Hooker
We believe in building technical capacity all over the world.
We are building and teaching an accessible introduction to machine learning for students passionate about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our work, visit www.deltanalytics.org
SPWK '20 - explaining data science to humans – Doug Hall
The document provides a summary of key data science concepts and techniques explained in a simplified and accessible manner. It covers choosing appropriate models for prediction and classification tasks, such as linear regression and logistic regression. It also discusses important data engineering concepts like data preparation, dimensionality reduction techniques, and handling different data types. Machine learning techniques like supervised and unsupervised learning are explained. Other topics covered include attribution, testing hypotheses and p-values. The overall goal is to demystify advanced data science topics for non-experts.
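The summary above mentions choosing simple models such as linear regression for prediction tasks. As an illustrative sketch of that idea (synthetic data and parameter values are my own, not from the deck), a noisy line can be fitted with ordinary least squares in a few lines of NumPy:

```python
import numpy as np

# Generate a noisy line y = 3x + 2 and recover the parameters
# with least squares -- the "start with a simple model" idea.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=x.shape)

# polyfit with deg=1 fits slope and intercept by least squares
w, b = np.polyfit(x, y, deg=1)
```

The recovered `w` and `b` land close to the true 3.0 and 2.0, which is the whole point of starting with a model simple enough to sanity-check by eye.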
In:Confidence 2019 - Tools for privacy-aware data analysis – Privitar
Dr Adrià Gascón, Research Fellow at the Alan Turing Institute talks about the main tools for privacy-aware data analysis on the In:Confidence 2019 main stage (April 4th at Printworks, London).
This document provides an overview of data management best practices. It discusses the importance of using consistent naming conventions for files and directories to keep data organized. Metadata, controlled vocabularies, and ontologies are presented as essential tools for documenting and allowing others to understand data in the absence of the original researcher. Standards are highlighted as critical for sharing data across disciplines and over time. A variety of tools and repositories are introduced to help with tasks like version control, formatting data for sharing, and archiving datasets for long-term access and attribution. The document emphasizes that properly managing data from the start helps accelerate discovery and ensures reproducibility of scientific research.
1. The document discusses traits that are important for effective data analysis and visualization. It outlines traits like curiosity, critical thinking, understanding data, attention to detail, learning new technologies, and communicating results clearly.
2. Key traits of meaningful data that enable useful analysis are discussed, such as high volume, being historical, consistent, multivariate, atomic, clean, and dimensionally structured.
3. Visual perception and how the human brain interprets visuals is also covered. For effective data visualization, visuals must be designed based on principles of visual perception so that insights can be easily understood.
Thinkful - Intro to Data Science - Washington DC – TJ Stalcup
This document discusses an introductory session on data science. It begins with introductions and an outline of the session's goals, which are to define what a data scientist is, how the field has emerged, and how to become one. It then discusses the growing demand and high salaries for data scientists. Examples are given of how data science has been applied at companies like LinkedIn, Netflix, and for fighting Ebola. Key aspects of data science like big data, Hadoop, MapReduce, and machine learning algorithms are explained. The document concludes by discussing the data science process and tools used, and encourages the audience that it is possible for them to become data scientists with the right knowledge, skills, and learning approach.
Landing your first Data Science Job: The Technical Interview – Anidata
In this talk, Dr Emanuele discusses one of the most intimidating and fateful parts of data science job searches: the technical interview. He discusses all the preparation aspiring and current data scientists should have as part of their routine, and reveals intimate insights behind how he interviews, vets, and hires data scientists in his startup.
The document discusses the process of data science. It begins by defining the typical steps in a data science project as identifying a problem/business question, collecting and cleaning data, performing exploratory data analysis, using algorithms and machine learning, reporting answers/minimum viable products, and getting feedback to review results. It then lists "inconvenient truths" about data science, such as data never being clean and most time being spent on preparation. Finally, it provides an example of using import.io and MonkeyLearn tools for text analysis.
How Do I Get a Job in Data Science? | People Ask Google – prateek kumar
One of the most common questions that aspiring data scientists ask is – ‘how do I get a data science job?’ There are many professionals looking to transition to data science but don’t know how. Therefore, this blog explains how you can get a data science job.
What to Know Before Applying
I want to make one thing clear at the start – getting a data science job is not easy. Sure, there are scores of openings and many companies are looking to hire data scientists so that they can gain an edge over their competitors using data.
This document provides an overview of data science including its importance, what data scientists do, how the field has emerged, and how to become a data scientist. It notes that by 2018 the US could face shortages of people with data analytics skills. It then discusses how LinkedIn's early growth in 2006 exemplifies the data science process of framing questions, collecting and processing data, exploring patterns, and communicating results. Finally, it outlines the tools used in data science like SQL, analytics software, and machine learning and discusses getting started in the field through education, curiosity, and ongoing learning with mentorship support.
User research assets: treasure or trash?
Slides from the NUX6 talk by Kate Towsey, Friday 27th October 2017.
2017.nuxconf.uk / nuxuk.org
Synopsis:
Hours of audio and video, photographs, notes, decks, matrices, various types of visual presentations, and even physical assets picked up in the field – and even walls! We make a lot of stuff in doing user research. Are these things potential treasure troves of knowledge for future research? Or are they things that, once used and acted upon, become more complicated to keep than to trash?
A couple of years ago, I spent several months researching how the GDS user research team and their colleagues felt about the stuff they made during user research. Off the back of that research, we ran a pilot for an A/V library. Then I wrote a blog post about the research results. Since then, I’ve had many conversations with organisations around the world trying to answer the same question: can we make our research assets useful in the long term?
In this talk, I’ll share what I learned at GDS, and what I’ve learned in talking to industry since. It’ll be an open-ended talk, but, I hope, a good starting point for cross-industry knowledge sharing and debate.
This document provides summaries of advice from three data scientists - DJ Patil, Clare Corthell, and Michelangelo D'Agostino - on how to build skills in data science. DJ advises taking an active start by proving you can complete a data science project. Clare took an independent approach to learning by creating her own Open Source Data Science Masters curriculum. For those in graduate school, DJ recommends focusing on building things, not just understanding concepts, and Michelangelo suggests learning skills that are relevant and can be applied in industry.
This document discusses data science and provides some key takeaways. It defines data science as a collaborative field that uses data to solve problems or create new opportunities. It notes that what makes someone a data scientist depends on their role, but often involves learning through MOOCs, self-study, or formal education. The document outlines the iterative process of understanding a business problem, preparing data, exploring the data, modeling and prototyping solutions, and maintaining models. It emphasizes the importance of being curious, judgmental, and argumentative rather than just using tools, understanding why certain approaches are taken, and recognizing that data science is both rewarding and challenging.
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org.
Trusting a Distributed Data Pipeline | Masters of Conversion – VWO
Conclusions you reach with data are only valid if they correctly interpret your data set. In many organizations, the responsibility for collecting and aggregating data is distributed, so it can be hard to ensure that everyone who uses a data set understands the limitations of the signals in that pipeline.
As an example, many companies make important decisions about what events constitute an “active user,” and these decisions are reflected in the pipeline code. Changes to a pipeline may not be communicated to all downstream users, leading to misinformed conclusions even from correctly executed analyses.
In this talk, Richard will share three key questions to help ensure that you are interpreting your data correctly and drawing accurate conclusions.
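The "active user" example above is easy to demonstrate concretely. A minimal sketch (event shapes and definitions are hypothetical, not from the talk) shows how two reasonable pipeline definitions over the same raw events produce different counts:

```python
# The same raw event log, interpreted by two plausible
# "active user" definitions that might live in pipeline code.
events = [
    {"user": "a", "type": "page_view"},
    {"user": "a", "type": "page_view"},
    {"user": "b", "type": "page_view"},
    {"user": "b", "type": "purchase"},
]

def active_v1(events):
    # v1: any user who produced any event at all
    return {e["user"] for e in events}

def active_v2(events):
    # v2: only users who completed a purchase
    return {e["user"] for e in events if e["type"] == "purchase"}

v1, v2 = active_v1(events), active_v2(events)
```

Here `v1` counts two active users and `v2` counts one. If the pipeline silently switches from v1 to v2, every downstream analysis of "active users" changes meaning without any analyst's code being wrong, which is exactly the communication failure the talk warns about.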
This document provides an overview of data science including its importance, what data scientists do, how the field has emerged, and how to become a data scientist. It discusses how data science can help answer important business questions using LinkedIn in 2006 as a case study. It also outlines the typical data science process of framing questions, collecting and cleaning data, exploring patterns, and communicating results. Finally, it introduces some common data science tools like SQL, analytics software, and machine learning algorithms and discusses options for continuing education in data science.
You've heard the news, Data Science is the cool new career opportunity sweeping the world. Come learn from Thinkful Mentors all about this new and exciting industry.
Data vs Hunch - Beyond Lecture at Hyper Island 2015 – Beyond
How do you strike a balance between data and creative hunch in a digital marketing world obsessed with metrics and ROI? Slides from a session with the Hyper Island Digital Data Strategy class of 2015, at the school's Stockholm campus.
This document discusses the changing relationship between data, creativity, and marketing in today's digital world. It notes that with the rise of digital, marketing has become more metrics-driven and accountable to ROI. However, it cautions that data cannot provide all the answers and emotions still drive human behavior. The document provides tips for marketers to balance data and creativity, including understanding data's limitations, using multiple data sets, distinguishing where data ends and strategy begins, and allowing for creative leaps beyond just the numbers. It advocates for custom cross-functional teams to develop ideas and applying creativity processes to fully leverage an organization's talent against increasingly complex challenges.
2017 06-14-getting started with data science – Thinkful
The document provides an overview of getting started with a career in data science. It introduces the author Jasjit Singh and discusses what a data scientist does, how the field has emerged to analyze big data. Examples are given of how companies like LinkedIn and Uber use data science. The data science process is explained through the steps of framing a question, collecting and processing data, exploring patterns in the data, and communicating findings. Tools used include SQL, data visualization software, and machine learning algorithms. The document encourages the reader that becoming a data scientist is achievable through learning statistics, algorithms, and software skills.
Similar to Data Science at Scale @ barricade.io (20)
The Artful Business of Data Mining: Computational Statistics with Open Source... – David Coallier
This talk goes over the concepts of data mining and data analysis using open source tools, mainly Python and R, with interesting libraries and the tools I have used and currently use at Engine Yard.
The document discusses the future of PHP, including new features in PHP 5.4 like namespaces, closures, and traits. It covers emerging technologies like NoSQL databases and PaaS cloud platforms. It emphasizes focusing on users, adopting new technologies, and contributing to open source communities.
The document discusses building applications in the cloud and the benefits this provides including global availability, elasticity, ability to handle increased traffic, and easier updates. It also notes some challenges like components potentially failing and costs that may be higher or lower than expected. The document advocates building independently scalable services that can heal themselves and have deployments integrated into the application.
The state of the PHP world has been precarious over the past few years, and many developers moved over to other languages and technologies because PHP was lacking something that other emerging techs were providing.
With the rise of cloud computing, cutting edge frameworks and amazing platforms, PHP can be sexy again. This talk aims at giving an idea of how PHP, as a language and a community, evolved over the past few years and how to refocus our energy to solve today's and tomorrow's problems rather than contemplating the success of our past. We have to adapt to change, and this talk will help listeners with the transition by providing them with insight into cloud computing, PaaS, upcoming frameworks such as Zend Framework2, Symfony2, Lithium, and many more aspects of this rapidly changing software ecosystem.
The document discusses the author's views that cloud alone is lacking, frameworks have issues, and that their company Orchestra aims to be more than just a product by focusing on passion, involvement, and lifestyle while also meeting enterprise needs like SLAs and support. The author believes successfully building a product requires avoiding common mistakes and moving to a platform like EngineYard.
The document discusses Orchestra, a cloud computing platform. It touches on several topics related to Orchestra, including its manifesto, language features, frameworks, projects, tools and passion of its creators. The document also notes that products can be easy to build but also easy to screw up, and that Orchestra aims to address enterprise needs while taking on incumbent providers.
This is a @JSConfEU talk about how we interact with technology and how we should maybe rethink our god-like approach to software languages. It also showcases an experiment that allows developers to interact with PHP directly from Node.js.
FRAPI is an API management panel and developer-facing API that was created to address issues like laziness, performance problems, and the involvement of humans in API development. It allows developers to build APIs that are automatically generated and synchronized based on configurations in the management panel. FRAPI aims to improve performance and reduce complexity by removing unnecessary code and providing tools for authentication, content negotiation, database integration, and documentation generation.
This document discusses FRAPI, a framework for building RESTful APIs. It provides both a management panel and developer-facing API for creating RESTful services. FRAPI aims to solve issues like laziness, performance problems, and long development times. It emphasizes building APIs with performance as a primary goal and without unnecessary complexity or "magic". The document outlines some of FRAPI's key features such as management, authentication, database support, content negotiation, code generation, and documentation. It also provides examples of projects using FRAPI and discusses development considerations.
This is a talk I presented at University Limerick to give people an introduction into CouchDB.
What is it? How does it generally work? Introducing new concepts, etc.
Web 3.0 focuses on semantics, data standards, and understanding through technologies like RDF, OWL, and SPARQL. These standards allow sites and data about the same things to be easily combined and searched. APIs are also important, allowing developers to build applications that access and combine data from different systems and sources, powering mashups, mobile apps, and more. When developing an API, one should focus on RESTful design principles, use meaningful URIs and formats like JSON, and support the developer community through documentation, tutorials, and libraries.
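The RESTful conventions this summary names, meaningful URIs and JSON representations, can be sketched in a few lines. The resource shape and URI scheme below are illustrative assumptions, not from the original deck:

```python
import json

def user_resource(user_id, name):
    """Build a RESTful resource: a noun-based, versioned URI
    plus a JSON representation that links back to itself."""
    uri = f"/api/v1/users/{user_id}"  # meaningful, predictable URI
    body = json.dumps({"id": user_id, "name": name, "href": uri})
    return uri, body

uri, body = user_resource(42, "ada")
```

Keeping the URI scheme predictable and the payload self-describing is what lets third-party developers combine such an API with other sources in the mashups the summary mentions.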
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... – Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
The Ipsos - AI - Monitor 2024 Report.pdf – Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data – Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
The Building Blocks of QuestDB, a Time Series Database – javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Kafka Queue
Distributed messaging system
Append-only log
Consumers have offsets
Partition for parallelism
Replicate for redundancy
Message order guaranteed, per-partition
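The properties in the outline above can be made concrete with a toy model. This is a simulation of the core ideas (append-only partitions, keyed partitioning, per-group consumer offsets), not a real Kafka client; all names are illustrative:

```python
import zlib

class MiniLog:
    """Toy append-only log illustrating Kafka's core ideas:
    partitions for parallelism, keyed routing for per-partition
    ordering, and consumer-held offsets."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = {}  # (group, partition) -> next offset to read

    def produce(self, key, value):
        # A key always hashes to the same partition, which is why
        # ordering is guaranteed per partition, not across them.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)  # append-only: never mutate
        return p

    def consume(self, group, partition):
        off = self.offsets.get((group, partition), 0)
        if off >= len(self.partitions[partition]):
            return None  # consumer has caught up
        self.offsets[(group, partition)] = off + 1  # advance offset
        return self.partitions[partition][off]
```

Because each consumer group tracks its own offset, two groups can read the same partition independently, and a restarted consumer resumes from its stored offset rather than re-reading the whole log.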