Building Data Teams
This deck discusses building effective data teams. It cites two figures on the talent gap: 80% of new data scientist jobs go unfilled, and the US is projected to face a shortage of 190,000 data scientists by 2018. It emphasizes that data teams require diverse skills, including statistics, engineering, business analysis, and communication, rather than a single "unicorn" hire. It also outlines principles for effective "street fighting" data science, such as focusing on results, being careful with complex models, checking assumptions, and evaluating robustly. The deck concludes by discussing increasing data maturity and the position of data science and big data on the Gartner Hype Cycle.
3. Big Data - Definition
George Dyson, Strata conference in London, 2012:
“the era of big data started when the human cost of making the decision of throwing something away became higher than the machine cost of continuing to store it.”
6. Teaching 1 - No Unicorns
● 80% of new data scientist jobs are not filled
● By 2018 the US will have a shortage of 190,000 data scientists
Source: http://blogs.wsj.com/cio/2014/02/14/it-takes-teams-to-solve-the-data-scientist-shortage/ &
http://blog.pivotal.io/pivotal/news-2/mckinsey-report-highlights-the-impending-data-scientist-shortage
7. Teaching 1 - No Unicorns
Finding Data Scientist Unicorns…
Source: http://www.forbes.com/sites/danwoods/2012/03/08/hilary-mason-what-is-a-data-scientist/ &
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
8. Teaching 1 - No Unicorns
Source: http://www.oreilly.com/data/free/files/analyzing-the-analyzers.pdf
9. Teaching 1 - No Unicorns
DATA TEAM, combining:
● Hacking & Engineering
● Statistics & Analytics
● Business Analysis & Communication
10. Teaching 2 - Street Fighting DS
*Source: http://de.slideshare.net/pskomoroch/street-fighting-data-science-12072010
11. Teaching 2 - Street Fighting DS
*Source: http://de.slideshare.net/pskomoroch/street-fighting-data-science-12072010
● Focus on results, improvise
● Careful with complex models (Occam’s Razor)
● Check your assumptions
● Look at your data, literally
● Robust evaluation processes
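As a rough illustration of two of the points above (being careful with complex models and evaluating robustly), the following Python sketch compares a trivial baseline with a simple fitted line using k-fold cross-validation. It is not taken from the slides: the data is synthetic, and the helper names (`k_fold_indices`, `cross_val_mae`, `fit_mean`, `fit_line`) are hypothetical, chosen only for this example.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mae(xs, ys, fit, k=5, seed=0):
    """Return the mean absolute error of `fit` on each of k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training indices only, then score on the held-out fold.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(
            statistics.fmean([abs(predict(xs[i]) - ys[i]) for i in held_out])
        )
    return errors


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(1)
xs = [float(x) for x in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

baseline_errors = cross_val_mae(xs, ys, fit_mean)
line_errors = cross_val_mae(xs, ys, fit_line)

print("baseline MAE per fold:", [round(e, 2) for e in baseline_errors])
print("line MAE per fold:    ", [round(e, 2) for e in line_errors])
```

Reporting the error per fold, rather than a single number, makes it easier to see whether a model's advantage over the baseline is consistent or depends on one lucky split. A more complex model would only be worth adopting if it beat the simple line by a margin that holds across folds.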
15. What’s next? - Data Maturity
Chart: Competitive Advantage against Data Maturity. Stages, in the order listed:
● Collect Data
● DWH, Ad Hoc Reports
● Standard Reports, Analytics
● AB-Testing, Growth Hacking
● Predictive Modelling, Multivariate Analysis
● Recsys, Data Products
● Data Democratization, Business Science
16. What’s next?
Highlighted technologies: Data Science, Big Data
Source: http://www.gartner.com/newsroom/id/2819918,
Gartner Hype Cycle for Emerging Technologies 2014