GalvanizeU Seattle: Eleven Almost-Truisms About Data - Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study the technologies, the processes, and the people involved.
Microservices, containers, and machine learning - Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. A natural language processing service in Python (based on NLTK, TextBlob, WordNet, etc.) gets containerized and used to crawl and parse the email archives. These produce JSON data sets; we then run machine learning on a Spark cluster to surface insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data — based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
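As a rough illustration of the TextRank side of the approach above (a toy sketch in plain Python, not the Exsto implementation, which runs on Spark): build a co-occurrence graph over tokens, then rank the tokens with PageRank.

```python
# Minimal TextRank-style keyword ranking: co-occurrence graph + PageRank.
# A toy sketch for illustration only, not the Exsto code.
from collections import defaultdict

def pagerank(graph, damping=0.85, iters=50):
    """Iterative PageRank over an adjacency dict {node: set(neighbors)}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            incoming = sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank

def textrank_keywords(tokens, window=2):
    # link each token to its neighbors within the sliding window
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            graph[w].add(tokens[j])
            graph[tokens[j]].add(w)
    return sorted(pagerank(graph).items(), key=lambda kv: -kv[1])

tokens = "spark streaming uses spark sql and spark mllib".split()
print(textrank_keywords(tokens)[0][0])  # "spark" -- the best-connected token
```

In the real pipeline the tokens would come from the NLP parse of email messages, and the graph would be far larger; the ranking idea is the same.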
Use of standards and related issues in predictive analytics - Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More - Paco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
Databricks Meetup @ Los Angeles Apache Spark User Group - Paco Nathan
Los Angeles Apache Spark Users Group 2014-12-11 http://meetup.com/Los-Angeles-Apache-Spark-Users-Group/events/218748643/
A look ahead at Spark Streaming in Spark 1.2 and beyond, with case studies, demos, plus an overview of approximation algorithms that are useful for real-time analytics.
How Apache Spark fits into the Big Data landscape - Paco Nathan
How Apache Spark fits into the Big Data landscape http://www.meetup.com/Washington-DC-Area-Spark-Interactive/events/217858832/
2014-12-02 in Herndon, VA and sponsored by Raytheon, Tetra Concepts, and MetiStream
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming - Paco Nathan
London Spark Meetup 2014-11-11 @Skimlinks
http://www.meetup.com/Spark-London/events/217362972/
To paraphrase the immortal crooner Don Ho: "Tiny Batches, in the wine, make me happy, make me feel fine." http://youtu.be/mlCiDEXuxxA
Apache Spark provides support for streaming use cases, such as real-time analytics on log files, by leveraging a model called discretized streams (D-Streams). These "micro batch" computations operate on small time intervals, generally from 500 milliseconds up. One major innovation of Spark Streaming is that it leverages a unified engine. In other words, the same business logic can be used across multiple use cases: streaming, but also interactive, iterative, machine learning, etc.
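The micro-batch idea can be illustrated in a few lines of plain Python (a toy sketch, not Spark's actual D-Stream machinery): events get grouped into fixed time intervals, and the same batch logic is applied to each interval.

```python
# Toy illustration of discretized streams: timestamped events are bucketed
# into fixed micro-batch intervals; the same function runs on every batch.
from collections import defaultdict

def discretize(events, interval_ms=500):
    """Group (timestamp_ms, value) events into micro-batches."""
    batches = defaultdict(list)
    for ts, value in events:
        batches[ts // interval_ms].append(value)
    return [batches[k] for k in sorted(batches)]

events = [(0, "GET /a"), (120, "GET /b"), (640, "GET /c"), (990, "GET /d")]
for batch in discretize(events):
    # the "business logic" -- here just a count -- is reused per batch
    print(len(batch))
```

The unified-engine point is that in Spark the per-batch function would be the same RDD code used for batch or interactive jobs.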
This talk will compare case studies for production deployments of Spark Streaming, emerging design patterns for integration with popular complementary OSS frameworks, plus some of the more advanced features such as approximation algorithms, and take a look at what's ahead — including the new Python support for Spark Streaming that will be in the upcoming 1.2 release.
Also, let's chat a bit about the new Databricks + O'Reilly developer certification for Apache Spark…
QCon São Paulo: Real-Time Analytics with Spark Streaming - Paco Nathan
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26
http://qconsp.com/presentation/real-time-analytics-spark-streaming
This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale.
The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale.
We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app.
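For a flavor of that trade-off, here is a minimal Count-Min Sketch in plain Python (toy parameters, not the production-tuned structures used in streaming apps): bounded memory, with counts that may overestimate but never undercount.

```python
# A minimal Count-Min Sketch: approximate frequency counts in fixed memory.
# Toy sketch for illustration; real deployments tune width/depth to the
# desired error bound.
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # one hashed column per row, derived from a seeded digest
        for row in range(self.depth):
            h = hashlib.sha1(f"{row}:{item}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def count(self, item):
        # min across rows: collisions can only inflate a cell, never shrink it
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for word in ["spark"] * 5 + ["storm"] * 2:
    cms.add(word)
print(cms.count("spark"))  # at least 5; the sketch never undercounts
```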
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
Data Science with Spark - Training at SparkSummit (East) - Krishna Sankar
Slideset of the training we gave at the Spark Summit East.
Blog : https://doubleclix.wordpress.com/2015/03/25/data-science-with-spark-on-the-databricks-cloud-training-at-sparksummit-east/
The video is posted on YouTube: https://www.youtube.com/watch?v=oTOgaMZkBKQ
Microservices, Containers, and Machine Learning - Paco Nathan
Session talk for Data Day Texas 2015, showing GraphX and SparkSQL for text analytics and graph analytics of an Apache developer email list -- including an implementation of TextRank in Spark.
Functional programming for optimization problems in Big Data - Paco Nathan
Enterprise Data Workflows with Cascading.
Silicon Valley Cloud Computing Meetup talk at Cloud Tech IV, 4/20 2013
http://www.meetup.com/cloudcomputing/events/111082032/
Big Graph Analytics on Neo4j with Apache Spark - Kenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
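For a feel of what one of the analyses named above actually computes, here is connected components in plain Python (a toy sketch; the talk itself uses Spark GraphX against Neo4j exports):

```python
# Connected components via BFS on a small undirected graph -- a plain-Python
# stand-in for the distributed GraphX version discussed in the talk.
from collections import deque

def connected_components(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(sorted(len(c) for c in connected_components(edges)))  # [2, 3]
```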
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser... - Andy Petrella
Distributed Data Science…
* A genomics use case
* Spark Notebook
* Interactive Distributed Data Science
Distributed Data Science… Pipeline
* Pipeline: productizing Data Science
* Demo of Distributed Pipeline (ADAM, Akka, Cassandra, Parquet, Spark)
* Why Micro Services?
* Painful points:
* Data science is Discontiguous
* Context Lost in Translation
* Solution: Data Fellas’ Agile Data Science Toolkit
Presented 2015-08-24 at SF Bay ACM, held at the eBay south campus in San Jose.
http://meetup.com/SF-Bay-ACM/events/221693508/
Project Jupyter https://jupyter.org/ evolved from IPython notebooks, and now supports a wide variety of programming language back-ends. Notebooks have proven to be effective tools in Data Science, providing convenient packaging for what Don Knuth coined "literate programming" in the 1980s: code plus exposition in markdown. Results of running the code appear in-line as interactive graphics -- all packaged as collaborative, web-based documents. Some have said that the introduction of cloud-based notebooks is nearly as fundamental a change in software practice as the introduction of spreadsheets.
O'Reilly Media has been considering the question, "What comes after books and video?" Or, as one might imagine more pointedly, what comes after Kindle? To that point we have collaborated with Project Jupyter to integrate notebooks into our content management process, allowing authors to generate articles, tutorials, reports, and other media products as notebooks that also incorporate video segments. Code dependencies are containerized using Docker, and all of the content gets managed in Git repositories. We have added another layer, an open source project called Thebe that provides a kind of "media player" for embedding the containerized notebooks into web pages.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates use of `TextBlob` and `SpaCy` for NLP analysis of texts, including full parse, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
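The MinHash approximation mentioned above can be sketched in a few lines of plain Python (a toy illustration, not the PyTextRank internals): keep k hash minima per token set, and the fraction of matching minima estimates the Jaccard similarity of the sets.

```python
# Bare-bones MinHash: estimate Jaccard similarity from k hash minima.
# Toy illustration only; library implementations use cheaper hash families.
import hashlib

def minhash_signature(tokens, k=64):
    def h(seed, token):
        # seeded hash: one independent hash function per signature slot
        return int(hashlib.md5(f"{seed}:{token}".encode()).hexdigest(), 16)
    return [min(h(seed, t) for t in tokens) for seed in range(k)]

def estimated_jaccard(sig_a, sig_b):
    # P(minima agree) equals the true Jaccard similarity, in expectation
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = {"spark", "streaming", "graphx", "mllib"}
b = {"spark", "streaming", "graphx", "sql"}
sig_a, sig_b = minhash_signature(a), minhash_signature(b)
print(round(estimated_jaccard(sig_a, sig_b), 2))  # estimate of the true 3/5
```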
Big data & data science challenges and opportunities - Jose Quesada
Even though most companies see the advantages of using more data in their decisions, few actually do so. Why is that? A few ideas on challenges and opportunities for (mid-size) companies. The talk audience was an engineering association, where most people represented engineering-centric companies in Germany (often in manufacturing).
Future of data science as a profession - Jose Quesada
How can you thrive in a future where machine learning has been popular for a few years already?
In this talk, I will give you actionable advice from my experience training serious data scientists at our retreat center in Berlin. You are going to face these pointed, hard questions:
- What is the promise of machine learning? Has it happened yet?
- Is it easy to take advantage of machine learning, now that most algorithms are nicely packaged in APIs and libraries?
- How much time should I spend getting good at machine learning? Am I good enough now?
- Are data scientists going to be replaced by algorithms? Are we all?
- Is it easy to hire talent in machine learning after the explosion of MOOCs?
How Apache Spark fits into the Big Data landscape - Paco Nathan
Boulder/Denver Spark Meetup, 2014-10-02 @ Datalogix
http://www.meetup.com/Boulder-Denver-Spark-Meetup/events/207581832/
Apache Spark is intended as a general purpose engine that supports combinations of Batch, Streaming, SQL, ML, Graph, etc., for apps written in Scala, Java, Python, Clojure, R, etc.
This talk provides an introduction to Spark — how it provides so much better performance, and why — and then explores how Spark fits into the Big Data landscape — e.g., other systems with which Spark pairs nicely — and why Spark is needed for the work ahead.
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016 - Stratio
Stratio’s Cassandra Lucene Index, derived from Stratio Cassandra, is an open source plugin for Apache Cassandra that extends its index functionality to provide near real-time search, as in Elasticsearch or Solr, including full-text search capabilities and free multivariable, geospatial, and bitemporal search. It is achieved through an Apache Lucene based implementation of Cassandra secondary indexes, where each node of the cluster indexes its own data. Stratio’s Cassandra indexes are one of the core modules on which Stratio’s Big Data platform is based.
Andres de la Peña discusses the recently added geospatial search features in Stratio's Cassandra Lucene index using some Nephila Capital use cases. These new features include indexing complex polygons, nearest neighbour search, and the application of chained geometrical transformations such as bounding box, convex hull, centroid, union, intersection, exclusion and distance buffer.
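Two of the simpler geometric transformations named above can be illustrated in plain Python (a toy sketch; the actual index applies these through Lucene's spatial support, and the centroid here is just the vertex average):

```python
# Bounding box and vertex-average centroid of a point set -- plain-Python
# stand-ins for two of the geometrical transformations listed above.
def bounding_box(points):
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))

def centroid(points):
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(bounding_box(polygon))  # ((0.0, 0.0), (4.0, 2.0))
print(centroid(polygon))      # (2.0, 1.0)
```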
Apache Spark & Cassandra use case at Telefónica CBS by Antonio Alcacer - Stratio
Spark & Cassandra use case at Telefónica CyberSecurity (CBS), presented by Antonio Alcocer (antonio@stratio.com) and Oscar Mendez (oscar@stratio.com, @omendezsoto) at Cassandra Summit 2014.
Multiplatform Solution for Graph Datasources - Stratio
One of the top banks in Europe needed a system that would provide better performance, scale almost linearly with the increase in information to be analyzed, and allow the processes currently executed on the Host to move to a Big Data infrastructure. Over the course of a year we worked on a system that provides greater agility, flexibility, and simplicity for the user viewing information when profiling, and that can now analyze the structure of profile data. It offers a powerful way to make online queries against a graph database, integrated with Apache Spark and different graph libraries. Essentially, we obtain all the necessary information through Cypher queries sent to a Neo4j database.
Using the latest Big Data technologies, such as Spark DataFrames, HDFS, Stratio Intelligence, and Stratio Crossdata, we have developed a solution that can obtain critical information from multiple datasources, such as text files or graph databases.
Data Science - An emerging Stream of Science with its Spreading Reach & Impact - Dr. Sunil Kr. Pandey
This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected statistics and data from different sources. It may be useful for students and others interested in this field of study.
Everybody has heard of Big Data, and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game changing outcomes for capable organizations? Who are the skilled talent that deliver Big Data results?
This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined and their tools, workflow and challenges are reviewed. Finally, practical tips are presented for approaching data product development.
Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success
These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.
A 2015 update to the 2012 "Data Big and Broad" talk - http://www.slideshare.net/jahendler/data-big-and-broad-oxford-2012 - which extends the coverage and brings more of it into the context of recent "big data" work.
Talk at the World Science Festival at Columbia, June 2, 2017: session on Big Data and Physics: http://www.worldsciencefestival.com/programs/big-data-future-physics/
Data; Data manipulation, sorting, grouping, rearranging. Plotting the data. Descriptive statistics. Inferential statistics. Python Libraries for Data Science. - jybufgofasfbkpoovh
Data; Data manipulation, sorting, grouping, rearranging. Plotting the data. Descriptive statistics. Inferential statistics. Python Libraries for Data Science.
Introduction for skills seminar on Search and Data Mining, Master of European... - Gerben Zaagsma
These are the slides for the introductory lecture that I gave as part of a skills seminar on Search and Data Mining (Luxembourg, 11 December 2014). The slides are rather visual and for the most part don’t include notes, yet I believe the gist of the talk will be clear. At the end links are included for tools, further reading and a link to the exercises we did.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
"Big Data" is term heard more and more in industry – but what does it really mean? There is a vagueness to the term reminiscent of that experienced in the early days of cloud computing. This has led to a number of implications for various industries and enterprises. These range from identifying the actual skills needed to recruit talent to articulating the requirements of a "big data" project. Secondary implications include difficulties in finding solutions that are appropriate to the problems at hand – versus solutions looking for problems. This presentation will take a look at Big Data and offer the audience with some considerations they may use immediately to assess the use of analytics in solving their problems.
The talk begins with an idea of how big "Big Data" can be. This leads to an appreciation of how important "Management Questions" are to assessing analytic needs. The fields of data and analysis have become extremely important and impact nearly all facets of life and business. During the talk we will look at the two pillars of Big Data – Data Warehousing and Predictive Analytics. Then we will explore the open source tools and datasets available to NATO action officers to work in this domain. Use cases relevant to NATO will be explored to show where analytics lies hidden within many of the day-to-day problems of enterprises. The presentation will close with a look at the future. Advances in the area of semantic technologies continue. The much acclaimed consultants at Gartner listed Big Data and Semantic Technologies as the first- and third-ranked top technology trends to modernize information management in the coming decade. They note there is an incredible value "locked inside all this ungoverned and underused information." HQ SACT can leverage this powerful analytic approach to capture requirement trends when establishing acquisition strategies, monitor Priority Shortfall Areas, prepare solicitations, and retrieve meaningful data from archives.
A brief introduction to data science and machine learning, with an emphasis on application scenarios, from traditional ones to the more innovative. The overview covers the basic definition of data science, an overview of machine learning, and examples across traditional scenarios, recommender systems and social network analysis, IoT, and deep learning.
Data Science Innovations : Democratisation of Data and Data Science - Suresh Sood
Data Science Innovations : Democratisation of Data and Data Science covers the opportunity of citizen data science lying at the convergence of natural language generation and discoveries in data made by the professions, not data scientists.
Human in the loop: a design pattern for managing teams working with ML - Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
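As a rough illustration of the loop described above (not code from any of these talks), here is a minimal active-learning sketch in Python; the stand-in model, confidence threshold, and expert callback are all invented placeholders:

```python
# Sketch of one human-in-the-loop active-learning pass.
# The "model" is a stand-in: it scores items and reports a confidence.

CONFIDENCE_THRESHOLD = 0.8  # below this, defer to a human expert

def model_predict(item):
    """Placeholder model: returns (label, confidence)."""
    score = item.get("score", 0.5)
    return ("spam" if score > 0.5 else "ham", abs(score - 0.5) * 2)

def active_learning_pass(items, ask_expert):
    """Auto-label confident items; route exceptions to a human.

    Expert judgements are collected so the next model iteration
    can be retrained on them.
    """
    auto_labeled, new_training_examples = [], []
    for item in items:
        label, confidence = model_predict(item)
        if confidence >= CONFIDENCE_THRESHOLD:
            auto_labeled.append((item, label))
        else:
            expert_label = ask_expert(item)  # the human in the loop
            new_training_examples.append((item, expert_label))
    return auto_labeled, new_training_examples

items = [{"score": 0.95}, {"score": 0.55}, {"score": 0.05}]
auto, queued = active_learning_pass(items, ask_expert=lambda i: "ham")
print(len(auto), len(queued))  # prints "2 1": confident items vs. expert referrals
```

The point of the pattern is the second return value: each expert judgement becomes a labeled training example for the next model iteration.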
Human-in-the-loop: a design pattern for managing teams that leverage ML
Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Human-in-the-loop: a design pattern for managing teams which leverage ML
Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Humans in a loop: Jupyter notebooks as a front-end for AI
Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
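The actual `nbtransom` project builds atop `nbformat` and `pandas`; purely to sketch the underlying idea of a notebook as a shared JSON document that both machines and people annotate, here is a stdlib-only toy. The cell layout mirrors the nbformat v4 schema, but the `prediction` metadata key and both annotate functions are invented for illustration:

```python
import json

# Minimal notebook-shaped document, following the nbformat v4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Raw content to label"},
    ],
}

def machine_annotate(nb, cell_index, label, confidence):
    """A machine writes its prediction into cell metadata..."""
    nb["cells"][cell_index]["metadata"]["prediction"] = {
        "label": label, "confidence": confidence,
    }

def human_annotate(nb, cell_index, label):
    """...and a person confirms or overrides it in the same document."""
    nb["cells"][cell_index]["metadata"]["prediction"]["human_label"] = label

machine_annotate(notebook, 0, label="data-science", confidence=0.62)
human_annotate(notebook, 0, label="machine-learning")

print(json.dumps(notebook["cells"][0]["metadata"]["prediction"], indent=2))
```

Because both parties write to one document, the notebook doubles as configuration, data sample, and structured log, as described above.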
Humans in the loop: AI in open source and industry
Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX, datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
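The real pytextrank builds on spaCy, NetworkX, and datasketch; the toy below shows only the core idea (rank words by running a PageRank-style power iteration over a word co-occurrence graph) in plain Python, and uses none of pytextrank's actual API:

```python
from collections import defaultdict
from itertools import combinations

def rank_keywords(sentences, damping=0.85, iterations=30):
    """Tiny TextRank-style ranking: build a per-sentence word
    co-occurrence graph, then run PageRank power iteration over it."""
    neighbors = defaultdict(set)
    for sent in sentences:
        words = sent.lower().split()
        for a, b in combinations(set(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = list(neighbors)
    rank = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iterations):
        new_rank = {}
        for w in nodes:
            incoming = sum(rank[u] / len(neighbors[u]) for u in neighbors[w])
            new_rank[w] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return sorted(rank, key=rank.get, reverse=True)

sentences = [
    "spark clusters run machine learning",
    "machine learning needs data",
    "spark clusters process data",
]
print(rank_keywords(sentences)[:3])  # most central words first
```

Words that co-occur with many well-connected words rise to the top, which is the graph-algorithmic intuition behind keyphrase extraction.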
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, about "Oriole" tutorials which combine video with Jupyter notebooks, Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
Strata UK 2017. Computable content leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
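PowSyBl's own power-flow engines are far richer; purely to show the kind of computation a "power flow" performs, here is a three-bus DC power-flow sketch in plain Python (the linearized model: solve B'·theta = P with bus 0 as slack, then recover line flows). All network numbers are invented, and this does not use the PowSyBl or pypowsybl APIs:

```python
# DC power flow on a 3-bus network.
# Lines: (from_bus, to_bus, susceptance b in per-unit)
lines = [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 5.0)]
injections = {1: -0.6, 2: -0.4}  # net power at non-slack buses (loads)

# Build the reduced susceptance matrix over buses 1 and 2 (bus 0 = slack).
buses = [1, 2]
B = [[0.0, 0.0], [0.0, 0.0]]
for f, t, b in lines:
    for i, bi in enumerate(buses):
        if bi in (f, t):
            B[i][i] += b
    if f in buses and t in buses:
        i, j = buses.index(f), buses.index(t)
        B[i][j] -= b
        B[j][i] -= b

# Solve the 2x2 system B * theta = p by Cramer's rule.
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
p = [injections[b] for b in buses]
theta = {0: 0.0,
         1: (p[0] * B[1][1] - B[0][1] * p[1]) / det,
         2: (B[0][0] * p[1] - p[0] * B[1][0]) / det}

# Line flow = susceptance * angle difference.
flows = {(f, t): b * (theta[f] - theta[t]) for f, t, b in lines}
for line, flow in flows.items():
    print(line, round(flow, 3))
```

Sanity check: the flows balance at every bus (0.65 into bus 1 minus 0.05 out equals its 0.6 load), which is exactly the invariant a power-flow solver enforces.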
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Securing your Kubernetes cluster: a step-by-step guide to success!
KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
JMeter webinar - integration with InfluxDB and Grafana
RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
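To make the JMeter-to-InfluxDB hookup concrete: InfluxDB ingests metrics as plain-text "line protocol" (measurement, comma-separated tags, fields, optional timestamp). A minimal formatter follows; the metric and tag names are invented for the example, and real line protocol also marks integer fields with a trailing `i`, which this sketch omits:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point in InfluxDB line protocol:
    measurement,tag=v,... field=v,... [timestamp]"""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    line = f"{measurement},{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line

# One load-test sampler result, as it might land in InfluxDB:
print(to_line_protocol(
    "jmeter",
    tags={"application": "demo", "transaction": "login"},
    fields={"avg": 182.5, "count": 42.0},
    timestamp_ns=1700000000000000000,
))
```

Grafana then queries these points by measurement and tags to draw the real-time dashboards demonstrated in the webinar.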
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Data Science in 2016: Moving Up
1. Data Science in 2016:
Moving Up
2015-10-15 • Madrid • http://bigdataspain.org/
Paco Nathan, @pacoid
O’Reilly Media
2. • general patterns
• trends and analysis: the discipline, the jobs
• some good examples: moving up into use cases
• glimpses ahead: an emerging context
• a proposed theme
Data Science 2016: Moving Up
8. Design Patterns: Issues
[diagram label: some cloud]
• integration could be better
• that implies sharing markets
• VCs in Silicon Valley dislike that
• customers need integration
14. Design Patterns: Where?
[diagram label: some cloud]
• that playing field becomes overly crowded, soon…
• what happens at that point?
15. • so much emphasis on plumbing: `data engineering`
• not enough on domain expertise, which trumps all
Much activity in Big Data seems awkwardly focused at the
bottom of the tech stack: infrastructure, not domain
However, that may be changing…
Design Patterns: Opinion
17. Interesting Trends
There are many possible trends to discuss, but let’s
concentrate on four of these going into 2016:
• leveraging multicore and large memory spaces
• generalized libraries for frequently repeated work
• workflows blend the best of people and computing
• framework for a big leap ahead, not just incremental
18. Original definitions for what became relational databases had less to do with dedicated SQL products, and more in common with something like Spark SQL
Interesting Trend #1: Contemporary Hardware
A relational model of data
for large shared data banks
Edgar Codd
Communications of the ACM (1970)
dl.acm.org/citation.cfm?id=362685
19. Unified API, One Engine, Automatically Optimized
[Diagram from Databricks: language frontends (Python, Java/Scala, R, SQL, …) over a DataFrame API and Logical Plan, with the Tungsten backend targeting the JVM, LLVM, GPU, and NVRAM.]
Interesting Trend #1: Contemporary Hardware
20. Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal
Josh Rosen
spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-spark-closer-to-bare-metal/
Physical execution: CPU-efficient data structures keep data closer to the CPU cache
Interesting Trend #1: Contemporary Hardware
from Databricks
21. Interesting Trend #2: Generalized Libraries
Tensors are a good way to handle time-series
geo-spatially distributed linked data with lots
of N-dimensional attributes
In other words, nearly a general case for handling
much of the data that we’re likely to encounter
That’s better than attempting to shoehorn data
into matrix representation, then writing lots of
custom code to support it
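To make the tensor point concrete: a sparse mapping keyed by (time, location, attribute) handles N-dimensional data directly, where a matrix would force you to flatten two of those axes together. A stdlib-only sketch, with all values invented:

```python
from collections import defaultdict

# A sparse 3-way tensor: (time, location, attribute) -> value.
tensor = defaultdict(float)
tensor[("2015-10-15", "Madrid", "temperature")] = 22.5
tensor[("2015-10-15", "Madrid", "attendees")] = 800.0
tensor[("2015-10-16", "Seattle", "temperature")] = 12.0

def slice_by_location(t, location):
    """Fix one mode of the tensor, yielding a (time, attribute) matrix."""
    return {(time, attr): v
            for (time, loc, attr), v in t.items() if loc == location}

madrid = slice_by_location(tensor, "Madrid")
print(sorted(madrid))  # the remaining two modes for one location
```

Slicing a mode recovers an ordinary matrix, which is exactly what the custom shoehorning code above would have had to maintain by hand.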
22. Tensor factorization may be problematic, but
probabilistic solutions seem to provide relatively
general case solutions:
The Tensor Renaissance in Data Science
Anima Anandkumar @UC Irvine
radar.oreilly.com/2015/05/the-tensor-renaissance-in-data-science.html
Spacey Random Walks and Higher Order Markov Chains
David Gleich @Purdue
slideshare.net/dgleich/spacey-random-walks-and-higher-order-markov-chains
Interesting Trend #2: Generalized Libraries
23. Interesting Trend #3: Leveraging Workflows
[Workflow diagram, circa 2010, spanning representation, evaluation, and optimization: ETL into cluster/cloud; data prep; features; learners and parameters; unsupervised learning and exploration over train and test sets; models; evaluate; optimize; scoring against production data; data pipelines feeding use cases; actionable results; decisions and feedback. Generic "foo algorithms" and "bar developers" labels mark the developer-centric pieces.]
APIs, algorithms, developer-centric template thinking –
these only go so far; the overall context is a workflow…
26. Chris Ré
https://www.macfound.org/fellows/943/
Drugs, DNA, and Dinosaurs: Building High Quality Knowledge Bases with DeepDive
Strata CA (2015)
The Thorn in the Side of Big Data: too few artists
Strata CA (2014)
Interesting Trend #4: A Leap Ahead
cognitive computing “flywheel”:
probabilistic reasoning about complex
data and predictions together
29. William Cleveland
“Data Science: an Action Plan for Expanding
the Technical Areas of the Field of Statistics,”
International Statistical Review (2001), 69, 21-26
http://www.stat.purdue.edu/~wsc/papers/
datascience.pdf
Leo Breiman
“Statistical modeling: the two cultures”,
Statistical Science (2001), 16:199-231
http://projecteuclid.org/euclid.ss/1009213726
…also good to mention John Tukey
Data Scientists: Primary Sources
31. One 2015 report (RJMetrics) tallied a minimum of
11,400 data scientists worldwide by scraping LinkedIn
So many suddenly, really? Perhaps that’s doubtful…
Comparing surveys: O’Reilly Media conducts salary surveys
for data scientists, along with exploring the tools they use
2013 – tools, trends, not all data is “Big”, coding scripts!
2014 – correlation of tools and skills, rapid evolution
2015 – divide blurring between open source and proprietary
Data Scientists: Everywhere, all the time?
36. Moving Up: Medicine
“Whatever the models might discover or predict, Howard
isn’t suggesting they’ll do away with a doctor’s judgment.
Rather, artificially intelligent computers could provide strong,
unbiased second opinions, or perhaps lead a doctor down
a path of investigation she otherwise wouldn’t have considered.”
With Enlitic, a veteran data scientist plans
to fight disease using deep learning
GigaOM (2014-08-22)
https://gigaom.com/2014/08/22/with-enlitic-a-veteran-
data-scientist-plans-to-fight-disease-using-deep-learning/
37. Moving Up: Political Platform
http://www.predikon.ch/en/voting-patterns/residents
38. Moving Up: Political Platform
Mining Democracy
Matthias Grossglauser @EPFL
ICT Labs (2015)
http://ictlabs-summer-school.sics.se/
slides/mining%20democracy.pdf
What if a political candidate could cluster political
positions in a multi-dimensional data space, to
optimize for being recommended to voters?
http://www.predikon.ch/en/voting-patterns/residents
39. Moving Up: Government Ethics
The White House has a plan to help society through data analysis
Fortune (2015-09-30)
http://fortune.com/2015/09/30/dj-patil-white-house-data/
40. Moving Up: Government Ethics
The White House has a plan to help society through data analysis
Fortune (2015-09-30)
http://fortune.com/2015/09/30/dj-patil-white-house-data/
“Opening up government data about child labor to concerned data
scientists; recruiting folks to help analyze data about suicide prevention,
social injustice and incarceration; a call for mandatory and `intrinsic`
ethics instruction in every course teaching students data science; and an
effort to help the transgender community create its own census of sorts,
so that members and society can get a better grasp on the issues that
matter to the group.”
41. Moving Up: Neuroscience
Analytics + Visualization for Neuroscience: Spark, Thunder, Lightning
Jeremy Freeman
2015-01-29
youtu.be/cBQm4LhHn9g?t=28m55s
42. For excellent examples of Science and Data
together see CodeNeuro, particularly for
use of Jupyter notebooks + Apache Spark
Moving Up: Neuroscience
45. Massive Open Online Courses – a seven-year trend, beginning with:
Connectivism and Connective Knowledge
George Siemens, Stephen Downes
University of PEI (2008)
http://cck11.mooc.ca/
Learning: What About MOOCs?
Adios EdTech. Hola something else
George Siemens (2015-09-09)
http://www.elearnspace.org/blog/2015/09/09/
adios-ed-tech-hola-something-else/
46. Online education: MOOCs taken by educated few
Ezekiel Emanuel, Nature 503, 342 (2013-11-21)
• 80% of students already have an advanced degree
• 80% come from the richest 6% of the population
Michael Shanks @Stanford: “retrenchment around traditional
disciplines will make disparities even more pronounced”
An Early Report Card on Massive Open Online Courses
Geoffrey Fowler, WSJ (2013-10-08)
Amherst, Duke, etc., have rejected edX
Learning: What About MOOCs?
47. Learning: What About MOOCs?
So then, what else works better?
48. How to Flip a Class
CTL @UT/Austin
http://ctl.utexas.edu/teaching/flipping-a-class/how
1. identify where the flipped classroom model makes
the most sense for your course
2. spend class time engaging students in application
activities with feedback
3. clarify connections between inside and outside
of class learning
4. adapt your materials for students to acquire course
content in preparation of class
5. extend learning beyond class through individual
and collaborative practice
Learning: Inverted Classroom
49. Scalable Learning
David Black-Schaffer @Uppsala
Sverker Janson @KTH SICS
https://www.scalable-learning.com/
• active learning: Flipped Classroom and Just-in-Time Teaching
• exams built directly into specific diagrams within videos
• metrics for where in video+code that students get stuck
• instructor can customize subsequent classroom discussions
(active teaching phase) based on stuck/unstuck metrics
Learning: Inverted Classroom
50. Learning programming at scale
Philip Guo
O’Reilly Radar (2015-08-13)
http://radar.oreilly.com/2015/08/learning-
programming-at-scale.html
• Python Tutor
• Codechella
Tutors could keep an eye on around 50 learners during a 30-minute session, start 12 chat conversations, and help 3 learners concurrently
Learning: Collaborative Learning
51. Data-driven Education and the Quantified Student
Lorena Barba @GWU
PyData Seattle (2015)
https://youtu.be/2YIZ2SY9mW4
• keynote talk: abstract, slides
• homepage
• Open edX Universities Symposium, DC 2015-11-11
Learning: If you study just one link from this talk…
52. If by some bizarre chance you haven’t used
it already, go to https://jupyter.org/
• 50+ different language kernels
• new funding 2015-07
• UC Berkeley, Cal Poly
• nbgrader autograder by Jess Hamrick
• jupyterhub multi-user server
• curating a list of examples
• repeatable science!
see also:
Teaching with Jupyter Notebooks
http://tinyurl.com/scipy2015-education
Learning: Jupyter Project
53. Embracing Jupyter Notebooks at O'Reilly
Andrew Odewahn
O’Reilly Media (2015-05-07)
https://beta.oreilly.com/ideas/jupyter-at-oreilly
O’Reilly Media is using our Atlas platform
to make Jupyter Notebooks a first class
authoring environment for our publishing
program
Jupyter, Thebe, Atlas, Docker, etc.
Learning: O’Reilly Media
56. Is it possible to measure “distance” between
a learner and a subject community?
From Amateurs to Connoisseurs:
Modeling the Evolution of User
Expertise through Online Reviews
Julian McAuley, Jure Leskovec
http://i.stanford.edu/~julian/pdfs/www13.pdf
Learning: Machine Learning about People Learning
57. Learning, Assessment, Team Building, Diversity – these can be accomplished together, in situ
Collective Intelligence in Human Groups
Anita Williams Woolley @CMU
https://youtu.be/Bz1dDiW2mvM
• balance of participation (no one dominates)
• 2+ women engaging within the group
• group size < 9
• diversity of formal backgrounds
Learning: Machine Learning about People Learning
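The four group-level criteria from Woolley's talk can be captured as a simple checklist function; the field names, thresholds beyond those stated on the slide, and the sample team below are invented for illustration, not taken from the study:

```python
def collective_intelligence_check(group):
    """Check a team against the four criteria from the talk:
    balanced participation, 2+ women, group size < 9, diverse backgrounds."""
    checks = {
        "balanced_participation": max(group["talk_share"]) < 0.5,  # no one dominates
        "two_plus_women": group["num_women"] >= 2,
        "small_group": len(group["talk_share"]) < 9,
        "diverse_backgrounds": len(set(group["backgrounds"])) >= 3,
    }
    return checks

team = {
    "talk_share": [0.2, 0.25, 0.15, 0.2, 0.2],  # fraction of speaking time each
    "num_women": 2,
    "backgrounds": ["statistics", "design", "engineering", "biology", "design"],
}
result = collective_intelligence_check(team)
print(all(result.values()), result)
```

Framing the criteria as data makes the "machine learning about people learning" point tangible: group composition is itself measurable input.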
59. Data Science teams apply machine learning (automation)
to help arrive at key insights, to learn what is important
in data sets – finding the proverbial needle in the haystack
Cognitive Computing exhibits people + automation
as a process, in a learning context
That’s also a basic tenet of workflows in general:
people + automation
And a key aspect of the emerging gig economy too…
People + Automation
61. People + Automation: Gig Economy
http://orchestra.unlimitedlabs.com/
“Workflows with humans and machines”
62. People + Automation: Gig Economy
Workers in a World of Continuous Partial Employment
Tim O’Reilly
Medium (2015-08-31)
https://medium.com/the-wtf-economy/workers-in-a-
world-of-continuous-partial-employment-4d7b53f18f96
http://conferences.oreilly.com/next-economy
63. Learning is key. Effective use of Data Science in these new
economic conditions requires people + automation, learning
together – albeit in different ways. Plus, there’s an excellent
framework for that:
Autopoiesis and Cognition
Humberto Maturana, Francisco Varela
Springer (1973)
https://books.google.es/books?id=nVmcN9Ja68kC
People + Automation
64. I’d like to leave this as a theme for you to consider for Data Science 2016, Moving Up into use cases…
We see an intersection of key points in both the emerging
Cognitive Computing context and the Gig Economy in general:
systems of people + automation, learning together
It posits an interesting duality for us to leverage
With that I wish you a great conference here at Big Data Spain!
People + Automation