Big data in Texas has experienced significant growth and changes over the past few decades. In the 1990s, Austin was a hub for alternative culture and the internet was exploding, fueling innovation. Since then, data science has emerged as a field and machine learning has grown exponentially. Texas universities have become leaders in data-focused research and applications. Looking ahead, the state is well-positioned for continued data innovation as industry and academia increasingly collaborate around data science.
SOFTELECTRONICTURBO: a turbocharger service that repairs, reconditions, and supplies new turbochargers for trucks.
We are a well-known manufacturer, supplier, importer, and exporter of a wide range of precious stones, semi-precious stones, and more. We import our raw materials from reliable sources.
Information and communication technologies (ICT) are the tools and programs that process, manage, transmit, and share information through technological means. Computing, the Internet, and telecommunications are the most widespread ICTs, although their growth and evolution mean that new models keep emerging.
In recent years, ICTs have taken on an enormously important role in our society and are used in a multitude of activities. They are now part of most sectors: education, robotics, public administration, employment and business, health…
How To Speak To Them On Their Wavelength
George Hutton
http://mindpersuasion.com/ir/
If you speak to anybody on their wavelength, they will be much more likely to go along with your ideas. Luckily, learning how to do this is incredibly easy. Learn How: http://mindpersuasion.com/ir/
Human in the loop: a design pattern for managing teams working with ML
Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
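The exception-handling loop described above can be sketched in a few lines. This is a hedged toy: the nearest-centroid model, the margin threshold, and the simulated oracle are all illustrative choices, not any specific production pipeline.

```python
# Toy active-learning loop: a simple nearest-centroid model handles
# most items automatically; low-confidence cases get referred to a
# human oracle (simulated here), and those judgements are folded back
# into the training data, improving the next iteration of the model.

def centroid(points):
    return [sum(dim) / len(points) for dim in zip(*points)]

def predict(centroids, x):
    # returns (label, margin); the gap between the two nearest
    # centroids serves as a crude confidence score
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(c, x)), label)
        for label, c in centroids.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 - d0

labeled = {"pos": [(1.0, 1.0)], "neg": [(-1.0, -1.0)]}
stream = [(0.9, 1.1), (0.05, -0.02), (-1.2, -0.8)]

def human_oracle(x):
    # stand-in for the subject matter expert
    return "pos" if x[0] + x[1] > 0 else "neg"

escalated = []
for x in stream:
    cents = {lbl: centroid(pts) for lbl, pts in labeled.items()}
    label, margin = predict(cents, x)
    if margin < 0.5:              # exception: the model isn't confident
        label = human_oracle(x)   # refer the judgement to a human expert
        labeled[label].append(x)  # the expert's decision retrains the model
        escalated.append(x)

print(escalated)  # only the ambiguous item was escalated
```

Only the item near the decision boundary reaches the human; the two clear-cut items are handled automatically, which is the efficiency argument for HITL.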
Human-in-the-loop: a design pattern for managing teams that leverage ML
Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time:
* In what ways do the humans involved learn from the machines?
* In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Human-in-the-loop: a design pattern for managing teams which leverage ML
Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Humans in a loop: Jupyter notebooks as a front-end for AI
Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning, such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
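Since `nbtransom` builds atop `nbformat`, it helps to remember that a notebook file is just structured JSON. A minimal sketch (hand-rolling the structure that `nbformat` normally manages, with hypothetical annotation cells) shows how a tool can treat a notebook as part configuration file, part structured log:

```python
import json

# A Jupyter notebook is plain JSON: a list of cells plus metadata.
# Here we hand-roll that structure to show how a tool can read and
# write machine-readable annotations as cells; the "annotation:"
# convention below is made up for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": "## annotation: label=spark"},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": "print('training example')"},
    ],
}

# Round-trip through JSON, then collect the annotation cells
text = json.dumps(notebook)
loaded = json.loads(text)
labels = [c["source"] for c in loaded["cells"]
          if c["cell_type"] == "markdown" and "annotation:" in c["source"]]
print(labels)
```

Because both people and programs can read and append cells, the same document works as the shared artifact that humans and machines collaborate on.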
Humans in the loop: AI in open source and industry
Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX, datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, about "Oriole" tutorials, which combine video with Jupyter notebooks and Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
Strata UK 2017. Computable content leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates `TextBlob` and `spaCy` for NLP analysis of texts, including full parses, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
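The core TextRank idea is compact enough to sketch. The toy below builds a co-occurrence graph and runs a PageRank-style iteration; it is an illustration only, since PyTextRank itself adds part-of-speech filtering, lemmatization, and spaCy's full parse rather than this naive tokenizer.

```python
# Minimal sketch of the TextRank algorithm (Mihalcea 2004): score words
# by PageRank over a co-occurrence graph, then rank them. Tokenization
# here is naive; PyTextRank instead uses spaCy's parse and POS filters.
import re
from collections import defaultdict

def textrank_keywords(text, window=2, damping=0.85, iters=50):
    words = re.findall(r"[a-z]+", text.lower())
    neighbors = defaultdict(set)          # undirected co-occurrence graph
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iters):                # power iteration of PageRank
        rank = {w: (1 - damping) + damping * sum(
                    rank[u] / len(neighbors[u]) for u in neighbors[w])
                for w in neighbors}
    return sorted(rank, key=rank.get, reverse=True)

top = textrank_keywords(
    "graph algorithms rank keyphrases; graph algorithms score texts "
    "by ranking words in a graph")
print(top[:3])
```

Words that co-occur with many other well-connected words accumulate rank, which is why repeated, central terms float to the top without any training data.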
Use of standards and related issues in predictive analytics
Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
Presented 2015-08-24 at SF Bay ACM, held at the eBay south campus in San Jose.
http://meetup.com/SF-Bay-ACM/events/221693508/
Project Jupyter https://jupyter.org/ evolved from IPython notebooks and now supports a wide variety of programming language back-ends. Notebooks have proven to be effective tools in Data Science, providing convenient packaging for what Don Knuth coined as "literate programming" in the 1980s: code plus exposition in markdown. Results of running the code appear in-line as interactive graphics -- all packaged as collaborative, web-based documents. Some have said that the introduction of cloud-based notebooks is nearly as fundamental a change in software practice as the introduction of spreadsheets.
O'Reilly Media has been considering the question, "What comes after books and video?" Or, as one might imagine more pointedly, what comes after Kindle? To that point we have collaborated with Project Jupyter to integrate notebooks into our content management process, allowing authors to generate articles, tutorials, reports, and other media products as notebooks that also incorporate video segments. Code dependencies are containerized using Docker, and all of the content gets managed in Git repositories. We have added another layer, an open source project called Thebe, which provides a kind of "media player" for embedding the containerized notebooks into web pages.
GalvanizeU Seattle: Eleven Almost-Truisms About Data
Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study the technologies, the processes, and the people involved.
Microservices, containers, and machine learning
Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets, on which we then run machine learning on a Spark cluster to find insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data — based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
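As a hedged illustration of one of those questions ("who discusses most frequently with whom?"), reply pairs can be counted directly once the parser stage has produced structured messages. The message records below are made up for the example, not the actual Exsto schema:

```python
# Count sender/replier pairs from parsed email-thread data: a toy
# version of Exsto's "who discusses most frequently with whom?" query.
# The (msg_id, in_reply_to, sender) records are hypothetical examples.
from collections import Counter

messages = [
    ("m1", None, "alice"),
    ("m2", "m1", "bob"),
    ("m3", "m2", "alice"),
    ("m4", "m1", "carol"),
]

sender = {mid: who for mid, _, who in messages}
pairs = Counter()
for mid, parent, who in messages:
    if parent is not None:
        # unordered pair: a reply counts for both participants
        pairs[frozenset((who, sender[parent]))] += 1

print(pairs.most_common(1))  # alice and bob exchanged the most replies
```

At scale the same counting becomes a Spark aggregation, and the pair counts become edge weights in the community graph that GraphX analyzes.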
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
QCon São Paulo: Real-Time Analytics with Spark Streaming
Paco Nathan
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26
http://qconsp.com/presentation/real-time-analytics-spark-streaming
This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale.
The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale.
We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app.
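The memory-for-accuracy trade-off mentioned above shows up clearly in a sketch of one such probabilistic data structure, a Count-Min sketch. The depth and width here are arbitrary illustrative choices, not tuned production values:

```python
# Minimal Count-Min sketch: fixed-size counters that never
# under-estimate a frequency; hash collisions can only inflate it.
import hashlib

class CountMin:
    def __init__(self, depth=4, width=256):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # one independent hash per row
        for d in range(self.depth):
            h = hashlib.sha256(f"{d}:{item}".encode()).digest()
            yield d, int.from_bytes(h[:8], "big") % self.width

    def add(self, item, count=1):
        for d, col in self._cells(item):
            self.table[d][col] += count

    def estimate(self, item):
        # take the least-inflated row
        return min(self.table[d][col] for d, col in self._cells(item))

cms = CountMin()
for word in ["spark"] * 100 + ["storm"] * 3:
    cms.add(word)
print(cms.estimate("spark"))  # at least 100; usually exactly 100
```

The memory footprint is fixed (`depth * width` counters) no matter how many distinct items stream through, which is exactly the trade the talk describes: a small, bounded over-count in exchange for a drastically smaller footprint.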
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Paco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
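As a hedged toy of that link-prediction setting (the graph and the scoring rule are illustrative, not taken from the talk): score a candidate edge by counting common neighbours, a deliberately simple and therefore predictable inference rule.

```python
# Toy link prediction over a tiny knowledge graph: score a candidate
# edge by common-neighbour overlap. Triples and entities are made up.
from collections import defaultdict

triples = [("ada", "knows", "grace"), ("ada", "knows", "alan"),
           ("grace", "knows", "alan"), ("grace", "knows", "kurt")]

neigh = defaultdict(set)
for s, _, o in triples:
    neigh[s].add(o)
    neigh[o].add(s)

def score(a, b):
    # common-neighbour count: one can state exactly which inferences
    # this rule licenses, i.e. its semantics is "predictable inference"
    return len(neigh[a] & neigh[b])

print(score("ada", "kurt"))  # prints 1: they share "grace"
```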
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that lead to closing the deal.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
JMeter webinar - integration with InfluxDB and Grafana
RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
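As a hedged sketch of the data path the webinar describes, a load-test result can be rendered as InfluxDB "line protocol", the text format a JMeter backend listener writes and Grafana later queries. The measurement, tag, and field names below are illustrative, not JMeter's actual schema:

```python
# Render a metric sample as InfluxDB line protocol:
#   measurement,tag=value field=value timestamp
# Names below are hypothetical, for illustration only.
def to_line_protocol(measurement, tags, fields, ts_ns):
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "jmeter_sample",
    {"label": "login", "status": "ok"},
    {"latency_ms": 42, "threads": 8},
    1717000000000000000)
print(line)
```

Once samples arrive in this shape, Grafana dashboards are just queries over the measurement and tags, which is what makes the real-time view possible.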
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes hard work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We also had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
1. “Big Data in Texas: Then, Now, and Ahead”
Paco Nathan, Evil Mad Scientist @ Concurrent, Inc.
2. Then, Now, and Ahead
THEN
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
3. observations…
Lynn asked me to talk about Data here today. A few weeks ago we stepped back for a moment to reflect about what we’d seen happen in Austin over the years.
Both of us ran alternative bookstores in Austin, twenty or so years ago, and we participated as the Internet thing exploded in the 1990s. That was a blast…
11. observations…
Overall, it’s about systems thinking
We have a wealth of that here, at UT/Austin in particular…
Ilya Prigogine spent years here, which is just incredible
School of Architecture, with leading work in VR, GIS, etc.
Interactive innovations at ACTLab…
Quantitative emphasis at McCombs…
major intellectual resources here
12. Then, Now, and Ahead
NOW
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
13. Data Science
[diagram: roles on a data team, set against a backdrop of mirrored machine-log text (game event logs such as NUI:DressUpMode, Client Inventory Panel Apply Product, ConnectivityTest, etc.): Domain Expert covers business process and stakeholder; Data Scientist covers data science, data prep, discovery, modeling, etc.; App Dev covers software engineering and automation; Ops covers systems engineering and availability; together they form introduced capability]
15. references…
Data Jujitsu, by DJ Patil; O’Reilly, 2012; amazon.com/dp/B008HMN5BE
Building Data Science Teams, by DJ Patil; O’Reilly, 2011; amazon.com/dp/B005O4U3ZE
16. Enterprise Data Workflows
[diagram: a Cascading word-count flow, with map (M) and reduce (R) phases marked: Document Collection > Scrub > Tokenize > HashJoin (Left: token split via Regex; RHS: Stop Word List) > GroupBy token > Count > Word Count]
cascading.org
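The word-count flow above can be sketched in plain Python. This is a stand-in for the Cascading pipes, not Cascading's actual API; the stop-word list and the token regex are illustrative:

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "and"}  # illustrative stand-in for the RHS stop-word list

def word_count(documents):
    """Mimic the flow: Scrub -> Tokenize -> stop-word filter -> GroupBy token -> Count."""
    counts = Counter()
    for doc in documents:
        scrubbed = doc.lower()                    # Scrub: normalize case
        tokens = re.findall(r"[a-z']+", scrubbed) # Tokenize via regex
        # HashJoin-style filter against the stop-word list, then GroupBy/Count
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return dict(counts)

docs = ["The quick brown fox", "the fox and the hound"]
print(word_count(docs))  # counts per token, e.g. 'fox' appears twice
```

In Cascading proper, each of these steps would be a pipe assembly wired between source and sink taps, and the GroupBy would shuffle tokens across the cluster.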
17. Enterprise Data Workflows
Over the past 5+ years, we’ve seen many large-scale Enterprise production deployments based on Cascading, Cascalog, Scalding, PyCascading, Cascading.JRuby, etc.
Enterprise data workflows, Machine learning at scale, Big Data…
Why?
amazon.com/dp/1449358721
18. Then, Now, and Ahead
NOW
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
19. Three broad categories of data
Curt Monash, 2010
dbms2.com/2010/01/17/three-broad-categories-of-data
• Human/Tabular data – human-generated data which fits well into tables/arrays
• Human/Nontabular data – all other data generated by humans
• Machine-Generated data
20. Three broad categories of data
Curt Monash, 2010
dbms2.com/2010/01/17/three-broad-categories-of-data
• Human/Tabular data – human-generated data which fits well into tables/arrays
• Human/Nontabular data – all other data generated by humans
• Machine-Generated data
• Adjusted Data – Dr. Don Easterbrook, Senate witness
21. Q3 1997: inflection point
Four independent teams were working toward horizontal scale-out of workflows based on commodity hardware.
This effort prepared the way for huge Internet successes in the 1997 holiday season… AMZN, EBAY, Inktomi (YHOO Search), then GOOG.
MapReduce and the Apache Hadoop open source stack emerged from this.
22. Circa 1996: pre-inflection point
[diagram: Stakeholder sends strategy to BI Analysts, who produce Excel pivot tables and PowerPoint slide decks; Product sends requirements to Engineering, which turns SQL Query result sets into optimized code for the Web App; the Web App serves Customers, whose transactions land in the RDBMS]
23. Circa 1996: pre-inflection point
[same diagram as the previous slide, annotated: “Throw it over the wall”]
24. Circa 2001: post big ecommerce successes
[diagram: Stakeholder and Product view dashboards; Algorithmic Modeling builds models, recommenders, and classifiers; Engineering ships servlets and Web Apps with UX for Customers; Middleware handles aggregation of event history; SQL Query result sets, customer transactions, and Logs feed the DW via ETL from the RDBMS]
25. Circa 2001: post big ecommerce successes
[same diagram as the previous slide, annotated: “Data products”]
26. Circa 2013: clusters everywhere
[diagram: Data Products serve Customers; Domain Expert owns the business process and a Workflow dashboard with metrics; Data Scientist handles data science, discovery, and modeling; App Dev builds Web Apps, Mobile, etc. via s/w dev and History services; Planner optimizes taps and capacity for social interactions, transactions, and content; Use Cases run Across Topologies: Hadoop etc. (batch), Log Events, In-Memory Data Grid (near time), and DW, under a Cluster Scheduler run by Ops; introduced capability sits alongside the existing SDLC and RDBMSs]
27. Circa 2013: clusters everywhere
[same diagram as the previous slide, annotated: “Optimizing topologies”]
28. references…
• Lambda Architecture: blending topologies
• Big Data, by Nathan Marz, James Warren; manning.com/marz
source: Nathan Marz
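The “blending topologies” idea can be sketched as a toy serving layer that merges a precomputed batch view with a speed-layer increment. The page names and counts here are hypothetical, not drawn from Marz and Warren’s book:

```python
# Toy Lambda Architecture: answer queries by merging batch and speed layers.
batch_view = {"page_a": 100, "page_b": 40}  # precomputed from the immutable master dataset
speed_view = {"page_a": 3}                  # incremental counts since the last batch run

def query(page):
    # The batch layer gives an eventually-recomputed total;
    # the speed layer fills the gap since the last recompute.
    return batch_view.get(page, 0) + speed_view.get(page, 0)

print(query("page_a"))  # 103: batch total plus realtime increment
```

When the next batch run completes, its view absorbs the increments and the speed view is discarded, which is what lets the slow, accurate topology and the fast, approximate one blend behind a single query.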
29. references…
Statistical Modeling: The Two Cultures, by Leo Breiman; Statistical Science, 2001; bit.ly/eUTh9L
30. references…
Amazon
“Early Amazon: Splitting the website” – Greg Linden
glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
eBay
“The eBay Architecture” – Randy Shoup, Dan Pritchett
addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html
addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
Inktomi (YHOO Search)
“Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff)
youtube.com/watch?v=E91oEn1bnXM
Google
“Underneath the Covers at Google” – Jeff Dean (0:06:54 ff)
youtube.com/watch?v=qsan-GQaeyk
perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
31. Then, Now, and Ahead
NOW
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
32. Displacement
Geoffrey Moore
Mohr Davidow Ventures, author of Crossing The Chasm
Hadoop Summit, 2012: what Amazon did to the retail sector… has put the entire Global 1000 on notice over the next decade; data as the major force… mostly through apps – verticals, leveraging domain expertise
Michael Stonebraker
INGRES, PostgreSQL, Vertica, VoltDB, Paradigm4, etc.
XLDB, 2012: complex analytics workloads are now displacing SQL as the basis for Enterprise apps
33. Drivers
algorithmic modeling + machine data
+ curation, metadata + Open Data
data products, as feedback into automation
evolution of feedback loops
a big part of the science in data science…
internet of things + complex analytics
accelerated evolution, additional feedback loops
taking this out into a highly social dimension
34. “A kind of Cambrian explosion”
source: National Geographic
36. A Thought Exercise
Consider that when a company like Caterpillar moves into data science, they won’t be building the world’s next search engine or social network.
They will most likely be optimizing supply chain, optimizing fuel costs, automating data feedback loops integrated into their equipment…
Operations Research – crunching amazing amounts of data
37. A Thought Exercise
That’s a $50B company, in a market segment worth $250B.
Upcoming: tractors as drones – guided by complex, distributed data apps
39. Two Avenues to the App Layer
Enterprise: must contend with complexity at scale every day… incumbents extend current practices and infrastructure investments
Start-ups: crave complexity and scale to become viable… new ventures move into Enterprise space to compete using relatively lean staff
[chart axes: complexity ➞, scale ➞]
40. Then, Now, and Ahead
AHEAD
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
41. For instance…
Let’s drill down on that intersection of tractors and crops, as a focus…
Some of the largest use cases for large-scale data workflows which we encounter are in Agriculture.
Here’s a sector which integrates some of those themes from the Internet of Things, Caterpillar, Climate Corp, etc.
42. Data and Agriculture, Ahead
• single largest employer, livelihood for 40% globally
• 500 million small farms worldwide
• most family farmers rely on rain-fed agriculture
• approx $2T agricultural real estate in US alone
• high annual rate of soil depletion
• cycles of flooding, drought, desertification
• high resolution from private satellite networks, e.g., skyboximaging.com
• SMS networks for “business intelligence” among family farmers in Ethiopia, agrepedia.com
• microfinance, e.g., kiva.org, slowmoney.org
43. Data and Agriculture, Ahead
Consider the emerging reality of drone tractors, guided by satellite feeds, with predictive analytics accessing remote cloud-based clusters, crunching data for crops planted per-plot, based on years of history evaluated in time series analysis.
It would be difficult to identify a bigger Big Data problem in the world.
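A minimal sketch of that per-plot time-series idea: fit a linear trend to one plot’s yield history and extrapolate a season ahead. The yield figures are invented for illustration; real systems would use far richer models and data:

```python
# Fit an ordinary least-squares line to (year, yield) pairs for one plot,
# then extrapolate one season past the last observed year.
def linear_forecast(years, yields):
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
             / sum((x - mean_x) ** 2 for x in years))
    intercept = mean_y - slope * mean_x
    return slope * (years[-1] + 1) + intercept

# Hypothetical bushels-per-acre history for a single plot
history = {2009: 148.0, 2010: 152.5, 2011: 151.0, 2012: 158.0}
print(round(linear_forecast(list(history), list(history.values())), 1))  # 159.5
```

Scale that to millions of plots, updated per satellite pass, and the “bigger Big Data problem” claim above starts to look literal.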
44. Data and Agriculture, Ahead
You’ve heard about Peak Oil, Peak Phosphorus? How about Peak Snow?
In other words, rising variance of snow pack levels, increasingly earlier peak snow in the mountains… which stresses the watersheds, infrastructure, etc., which in turn stress agriculture, energy, transportation, financial markets, tax basis, etc.
Jeff Dozier, William Gail, “The Emerging Science of Environmental Applications”, The Fourth Paradigm, 2009
source: J. Dozier, et al., UCSB
45. Data and Agriculture, Ahead
Variance in the timing of the water cycle causes stress on natural resources and infrastructure: reservoirs, aqueducts, river ways, aquifers, levees, farm lands, seawater incursion, etc.
Even in the face of so much IoT data looming, we lack adequate data and modeling of snowpack, snow melt, runoff, evaporation, water basins, etc., to understand the impact of these changes – now needed to forecast where to change infrastructure or strategies.
There’s not much machine data up in the mountain peaks, and satellite data only serves so far… new opportunities for Big Data
source: J. Dozier, et al., UCSB
47. Data and Agriculture, Ahead
We can resolve these kinds of problems; however, solutions must leverage huge amounts of data.
48. Then, Now, and Ahead
AHEAD
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
49. Everything’s Bigger in Texas
Agriculture is just one sector, one set of problems to tackle. We have much, much more here in Texas.
For example, Houston is a major center for Maritime work… check out: marinexplore.org
50. Everything’s Bigger in Texas
There’s also the not so small matter of the Energy and Transportation sectors.
GE is putting sensors in each and every wind generator, each and every jet engine – again, the Internet of Things.
I’ve heard rumors there are a few of those wind turbines out in West Texas?
51. Everything’s Bigger in Texas
Another of the fastest growing use cases we see for large-scale predictive modeling is in Telecom.
Think about the stream of CDRs, billions of us bipeds wandering about the planet with our phones… The firehose for that makes Twitter look like MySpace!
The value of location services as data products for local businesses, communities is astounding.
52. Then, Now, and Ahead
AHEAD
1. Keep Austin Weird?
2. Something Called Data Science
3. Rise Of The Machine Data
4. A Cambrian Explosion
5. Eat, Drink, Be Merry…
6. Data-Driven In TX
7. Roll Up Your Sleeves
53. What is needed?
Approximately 80% of the costs for data-related projects get spent on data preparation – mostly on cleaning up data quality issues: ETL, log file analysis, etc.
Unfortunately, data-related budgets for many companies tend to go into frameworks which can only be used after clean up.
Most valuable skills:
‣ learn to use programmable tools that prepare data
‣ learn to generate compelling data visualizations
‣ learn to estimate the confidence for reported results
‣ learn to automate work, making analysis repeatable
source: D3
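Two of those skills, programmatic data prep and confidence estimates, can be sketched together in a few lines. The messy records and field values below are hypothetical:

```python
import random
import statistics

# Hypothetical messy feed: whitespace, blanks, and junk mixed with numbers
raw = ["  12.5 ", "n/a", "13.1", "", "12.9", "14.0", "bad", "13.4"]

# Data prep: coerce records to floats, dropping anything that fails to parse.
def clean(records):
    out = []
    for r in records:
        try:
            out.append(float(r.strip()))
        except ValueError:
            pass  # in production you would log or quarantine these
    return out

# Confidence: a simple bootstrap 90% interval for the mean,
# so the reported figure carries an uncertainty range.
def bootstrap_mean_ci(xs, reps=2000, alpha=0.10, seed=1):
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(xs, k=len(xs)))
                   for _ in range(reps))
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

values = clean(raw)
print(len(values), bootstrap_mean_ci(values))  # kept-record count and interval
```

Because both steps are scripted, the whole analysis reruns unchanged on next month’s feed, which is exactly the “automate work, making analysis repeatable” point above.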
54. What else do we need?
• more emphasis on statistical thinking
• not SQL vs. NoSQL, but instead a focus on apps as the process of structuring data
• multi-disciplinary teams, not cubicles and silos
• evolving more feedback loops, to drive more automation
• oddly enough, we need automation to be able to employ more people in intelligent, productive ways
• otherwise, we’re left with…
source: Schwa Corporation