This document discusses using Sparkling Water to build machine learning applications with H2O.ai. It walks through an example that uses Sparkling Water and H2O algorithms such as Word2Vec and GBM to classify tech news articles from short summaries. The analysis shows that the author is by far the most important variable for classification, but a model that uses only the text blurbs still has roughly a 70% chance of predicting the correct tag within its first two guesses. Potential applications of this type of text classification are suggested.
2. H2O.ai
Machine Intelligence
Sparkling Water
Provides the following:
• Seamless integration of H2O with Spark ecosystem
• Transparent use of H2O data structures and algorithms with Spark API
• Excels in existing Spark workflows requiring advanced Machine Learning algorithms
5. H2O.ai
Tech News Use Case— Crawler
We used import.io to create a crawler that went through numerous pages of techcrunch.com news and collected, for each article, the title, the author, a 2-3 sentence opening from the beginning of the article, and the tags associated with the article.
6. H2O.ai
Tech News Use Case
The first text-processing step eliminates words that occur frequently but do not add value to the classification process.
Sample Scala code:
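The code shown on the slide is not part of this transcript. As a stand-in, the following is a minimal sketch of the step described above, written with plain Scala collections rather than the Spark or H2O APIs. The object name `BlurbCleaning`, the function `tokenize`, and the tiny stopword list are assumptions made for illustration.

```scala
object BlurbCleaning {
  // Illustrative stopword list (an assumption); a real list would be much longer.
  val stopWords: Set[String] =
    Set("a", "an", "and", "the", "of", "to", "in", "is", "for", "on")

  // Lowercase the text, split on anything that is not a letter or digit
  // (which also discards punctuation), then drop empty tokens and stopwords.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .split("[^a-z0-9]+")
      .toSeq
      .filter(token => token.nonEmpty && !stopWords.contains(token))
}
```

For example, `BlurbCleaning.tokenize("The future of AI, and the cloud!")` keeps only `future`, `ai`, and `cloud`.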
7. H2O.ai
Tech News Use Case
We now eliminate tokens that do not add value to the classification process:
• i.e. punctuation, stopwords, and words that do not occur frequently
Sample Scala code:
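The code shown on the slide is not part of this transcript. As a hedged illustration of the rare-word part of this step, the sketch below counts tokens across all blurbs and keeps only those seen at least `minCount` times. It uses plain Scala collections, not the Spark or H2O APIs; `RareWordFilter`, `dropRareWords`, and the `minCount` threshold are hypothetical names chosen for this example.

```scala
object RareWordFilter {
  // Count how often each token appears across all blurbs, then keep only
  // tokens that appear at least `minCount` times in the whole corpus.
  def dropRareWords(blurbs: Seq[Seq[String]], minCount: Int): Seq[Seq[String]] = {
    val counts: Map[String, Int] =
      blurbs.flatten
        .groupBy(identity)
        .map { case (word, occurrences) => word -> occurrences.size }
    blurbs.map(_.filter(word => counts.getOrElse(word, 0) >= minCount))
  }
}
```

The right threshold depends on the size of the corpus; the deck does not state which value was used.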
8. H2O.ai
Tech News Use Case — Word2Vec
Word2Vec is a mathematical way to represent a word as a vector of numbers. These vector "representations" encode information about the given word: words that are used in similar contexts end up with similar vectors, so the vector captures aspects of the word's meaning.
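To make "similar vectors" concrete, here is a small sketch that compares word vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors are made up for illustration and are not output from a trained Word2Vec model; the names `WordVectors`, `cosine`, and `toy` are assumptions.

```scala
object WordVectors {
  // Cosine similarity: dot product divided by the product of the vector lengths.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }

  // Made-up 3-dimensional vectors, for illustration only.
  val toy: Map[String, Array[Double]] = Map(
    "startup" -> Array(0.9, 0.1, 0.0),
    "founder" -> Array(0.8, 0.2, 0.1),
    "drone"   -> Array(0.0, 0.1, 0.9)
  )
}
```

With these toy values, `cosine(toy("startup"), toy("founder"))` is much larger than `cosine(toy("startup"), toy("drone"))`, which is the behavior one would hope for from real embeddings of related and unrelated words.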
10. H2O.ai
Category Information
The original data set yielded about 55 categories. To streamline the classification process, we kept the 14 most frequently appearing tags in our dataset and grouped the rest into a catch-all category titled “Other.” The figure to the right shows the distribution of data in each category.
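The relabeling described above can be sketched as follows. This is an illustration in plain Scala, not the code used for the deck; `TagBucketing` and `bucketTags` are hypothetical names, and ties between equally frequent tags are broken alphabetically as an arbitrary choice.

```scala
object TagBucketing {
  // Keep the `n` most frequent tags and relabel every other tag as "Other".
  def bucketTags(tags: Seq[String], n: Int): Seq[String] = {
    val topTags: Set[String] = tags
      .groupBy(identity)
      .toSeq
      .sortBy { case (tag, occurrences) => (-occurrences.size, tag) } // most frequent first
      .take(n)
      .map(_._1)
      .toSet
    tags.map(tag => if (topTags.contains(tag)) tag else "Other")
  }
}
```

For the deck's setup, the call would be `TagBucketing.bucketTags(allTags, 14)`, where `allTags` (an assumed name) holds one tag per article.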
Category Information
The variable importance chart to the right shows that the author is by far the most important variable. In other words, the classification used very little information from the text samples provided and relied mostly on authors who frequently write under the same article tag. Let’s see how this changes when we try to classify the articles using only the text samples.
11. H2O.ai
Analysis
The validation confusion matrix below is for the model that used both the authors and the text blurbs to categorize articles. We know that this model placed heavy variable importance on authors. In the confusion matrix below, we see how this affects the error rate of various tags. For tags with smaller sets of data, it is common that a few authors write the majority of articles associated with those tags. For the “Enterprise” tag, for example, the data set is relatively small, and the error rate is relatively low (40%).
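For readers unfamiliar with how a per-tag error rate is read off a confusion matrix, here is a minimal sketch. It assumes rows are actual tags and columns are predicted tags; `ConfusionMatrixErrors` and `errorRates` are hypothetical names, and any counts passed in are the caller's own, not the deck's data.

```scala
object ConfusionMatrixErrors {
  // matrix(i)(j) = number of articles whose actual tag is i and predicted tag is j.
  // The error rate for tag i is the share of row i that falls off the diagonal.
  def errorRates(matrix: Array[Array[Int]]): Array[Double] =
    matrix.zipWithIndex.map { case (row, i) =>
      val total = row.sum
      if (total == 0) 0.0 else (total - row(i)).toDouble / total
    }
}
```

Under this definition, a tag with 10 validation articles of which 6 are classified correctly has an error rate of 4/10 = 40%.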
12. H2O.ai
Analysis
The validation confusion matrix below is for the model that uses text blurbs exclusively to categorize articles. The error rate on the “Enterprise” tag is now 75%, significantly higher than the 40% we saw when authors were included in the data. This shows how strongly the earlier model relied on the author variable.
14. H2O.ai
Hit Ratios
[Charts: hit ratios with authors; hit ratios without authors]
Hit ratios illustrate the chances of your model correctly categorizing a text blurb on the 1st, 2nd, 3rd, etc. try. The above charts show that both models, with and without authors, have an approx. 70% chance of correctly predicting the tag of a text blurb by the second try.
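As an illustration of the metric, the sketch below computes a hit ratio at `k` from ranked predictions: the share of articles whose actual tag appears among the model's first `k` guesses. It is plain Scala, not the H2O API; `HitRatio`, `hitRatioAt`, and the parameter names are assumptions.

```scala
object HitRatio {
  // `rankedPredictions(i)` lists the tags predicted for article i, most probable first.
  // The hit ratio at k is the share of articles whose actual tag is among the first k.
  def hitRatioAt(k: Int, actual: Seq[String], rankedPredictions: Seq[Seq[String]]): Double = {
    require(actual.nonEmpty, "need at least one article")
    require(actual.length == rankedPredictions.length, "one ranking per article")
    val hits = actual.zip(rankedPredictions).count {
      case (tag, ranking) => ranking.take(k).contains(tag)
    }
    hits.toDouble / actual.length
  }
}
```

Because the first `k` guesses include the first `k - 1`, the hit ratio can only stay the same or grow as `k` increases.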
15. H2O.ai
Possible Use
A possible use for such classification capabilities would be for blog
posting sites. The user would enter their text into the field, and the
classification model would automatically choose tags for the post.
16. H2O.ai
Customers • Community •
November 9, 10, 11
Computer History Museum
H2OWORLD.H2O.AI
20% off registration using code: h2ocommunity