Getting deep learning adopted at your company. The current landscape of academia vs industry. Presentation at AI with the best (online conference):
http://ai.withthebest.com/
Anomaly Detection and Automatic Labeling with Deep Learning - Adam Gibson
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to leverage deep learning for automatically labeling (and the pitfalls of this), and discover how you can deploy these techniques in your organization.
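The core recipe the abstract describes (train a generative model on mostly-normal data, then flag points it reconstructs poorly) can be sketched without any deep learning framework. Below is a minimal, hypothetical stand-in: a linear (PCA-style) autoencoder in place of a real variational autoencoder, with a percentile threshold on reconstruction error. All data and parameters are illustrative.

```python
import numpy as np

# Hypothetical sketch of anomaly detection by reconstruction error. A real
# VAE uses a learned nonlinear, probabilistic encoder/decoder; here a linear
# projection (PCA) stands in for it so the idea stays self-contained.
rng = np.random.default_rng(0)

# "Normal" points lie near a 2-D subspace of 3-D space; anomalies do not.
basis = np.array([[1.0, 0.5, 0.2],
                  [0.0, 0.3, 1.0]])
normal = rng.normal(size=(500, 2)) @ basis
anomalies = rng.normal(scale=5.0, size=(5, 3))
data = np.vstack([normal, anomalies])

# "Train" the encoder/decoder on normal data only.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]                           # 2 latent dimensions

def reconstruction_error(x):
    z = (x - mean) @ components.T             # encode
    x_hat = z @ components + mean             # decode
    return np.linalg.norm(x - x_hat, axis=1)

errors = reconstruction_error(data)
threshold = np.percentile(errors[:500], 99)   # calibrate on normal data
labels = errors > threshold                   # True = flagged as anomaly
```

The same thresholding-on-reconstruction-error pattern carries over when the linear projection is replaced by a trained VAE; the imbalanced-classes pitfall the talk mentions shows up in how the threshold is calibrated.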
Deploying Signature Verification with Deep Learning - Adam Gibson
This presentation covered building a signature verification system and deploying it to production, including resource usage and how the model was chosen.
Meetup held in Tokyo with Deep Learning Otemachi.
H2O World - H2O Deep Learning with Arno Candel - Sri Ambati
H2O World 2015
Tutorial scripts for R, Python are here:
https://github.com/h2oai/h2o-world-2015-training/tree/master/tutorials/deeplearning
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi... - Formulatedby
Presented by Hila Lamm, Chief Strategy Officer at Firefly.ai
Next DSS MIA Event - https://datascience.salon/miami/
Next DSS AUS Event - https://datascience.salon/austin/
With all the hype around automated machine learning (AutoML) for computer vision, businesses with structured data are left wondering: Is AutoML relevant for enterprise data? Can it alleviate the bottleneck that data science teams are experiencing?
Our team was experimenting with different types of enterprise challenges -- from optimizing pricing to credit card fraud detection to retail banking customer behavior -- and was able to automatically build models that produced top-ranking Kaggle results within a few hours. In this session, through customer use cases and under the hood insights, you will learn about the capabilities of AutoML as applied on Firefly. Oh, and we’ll also talk about how we attained a Kaggle 1st place score in just half an hour.
Distributed Machine Learning 101 Using Apache Spark from a Browser (Devoxx.b...) - Andy Petrella
A three-hour session introducing the concepts of machine learning and distributed computing.
It includes many examples, run in notebooks on real data, exploring models such as linear models (LM), random forests (RF), k-means, and deep learning.
DeepLearning4J and Spark: Successes and Challenges - François Garillot - Steve Moore
At the recent Spark & Machine Learning Meetup in Brussels, François Garillot of Skymind delivered this lightning talk to a sold-out crowd.
Specifically, François offered a tour of the DeepLearning4J architecture intermingled with applications. He went over the main blocks of this deep learning solution for the JVM that includes GPU acceleration, a custom n-dimensional array library, a parallelized data-loading swiss army tool, deep learning and reinforcement learning libraries — all with an easy-access interface.
Along the way, he pointed out the strategic points of parallelization of computation across machines and gave insight on where Spark helps — and where it doesn't.
As data science workloads grow, so does their need for infrastructure. But, is it fair to ask data scientists to also become infrastructure experts? If not the data scientists, then, who is responsible for spinning up and managing data science infrastructure? This talk will address the context in which ML infrastructure is emerging, walk through two examples of ML infrastructure tools for launching hyperparameter optimization jobs, and end with some thoughts for building better tools in the future.
Originally given as a talk at the PyData Ann Arbor meetup (https://www.meetup.com/PyData-Ann-Arbor/events/260380989/)
Most developers, myself included, don’t write object-oriented code. And yet, we have learned how to do it. Why is that? One possible reason is that many frameworks (Java EE, Spring) do not favor OOP in their design. But what is OOP really? We will look at a super-simplified bank account model. In my demo, I’ll show with a Java-based application how we can evolve from the traditional approach to proper OOP. Finally, we will look at the benefits and drawbacks of both approaches.
Recent presentation on Deeplearning4j's new features, as well as some underused features of the AI framework such as Arbiter, DataVec's transform process, and libnd4j.
Strata San Jose 2016: Scalable Ensemble Learning with H2O - Sri Ambati
Erin LeDell's presentation on Scalable Ensemble Learning with H2O at Strata + Hadoop World San Jose, 03.29.16
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This talk was on deep learning use cases outside of computer vision. It also covered larger-scale patterns of what good deep learning use cases typically look like. We end with an explanation of anomaly detection and various kinds of anomaly use cases.
Distributed Deep RL on Spark (Strata Singapore) - Adam Gibson
This talk briefly covers deep reinforcement learning on Spark and the benefits of using large-scale commodity compute with GPUs: easier simulation runs, plus distributed training for non-game use cases such as network intrusion and risk. It also briefly mentions RL4J and our work with OpenAI Gym.
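The talk's subject is deep RL at cluster scale, but the underlying learning loop is easiest to see in tiny tabular form. Here is a self-contained Q-learning sketch on a toy 5-state chain; no Spark, no RL4J, and every name and constant is illustrative rather than taken from the talk.

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: the agent starts at state 0 and
# earns a reward of 1 for reaching the terminal state 4. Deep RL replaces
# the Q table with a neural network, but the update rule is the same.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2        # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != n_states - 1:      # episode ends at the last state
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update; no bootstrap term from the terminal state
        target = reward + gamma * Q[s2].max() * (s2 != n_states - 1)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)         # greedy policy after training
```

After training, the greedy policy moves right in every non-terminal state. Distributing this across workers (as in the talk) mainly changes where experience is gathered and how parameters are synchronized, not the update itself.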
Deep Learning - The Past, Present and Future of Artificial IntelligenceLukas Masuch
In the last couple of years, deep learning techniques have transformed the world of artificial intelligence. One by one, the abilities and techniques that humans once imagined were uniquely our own have begun to fall to the onslaught of ever more powerful machines. Deep neural networks are now better than humans at tasks such as face recognition and object recognition. They’ve mastered the ancient game of Go and thrashed the best human players. “The pace of progress in artificial general intelligence is incredibly fast” (Elon Musk – CEO Tesla & SpaceX), leading to an AI that “would be either the best or the worst thing ever to happen to humanity” (Stephen Hawking – Physicist).
What sparked this new hype? How is Deep Learning different from previous approaches? Let’s look behind the curtain and unravel the reality. This talk will introduce the core concept of deep learning, explore why Sundar Pichai (CEO Google) recently announced that “machine learning is a core transformative way by which Google is rethinking everything they are doing” and explain why “deep learning is probably one of the most exciting things that is happening in the computer industry“ (Jen-Hsun Huang – CEO NVIDIA).
Cities and Startups: Cultivating Deep EngagementCode for America
Cities and Startups: Cultivating Deep Engagement
FastFWD, City of Philadelphia
Story Bellows, co-director of the Philadelphia Mayor's Office of New Urban Mechanics
Watch the video online: https://www.youtube.com/watch?v=PRKUCCHj-08&list=PL65XgbSILalVoej11T95Tc7D7-F1PdwHq&index=4
Get involved with Code for America: www.codeforamerica.org/action
In this presentation, we’ll go over real-world use cases of Machine Learning and Artificial Intelligence in web and mobile applications, and we’ll explain how they work. We’ll discuss opportunities for startups in all domains to create value from data (big or small) and to create innovative, predictive features in their applications.
We’ll review existing technologies that make Machine Learning accessible, in particular with automatic selection of algorithms, auto-tuning of parameters, and auto-scaling. Deep Learning (a subset of Machine Learning techniques which is getting a lot of press due to recent advances and successes) is also being made accessible without costly hardware and, in certain cases, without requiring specialized knowledge.
The main message for developers is that they can easily use the power of machine intelligence without having to rely on a team of Data Scientists. This will be illustrated in more detail with concrete use cases: priority detection and image categorization.
Ultra-brief draft overview of an investor's look at machine learning / deep learning startups, by Victor Osyka of Almaz Capital, https://www.linkedin.com/in/victorosyka or http://fb.com/victor.osika
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...Rod King, Ph.D.
Fast Validated Learning is at the core of the Lean Startup Method. However, learning and mastering the Lean Startup Method is a time-consuming, arduous, and expensive venture. The main reason is that Lean Startup tools are developed, learned, and applied using a Fragmented Learning approach. There is an exponential increase in the number of Lean Startup tools, yet these tools hardly talk to each other; they do not share a register or common vocabulary of topics.
Question-tags are very powerful tools for organizing and managing ideas as well as tools in any methodology including the Lean Startup Method. In this presentation, six question-tags and basic templates are presented. These question-tags and templates can be used as the basic building blocks or "atoms" for creating tools ("molecules" and "compounds") for Universal Problem Solving & Project Management (UPSPM). In other words, the presented blank and annotated Question (Q)-Templates can be used for discovering, solving, and managing problems in every domain.
For Lean Startups, these Q-Templates are the basic tools for effectively as well as efficiently organizing and managing Lean Startup projects. These Q-Templates can be put together to function like any Lean Startup tool; for instance, the Validation Board, Value Proposition Canvas, Business Model Canvas, and Lean Canvas. Also, all business tools can be deconstructed or decomposed using the Q-Templates.
Investors foresee a safe bet on deep tech startups - eTailing India
Indian deep technology start-ups have become the most sought after bets for angels and venture capital (VC) funds for their potential to scale up rapidly and be able to offer an opportunity for early exit for the investors.
Self-Service.AI - Pitch Competition for AI-Driven SaaS Startups - Datentreiber
SELF-SERVICE.AI IN A NUTSHELL
Background: Artificial intelligence enables SaaS companies to build intelligent self-service solutions for complex tasks such as customer service, personal scheduling, dynamic pricing, ad targeting, etc.
Objective: Provide a networking platform for AI-driven SaaS start-ups to present their product and team to high-profile clients, partners, and investors by organizing a start-up pitch competition.
Audience: Start-ups from any country worldwide, at any stage, with a Software-as-a-Service product that uses artificial intelligence (i.e. machine and deep learning, predictive and prescriptive analytics, etc.) to provide a self-service solution for companies or consumers, solving a concrete business problem or serving a certain need.
Examples: Existing AI-driven SaaS startups include Clarifai, x.ai, Api.ai, Versium, Gpredictive, collectAI, trbo, DigitalGenius, DataMinr, and many more to come.
Recommender Systems and Active Learning (for Startups) - Neil Rubens
This presentation gives a high-level overview of recommender systems and active learning, including the viewpoint of startups vs. established companies, the cold-start problem, etc.
Investor's view on machine intelligence startups, 2.0, Jan 2017 - Victor Osyka
Updated, deeper overview of an investor's look at machine learning / deep learning startups, with a slight Russian accent. =)
Some slides are courtesy of Russia.ai and my great personal friend @Petr Zhegin:
#23, #28 are from http://www.russia.ai/single-post/2016/09/21/Ten-Russian-speaking-venture-capital-funds-one-may-consider-to-back-an-AI-startup
#30 insights are from http://www.slideshare.net/RussiaAI/artificial-intelligence-investment-trends-and-applications-h1-2016
Victor Osyka of Almaz Capital, http://fb.com/victor.osika, http://medium.com/@victorosyka
BootstrapLabs - Tracxn Report - Artificial Intelligence for the Applied Arti... - BootstrapLabs
This report covers companies that provide the infrastructure for creating Artificial Intelligence. These Infrastructure companies include those working on Machine Learning and Deep Learning platforms and libraries. Some of these companies also provide platforms for Natural Language Processing and Visual Recognition. In the Applications section, the report covers companies leveraging AI techniques to build applications tailored for end use in the Enterprise, Industry & Consumer sectors.
Over $1B has been invested in AI-Infrastructure startups since 2010, with ~$340M invested in 2015. Over $7.5B has been invested in AI-Applications startups since 2010, with $2.3B invested in 2015.
These are the slides from a presentation Terry T. Um gave at Kookmin University on 22 June 2014. Feel free to share them, and please let me know if there are any misconceptions or errors.
(http://t-robotics.blogspot.com)
(http://terryum.io)
Machine learning has become an important tool in the modern software toolbox, and high-performing organizations are increasingly coming to rely on data science and machine learning as a core part of their business. eBay introduced machine learning to its commerce search ranking and drove double-digit increases in revenue. Stitch Fix built a multibillion-dollar clothing retail business in the US by combining the best of machines with the best of humans. And WeWork is bringing machine-learned approaches to the physical office environment all around the world. In all cases, algorithmic techniques started simple and slowly became more sophisticated over time.
This talk will use these examples to derive an agile approach to machine learning, and will explore that approach across several different dimensions. We will set the stage by outlining the kinds of problems that are most amenable to machine-learned approaches, as well as describing some important prerequisites, including investments in data quality, a robust data pipeline, and experimental discipline. Next, we will choose the right (algorithmic) tool for the right job, and suggest how to incrementally evolve the algorithmic approaches we bring to bear. Most fancy cutting-edge recommender systems in the real world, for example, started out with simple rules-based techniques or basic regression. Finally, we will integrate machine learning into the broader product development process, and see how it can help us to accelerate business results.
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify - Dataconomy Media
Abstract of the Presentation:
This talk is for the underdog. If you’re trying to solve data-related problems with no or limited resources, be it time, money, or skills, look no further. This talk points mostly to decades-old technology, free operating systems, and cheap hardware where possible, but if it makes sense to spend a hundred bucks instead of tearing your hair out, we’ll say so. This talk is opinionated and updated for GDPR, deep learning, and all the hype.
About the Author:
Daniel Molnar is a data nerd and startup specialist. With over 19 years of experience in startups and nine years of expertise in data-related topics, he is an experienced co-founder who has built and hired teams of up to 30 people. He comes with proven build-to-market capabilities and expertise in utilizing data for successful products. An amalgamation of his skills would be CS + data + product background under one hat.
Speaker: Venkatesh Umaashankar
LinkedIn: https://www.linkedin.com/in/venkateshumaashankar/
What will be discussed?
What is Data Science?
Types of data scientists
What makes a Data Science Team? Who are its members?
Why does a DS team need a Full Stack Developer?
Who should lead the DS team?
Building a Data Science team in a Startup vs. Enterprise
Case studies on:
Evolution Of Airbnb’s DS Team
How Facebook on-boards DS team and trains them
Apple’s Acqui-hiring Strategy to build DS team
Spotify -‘Center of Excellence’ Model
Who should attend?
Managers
Technical Leaders who want to get started with Data Science
Thinking About Prototyping: Sketching, Familiarity, Costs versus Ease of Prototyping, Prototypes and Production, Changing Embedded Platform, Physical Prototypes and Mass Personalisation, Climbing into the Cloud, Open Source versus Closed Source, Why Closed? Why Open? Mixing Open and Closed Source, Closed Source for Mass Market Projects, Tapping into the Community. Prototyping Embedded Devices: Electronics, Sensors, Actuators, Scaling Up the Electronics, Embedded Computing Basics, Microcontrollers, System-on-Chips, Choosing Your Platform, Arduino, Developing on the Arduino, Some Notes on the Hardware, Openness, Raspberry Pi, Cases and Extension Boards, Developing on the Raspberry Pi, Some Notes on the Hardware, Openness.
When going into the development of a software product, a possible source of mistakes is underestimating the complexity that lies behind an idea, as well as the clutter that comes from the massive number of available technologies. This presentation explains a possible way to deal with such issues.
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
This is a one-hour presentation on neural networks, deep learning, computer vision, recurrent neural networks, and reinforcement learning. The later slides have links on how to run neural networks on
Deep Learning on Apache® Spark™: Workflows and Best PracticesDatabricks
The combination of Deep Learning with Apache Spark has the potential for tremendous impact in many sectors of the industry. This webinar, based on the experience gained in assisting customers with the Databricks Virtual Analytics Platform, will present some best practices for building deep learning pipelines with Spark.
Rather than comparing deep learning systems or specific optimizations, this webinar will focus on issues that are common to deep learning frameworks when running on a Spark cluster, including:
* optimizing cluster setup;
* configuring the cluster;
* ingesting data; and
* monitoring long-running jobs.
We will demonstrate the techniques we cover using Google’s popular TensorFlow library. More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters.
Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput, and monitoring facilitates both the work of configuration and the stability of deep learning jobs.
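The cluster-setup advice above (avoiding task conflicts on GPUs, allowing multiple GPUs per worker) maps onto Spark's resource-scheduling configuration. A hedged sketch follows, assuming Spark 3.x-style resource properties and PySpark; the specific numbers are illustrative, not recommendations from the webinar.

```python
# Sketch: a SparkSession configured so each task gets exactly one GPU and
# executors expose four, which prevents two tasks from fighting over the
# same device. Property names follow the Spark 3.x resource-scheduling docs.
from pyspark.sql import SparkSession  # requires pyspark to be installed

spark = (SparkSession.builder
         .appName("dl-on-spark")
         .config("spark.executor.resource.gpu.amount", "4")  # GPUs per executor
         .config("spark.task.resource.gpu.amount", "1")      # GPUs per task
         .config("spark.task.cpus", "4")                     # keep CPU feeders per GPU
         .getOrCreate())
```

With `task.resource.gpu.amount = 1` and `executor.resource.gpu.amount = 4`, Spark schedules at most four concurrent tasks per executor, one per GPU, which is the conflict-avoidance pattern the webinar describes.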
In this talk we cover
1. Why NLP and DL
2. Practical Challenges
3. Some Popular Deep Learning models for NLP
Today you can take any webpage in any language and translate it automatically into a language you know! You can also cut and paste an article or other document into NLP systems and immediately get a list of the companies and people it talks about, the topics that are relevant, and the sentiment of the document. When you talk to the Google or Amazon assistant, you are using NLP systems. NLP is not perfect, but given the advances of the last two years, with more to come, it is a growing field. Let’s see how it actually works, specifically using deep learning.
About Shishir
Shishir is a Senior Data Scientist at Thomson Reuters working on Deep Learning and NLP to solve real customer pain, even ones they have become used to.
Similar to Deep Learning in Production with the Best (20)
Self-Driving Computers: Active Learning Workflows with Human-Interpretable Ve... - Adam Gibson
Human-in-the-loop learning workflows leveraging deep learning to group and cluster data, plus techniques for accounting for machine learning failures.
Strata Beijing - Deep Learning in Production on Spark - Adam Gibson
Recent talk at Strata Beijing (half English, half Chinese) covering use cases of deep learning, deep learning in production, and the different components of Deeplearning4j.
Gave a talk at:
www.meetup.com/SF-Bayarea-Machine-Learning/events/221739934/
Covers the basic architecture of a scientific computing library and my take on it with ND4J.
These slides accompanied a demo of Deeplearning4j at the SF Data Mining Meetup hosted by Trulia.
http://www.meetup.com/Data-Mining/events/212445872/
Deep learning is useful in detecting and identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices.
Deeplearning4j is an infinitely scalable deep-learning architecture suitable for Hadoop and other big-data structures. It includes a distributed deep-learning framework and a normal deep-learning framework; i.e. it runs on a single thread as well. Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala and Clojure. The distributed deep-learning framework is made for data input and neural net training at scale, and its output should be highly accurate predictive models.
The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
Finally, Deeplearning4j integrates with GPUs. A stable version was released in October.
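The "iterative reduce" training scheme described above (train replicas in parallel on shards, then average their parameters) can be illustrated in a few lines. This is a simplified, framework-free sketch of parameter averaging on a linear-regression task, not Deeplearning4j's actual Spark implementation; all names and constants are illustrative.

```python
import numpy as np

# Toy iterative-reduce / parameter-averaging loop: each "worker" takes a
# gradient step on its own data shard (map), then the parameter vectors
# are averaged and redistributed (reduce).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.01, size=400)

shards = np.array_split(np.arange(400), 4)    # 4 workers, 100 rows each
w = np.zeros(3)
lr = 0.1

for _ in range(200):
    local = []
    for idx in shards:                        # "map": local gradient step
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        local.append(w - lr * grad)
    w = np.mean(local, axis=0)                # "reduce": average parameters
```

Because every worker starts each round from the same averaged `w`, one round here is equivalent to a full-batch step; in a real cluster the payoff is that each shard's gradient is computed on a different machine, with only the averaged parameters crossing the network.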
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
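For readers unfamiliar with the baseline being compared against, here is a minimal "monolithic" power-iteration PageRank with the usual dead-end handling (a dead end's rank is spread uniformly over all vertices), which is exactly the precondition Levelwise PageRank removes up front. The graph is a toy example, not taken from the report.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """Power-iteration PageRank over an adjacency list.

    Every vertex is updated each iteration (the 'monolithic' scheme);
    a vertex with no out-edges is a dead end whose rank is redistributed
    uniformly, keeping the ranks summing to 1.
    """
    n = len(adj)
    r = np.full(n, 1.0 / n)
    while True:
        nxt = np.full(n, (1 - damping) / n)      # teleport term
        for u, outs in enumerate(adj):
            if outs:
                for v in outs:
                    nxt[v] += damping * r[u] / len(outs)
            else:                                # dead end: spread everywhere
                nxt += damping * r[u] / n
        if np.abs(nxt - r).sum() < tol:
            return nxt
        r = nxt

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2; vertex 2 is a dead end.
ranks = pagerank([[1, 2], [2], []])
```

Levelwise PageRank instead removes dead ends beforehand and processes strongly connected components level by level, so each level's ranks are final when computed and no per-iteration global exchange is needed.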
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
1. skymind.io | deeplearning.org | gitter.im/deeplearning4j
Deep Learning in Production
Building Production Class Deep Learning Workflows for the Enterprise
Adam Gibson / CTO Skymind
AI With the Best / The Internet
2. Topics
• Deep Learning in Production vs Academia
• Data Scientists vs Engineers
• Defining Production
• A solution
4. Academia/Research
Focus on accuracy and the latest architectures
Build proof of concepts quickly to validate an assumption
Prototype as many ideas as possible, as quickly as possible, to arrive at
a solution to a problem
Publish often, even incremental results, to increase publication count
5. Current state of research
Mostly funded by large consumer companies (Amazon, Google, Facebook, ...)
A few concentrated pockets of deep learning at academic institutions (CMU, Stanford, NYU, ...)
Large focus on audio and vision, gradually spreading into natural language
processing
Starting to focus more on reinforcement learning and better tuning methods
6. People in Deep Learning
• Talent is still scarce
• Most practitioners are in research labs
• Some are enthusiasts or startup founders
• Reality: deep learning hasn’t hit most of the world yet. It affects a lot of people,
but most aren’t doing it.
7. Industry (MOST Companies doing data science)
● Most use linear regression and random forests
● Prototyping happens in Python (by the data scientists)
● Data engineers hold the keys to the cluster (and write code in Java)
● Most problems are simple: analytics, churn prediction, maybe
recommendation engines or price forecasting
● Deep learning is seen as overkill (no GPUs in the cluster)
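The "simple problems" above rarely need more than a closed-form model. As a hedged illustration (toy data and a made-up "price forecasting" framing, no real library), here is the kind of one-feature linear regression most of these teams actually ship:

```python
def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy forecasting problem: price grows linearly with size.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]   # exactly 2 * size + 1
slope, intercept = fit_linear(sizes, prices)
print(slope, intercept)  # → 2.0 1.0
```

In practice teams reach for scikit-learn's `LinearRegression` or `RandomForestRegressor` rather than writing this by hand, but the workload is the same order of simplicity.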
9. Data Scientists
• Math or stats background; know R or Python
• Often beginning coders who may have started in SQL and
moved up to analytics
• Know basic machine learning; problems are focused
on replacing Excel spreadsheets or solving business
problems
10. Data Engineers
• Computer science background
• Builds data pipelines and knows how to set up
production systems
• Doesn’t know machine learning that well, but is
usually willing to learn
• Usually closer to the product team; may port Python
algorithms to Java, depending on ability
11. The hybrid
• Been in the game a while; knows CS and stats
• Knows SQL, machine learning, and how to operate a
Spark cluster
• Can formulate problems and figure out what projects to
tackle next
• Either understands business objectives or can
implement machine learning algorithms themselves
12. Most companies
• Two separate teams
• Data scientists use Python/R and SQL, experiment with
data, and come up with new models (very little machine
learning)
• Data engineers use Java (sometimes .NET) and work on
terabytes of data; most time is spent writing integrations
and data pipelines
13. Startups
● Tend to employ generalists
● Usually 3-5 people who can more or less do both; startups usually aren’t
ready to hire specialists
● Sometimes have a product where something like deep learning is needed
● Usually a Ruby or Python stack, without many users or much scale
● Usually just want something simple to set up
● Not much need for compiled languages or scale yet; that comes later
15. Defining “Production”
● Varying degrees of scale
● Not everyone has terabytes of data
● MySQL and outsourced cloud services are “machine learning” for most startups
● Many will start out with scikit-learn and Flask, maybe adding Python-based
deep learning later. This is “good enough”, and it is also what most
tutorials cover
● Larger companies care more about other things: security, scale, and return on
investment for projects. These companies use Java
● If you’re Google you use C++; if you’re Facebook, your own version of PHP
that you wrote and maintain
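To make the "scikit-learn and Flask" stack concrete, here is a sketch of the serving half. To stay self-contained it uses only the standard library's `http.server` in place of Flask, and a hard-coded stand-in for the model; the feature name and churn threshold are made up for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model; a real stack would load a
    pickled scikit-learn estimator instead of this toy rule."""
    return {"churn": features.get("monthly_usage", 0) < 10}

class PredictHandler(BaseHTTPRequestHandler):
    """POST a JSON feature dict, get a JSON prediction back."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To actually serve (blocks forever):
# HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

Flask replaces the handler class with a decorated route function, but the shape of the service (JSON in, prediction out, one model behind one endpoint) is the same.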
16. Hardware
• GPUs have very little market penetration
• Deep learning also has very little market penetration
(despite the marketing)
• Most of the world runs on CPUs (this is changing, very slowly)
• Startups are fine with the cloud; on-prem data centers are
usually Dell or HP servers running Red Hat or Ubuntu
17. Typical stack
• Web-based product (Go, Ruby, Python, Scala, Java, or a mix)
• Storage (one or more SQL databases, Elasticsearch/Solr)
• Cloud infrastructure or on-prem (bare metal)
• Machine learning: ???
18. Machine Learning at startups
• Random one-off scripts for analysis
• Random one-off notebooks
• One-off ETL pipelines written in Java
• One or more models tied to a REST API that talks to your
product stack
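Those one-off ETL pipelines follow the same extract/transform/load shape regardless of language; the slide notes they are often written in Java, but a minimal Python sketch (with a made-up in-memory CSV standing in for the real source, and JSON standing in for the real sink) looks like:

```python
import csv
import io
import json

# Hypothetical raw export: one row per user, some values missing.
raw = io.StringIO("user_id,logins\n1,12\n2,\n3,4\n")

def extract(stream):
    """Read the raw source into a list of string-valued dicts."""
    return list(csv.DictReader(stream))

def transform(rows):
    """Drop rows with missing values and cast types."""
    return [{"user_id": int(r["user_id"]), "logins": int(r["logins"])}
            for r in rows if r["logins"]]

def load(rows):
    """A real pipeline would write to a database or warehouse."""
    return json.dumps(rows)

out = load(transform(extract(raw)))
print(out)  # → [{"user_id": 1, "logins": 12}, {"user_id": 3, "logins": 4}]
```

Most of the data engineering time the deck describes goes into hardening exactly these three steps against real-world inputs.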
19. Machine Learning at big companies
• Random one-off scripts for analysis
• Random one-off notebooks
• Large numbers of separate databases and applications
run by different teams
• Multiple disconnected APIs
• Some models connected to a Spark or Hadoop cluster
20. Challenges in Production
• Serving user traffic (latency)
• Data access (connecting everything together)
• Large amounts of time spent on data pipeline code
• Unclear metrics of success for the data team
• Lack of innovation, or “too much”, e.g. chasing the shiny
new thing
21. Challenges of Deep Learning in Production
• Same problems as machine learning
• Hard to interpret models
• Requires specialized hardware
• Not a lot of best practices
• Lack of expertise (machine learning is hard enough)
23. Establish some best practices
• Kaggle is a good start for this; it offers “somewhat real” problems
• Use higher-level tools such as Keras; otherwise it’s easy to get lost in the weeds
• Consider having a real-world goal, e.g. if you’re in real estate, figure out how to
use a simple CNN (not the latest algorithm) for image search
• Depending on need, consider integration with Hadoop/Spark
• Lastly, don’t treat deep learning as special. It’s still a subfield of machine
learning
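On the "simple CNN" point: frameworks like Keras hide it, but the core operation underneath is a 2D convolution sliding a small kernel over an image. A pure-Python sketch of that building block (toy 4x4 image and a made-up edge-detector kernel, not the Keras API):

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the 'conv' in CNN frameworks."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the patch under it.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Tiny image: left half dark, right half bright.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1]]  # fires where brightness jumps left-to-right

print(conv2d(image, kernel))  # → [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

A real image-search CNN stacks many learned kernels like this with nonlinearities and pooling, but nothing about the core math requires the latest architecture.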
24. Going to production
• Sometimes Python is enough for simple use cases
• Data engineering teams should consider Java/Scala-
based solutions (disclaimer: highly opinionated here)
• Follow the same workflow: prototype in Python, port to
production
• Overall, scope the work to a core problem where deep learning
is worth it
25. Newer hardware
• Prototype on cloud infrastructure on a toy problem
• Try out this “GPU thing” and see what might be
involved
• Learn the trade-offs of CPUs and GPUs; don’t believe
the marketing
• Buy new hardware as needed
26. In closing
• Use something open source to start off with
• Use something *supported*; keep an eye on open
source activity
• Don’t just believe the research. Papers are not your
company. Do your due diligence