3. BigML, Inc, API / WhizzML
BigML Platform
[Architecture diagram]
• Web-based Frontend: Visualizations
• Tools - https://bigml.com/tools
• REST API - https://bigml.com/api
• Distributed Machine Learning Backend, with a server group per resource type: SOURCE, DATASET, MODEL, PREDICTION, EVALUATION, SAMPLE, WHIZZML
• Smart Infrastructure (auto-deployable, auto-scalable): SERVERS, EVENTS, GEARMAN QUEUE, MESSAGE QUEUE, RUNQUEUE SCALER, BUSY SCALER, AWS COSTS, DESIRED TOPOLOGY, AUTO TOPOLOGY, ACTUAL TOPOLOGY
4. The Need for an ML API
• Workflow Automation - reduce drudgery
• Abstraction - reuse code
• Composability - powerful combinations of APIs
• Integration - Dashboard or UI component
• Automate deployment
• Repeatable results
5. Predictive Applications
[Flowchart labels: Define ML Problem; Collect & Format Data; ETL; Explore; Model & Evaluate; Goal Met? (yes / no); feature engineer; tune algorithm; Not Possible; Model; Automate; Consume & Monitor (Predict, Score, Label; Drift & Anomaly)]
6. BigML API Endpoint
https://bigml.io/[andromeda | dev | dev/andromeda]/[source | dataset | model | ensemble | prediction | batchprediction | evaluation | …]/{id}?{auth}
• Path elements:
• /andromeda specifies the API version (optional)
• /dev specifies development mode
• if not specified, then latest API in production mode
• {id} is required for PUT and DELETE
• {auth} contains url parameters username and api_key
• api_key can be an alternative key
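The path elements above can be assembled mechanically. The following is a minimal sketch, not BigML's own client code: the function name `build_endpoint`, its parameters, and the use of `&` between the `username` and `api_key` query parameters are all assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://bigml.io"


def build_endpoint(resource, resource_id=None, *, username, api_key,
                   version=None, dev=False):
    """Assemble an endpoint URL from the path elements listed on the slide.

    Hypothetical helper: names and the query-string separator are assumptions.
    """
    parts = [BASE_URL]
    if dev:
        parts.append("dev")        # development mode
    if version:
        parts.append(version)      # optional API version, e.g. "andromeda"
    parts.append(resource)         # e.g. "source", "dataset", "model"
    if resource_id:
        parts.append(resource_id)  # needed for PUT and DELETE
    auth = urlencode({"username": username, "api_key": api_key})
    return "/".join(parts) + "?" + auth
```

Leaving out both `version` and `dev` yields the shortest form of the URL, which the slide describes as the latest API in production mode.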
7. BigML API Endpoint
https://bigml.io/... {JSON} {JSON}
Operation | HTTP Method | Semantics
CREATE | POST | Creates a new resource. Returns a JSON document including a unique identifier.
RETRIEVE | GET | Retrieves either a specific resource or a list of resources.
UPDATE | PUT | Updates a resource. Only certain fields are putable.
DELETE | DELETE | Deletes a resource.
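As a rough illustration of the operation-to-method mapping, the sketch below builds (but never sends) standard-library `urllib` requests. The helper name `build_request` and the JSON `Content-Type` header are assumptions for the example; the URL is supplied by the caller.

```python
import json
from urllib.request import Request

# The four operations and their HTTP methods, as listed on the slide.
HTTP_METHOD = {
    "create": "POST",
    "retrieve": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None):
    """Return an unsent Request for one of the four operations (hypothetical helper)."""
    data = None
    headers = {}
    if payload is not None:
        # CREATE and UPDATE carry a JSON document in the request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=HTTP_METHOD[operation])
```

Passing the result to `urllib.request.urlopen` would actually perform the call; the sketch stops short of that so it can be read and run without credentials.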
11. Python Binding Overview
Operation | HTTP Method | Binding Method
CREATE | POST | api.create_<resource>(from, {opts})
RETRIEVE | GET | api.get_<resource>(id, {opts}) or api.list_<resource>({opts})
UPDATE | PUT | api.update_<resource>(id, {opts})
DELETE | DELETE | api.delete_<resource>(id)
• Where <resource> is one of: source, dataset, model, ensemble, evaluation, etc
• id is a resource identifier or resource dict
• from is a resource identifier, dict, or string depending on context
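The naming convention above lends itself to chaining create calls. Here is a minimal sketch that assumes `api` is an object exposing `create_source`, `create_dataset`, `create_model`, and `ok` (as the BigML Python bindings do); the function name `train_model` is made up for this example.

```python
def train_model(api, data_path, model_opts=None):
    """Chain create_<resource> calls: source -> dataset -> model.

    `api` is assumed to behave like bigml.api.BigML; it is passed in so the
    sketch does not depend on the bindings being installed.
    """
    source = api.create_source(data_path)
    api.ok(source)                     # wait until the source is finished
    dataset = api.create_dataset(source)
    api.ok(dataset)                    # wait until the dataset is finished
    model = api.create_model(dataset, model_opts or {})
    api.ok(model)                      # wait until the model is finished
    return model
```

With the real bindings this might be called as `train_model(BigML(), "diabetes.csv")` after `from bigml.api import BigML`, with credentials configured; the file name is only a placeholder.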
13. Diabetes Anomalies
[Workflow diagram labels: DIABETES SOURCE; DIABETES DATASET; TRAIN SET; TEST SET; ANOMALY DETECTOR; FILTER; CLEAN DATASET; ALL MODEL (shown twice); ALL EVALUATION; CLEAN EVALUATION; COMPARE EVALUATIONS]
17. WhizzML
A Domain-Specific Language (DSL) for automating Machine Learning workflows.
• Complete programming language
• Machine Learning operations are first-class citizens
• Server-side execution abstracts infrastructure
• API First! - Everything is composable
• Shareable
18. WhizzML vs API
WhizzML | API / Bindings
Executes server-side | Requires local execution
Zero latency | Every API call has latency
Parallelization built-in | Manual parallelization
Sharing built-in | Manual sharing
Code agnostic workflows | Code specific workflows
Workflows can be UI integrated | Workflows external to UI
27. WhizzML vs Flatline
WhizzML | Flatline
Concerned with resources | Concerned with datasets
Turing complete | More specific to features
Optimized for parallelization | Optimized for speed
29. Redfin Workflow
[Diagram labels: Sold Homes; Model Predicts Sale Price; Compare List to Prediction]
30. Redfin Workflow
[Workflow diagram labels: SOLD HOMES; FILTER; NEW FEATURES; DATASET; MODEL; FORSALE HOMES; FILTER; NEW FEATURES; DATASET; BATCH PREDICTION; DEALS]
42. Best-First Features
{F1} {F2} {F3} {F4} {Fn} → CHOOSE BEST → S = {Fa}
S+{F1} S+{F2} S+{F3} S+{F4} S+{Fn-1} → CHOOSE BEST → S = {Fa, Fb}
S+{F1} S+{F2} S+{F3} S+{F4} S+{Fn-1} → CHOOSE BEST → S = {Fa, Fb, Fc}
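The repeated "choose best" rounds can be sketched as a greedy forward selection. This is an illustrative reading of the slide, not BigML's implementation: the `score` callback (for instance, an evaluation metric for a model trained on those features), the function name, and the rule of stopping when no candidate improves the score are assumptions.

```python
def best_first_features(features, score, max_features=None):
    """Greedily grow a feature set S, adding the best-scoring feature each round.

    `score` takes a list of feature names and returns a number (higher is better).
    Stops when no remaining feature improves the score (assumed stopping rule)
    or when `max_features` have been selected.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate S + {F} for every feature F not yet in S.
        candidates = [(score(selected + [f]), f) for f in remaining]
        round_score, round_best = max(candidates, key=lambda pair: pair[0])
        if round_score <= best_score:
            break  # no candidate improves on the current set
        selected.append(round_best)
        remaining.remove(round_best)
        best_score = round_score
    return selected, best_score
```

Each round costs one model evaluation per remaining feature, which is why this kind of loop benefits from being automated rather than run by hand.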
43. Model Selection
[Workflow diagram labels: SOURCE; DATASET; TRAINING; TEST; MODEL; ENSEMBLE; LOGISTIC REGRESSION; EVALUATION (one per candidate); CHOOSE]
44. Model Tuning
[Workflow diagram labels: SOURCE; DATASET; TRAINING; TEST; ENSEMBLE N=10; ENSEMBLE N=20; ENSEMBLE N=1000; EVALUATION (one per ensemble); CHOOSE]
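The final "choose" step amounts to evaluating each candidate configuration and keeping the best one. The sketch below assumes a caller-supplied `evaluate` function that trains on the training set and returns a test-set metric (higher is better); the function name and the `number_of_models` key used in the usage note are illustrative assumptions.

```python
def choose_best(candidates, evaluate):
    """Evaluate every candidate configuration and return (best, best_score).

    `candidates` is a non-empty list of configurations; `evaluate` maps one
    configuration to a numeric score where higher is better.
    """
    scored = [(evaluate(candidate), candidate) for candidate in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

For the ensemble sizes on the slide, the candidates could be `[{"number_of_models": n} for n in (10, 20, 1000)]`. The same helper would apply to choosing between different model types, given a suitable `evaluate`.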
45. SMACdown
• How many models?
• How many nodes?
• Missing splits or not?
• Number of random candidates?
• Balance the objective?
SMACdown can tell you!
48. Why WhizzML
• Automation is critical to fulfilling the promise of ML
• WhizzML can create workflows that:
  • Automate repetitive tasks.
  • Automate model tuning and feature selection.
  • Combine ML models into more powerful algorithms.
  • Create shareable and re-usable executions.