Keras and Deep Learning
Agenda:
6:30 – 6:45 PM: News and Introduction
6:45 – 7:30 PM: Introduction to Keras
7:30 – 8:15 PM: A demo of Google Cloud ML Engine for Deep Learning
8:15 – 8:30 PM: Wrap-up
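As a warm-up for the Keras introduction, the core computation behind a fully connected (Dense) layer can be sketched in plain NumPy. This is illustrative only, not the Keras API; the layer sizes, weights, and function names are invented for the example:

```python
import numpy as np

def dense_forward(x, weights, bias, activation=np.tanh):
    """Forward pass of a fully connected (Dense) layer: activation(x @ W + b).

    This is what a Keras Dense layer computes under the hood; Keras also
    manages weight initialization and training, which are omitted here."""
    return activation(x @ weights + bias)

# Tiny 2-layer network on a batch of 4 samples with 3 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # hidden layer: 3 -> 5 units
w2, b2 = rng.normal(size=(5, 2)), np.zeros(2)   # output layer: 5 -> 2 units

hidden = dense_forward(x, w1, b1)
output = dense_forward(hidden, w2, b2, activation=lambda z: z)  # linear output
print(output.shape)  # (4, 2): one 2-dimensional prediction per sample
```

In Keras itself, the same stack would be declared as two `Dense` layers in a `Sequential` model, with the forward pass and weight updates handled by the framework.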
This document discusses using Drupal and OpenLayers together for mapping applications. It provides an overview of why to use Drupal for mapping, how to set up mapping with Drupal and OpenLayers, including installing required and useful modules, configuring layers, styles, behaviors and other settings, and setting up content types and views to display map data.
Community-Driven Graphs with JanusGraph by Jason Plurad
Presented at Open Camps (Database Camp, Search Camp) in New York City on November 19, 2017. http://www.searchcamp.io/2017/presentations/community-driven-graphs-with-janusgraph
Presented at the Linked Data Benchmark Council (LDBC) Technical User Group (TUG) Meeting on June 8, 2018. http://www.ldbcouncil.org/blog/11th-tuc-meeting-university-texas-austin
Presented at Open Camps (Database Camp) in New York City on November 19, 2017. http://www.db.camp/2017/presentations/graph-computing-with-apache-tinkerpop
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in the latest release of JanusGraph, and then take a look at future directions to explore together with the open community. Presented on October 18, 2017 at the Graph Technologies Meetup in Santa Clara, CA. https://www.meetup.com/_CAIDI/events/243122187/
Airline Reservations and Routing: A Graph Use Case by Jason Plurad
We've all been there before... you hear the announcement that your flight is canceled. Fellow passengers race to the gate agent to rebook on the next available flight. How do they quickly determine the best route from Berlin to San Francisco? Ultimately the flight route network is best solved as a graph problem. We will discuss our lessons learned from working with a major airline to solve this problem using the JanusGraph database. JanusGraph is an open source graph database designed for massive scale. It is compatible with several pieces of the open source big data stack: Apache TinkerPop (graph computing framework), HBase, Cassandra, and Solr. We will go into depth about our approach to benchmarking graph performance and discuss the utilities we developed. We will share our comparison results for evaluating which storage backend to use with JanusGraph. Whether you are productizing a new database or you are a frustrated traveler, a fast resolution is needed to satisfy everybody involved. Presented at DataWorks Summit Berlin on April 18, 2018.
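The talk itself works with Gremlin traversals on JanusGraph; purely as an illustration of why rebooking is naturally a graph problem, here is a toy Dijkstra search over an invented flight network (airport codes and edge costs below are made up):

```python
import heapq

def cheapest_route(flights, origin, destination):
    """Dijkstra's shortest-path search over a flight network.

    `flights` maps each airport to a {neighbor: cost} dict of outbound edges.
    Returns (total_cost, [airports along the route]), or (inf, []) if the
    destination is unreachable."""
    queue = [(0, origin, [origin])]
    visited = set()
    while queue:
        cost, airport, path = heapq.heappop(queue)
        if airport == destination:
            return cost, path
        if airport in visited:
            continue
        visited.add(airport)
        for nxt, hop in flights.get(airport, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + hop, nxt, path + [nxt]))
    return float("inf"), []

# Invented network: rebooking from Berlin (TXL) to San Francisco (SFO).
flights = {
    "TXL": {"LHR": 1, "CDG": 1},
    "LHR": {"JFK": 7},
    "CDG": {"JFK": 8, "SFO": 11},
    "JFK": {"SFO": 6},
}
print(cheapest_route(flights, "TXL", "SFO"))  # (12, ['TXL', 'CDG', 'SFO'])
```

In a real deployment the same question would be posed as a shortest-path traversal against the graph database itself, rather than against an in-memory dict.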
The document summarizes announcements from NEXT'17 including the expansion of Google Cloud services and regions. Key points include:
- New regions planned in California, Montreal, and the Netherlands, with 17 total regions planned for the coming years.
- Cloud SQL for PostgreSQL is now in beta, and Microsoft SQL Server Enterprise is now generally available.
- Cloud Machine Learning Engine is now generally available for training and deploying custom models.
- The Cloud Video Intelligence API provides serverless video analysis.
- Several G Suite features were expanded like Team Drives, file streaming, and Hangouts Meet conferencing capabilities.
- The free tier was extended to $300 of credits for 12 months and certain
Ktunaxa RMS, open source GIS for a first nation by Joachim Van der Auwera (MapWindow GIS)
The Ktunaxa RMS is an open source web GIS application that does referral management for the Ktunaxa Nation Council. When someone wants to carry out a project on the territory of the Ktunaxa people, a referral needs to be evaluated using this system. The application guides this analysis, evaluating the project against the Ktunaxa values, which involves mostly spatial aspects.
The presentation demonstrates the application and how it was built. The application uses many open source projects, including Geomajas, Hibernate Spatial, Activiti, and Alfresco. The presentation focuses on the spatial aspects and integration (but the other integrations will also be mentioned).
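The RMS delegates its spatial analysis to Geomajas and Hibernate Spatial; as a minimal, self-contained illustration of the kind of spatial primitive such an evaluation rests on, here is a ray-casting point-in-polygon test (the territory polygon below is an invented square, not real boundary data):

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: does `point` (x, y) fall inside `polygon`,
    given as a list of (x, y) vertices? This is the basic check behind
    questions like 'does this project site fall within the territory?'"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count how many polygon edges a horizontal ray extending right
        # from the point crosses; an odd count means the point is inside.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

territory = [(0, 0), (10, 0), (10, 10), (0, 10)]  # simplified square boundary
print(point_in_polygon((5, 5), territory))   # True
print(point_in_polygon((15, 5), territory))  # False
```

Production GIS stacks use spatial indexes and proper geodetic geometry for this; the sketch only shows the core idea.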
One of the first problems a developer encounters when evaluating a graph database is how to construct a graph efficiently. Recognizing this need in 2014, TinkerPop's Stephen Mallette penned a series of blog posts titled "Powers of Ten" which addressed several bulkload techniques for Titan. Since then Titan has gone away, and the open source graph database landscape has evolved significantly. Do the same approaches stand the test of time? In this session, we will take a deep dive into strategies for loading data of various sizes into modern Apache TinkerPop graph systems. We will discuss bulkloading with JanusGraph, the scalable graph database forked from Titan, to better understand how its architecture can be optimized for ingestion. Presented at Data Day Texas on January 27, 2018.
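The bulk-load strategies discussed in the session share one core idea: commit vertices and edges in sized batches rather than one element per transaction. Here is a minimal sketch of that batching; the graph-specific calls appear only as hypothetical comments, not actual JanusGraph API:

```python
def batched(records, batch_size):
    """Yield fixed-size chunks of `records` so that elements can be
    committed per batch rather than per element -- the transaction-batching
    idea at the heart of most bulk-load strategies."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Against a real graph, each chunk would be loaded in one transaction,
# roughly (hypothetical calls, not a real API):
#   for chunk in batched(vertices, 10_000):
#       for v in chunk:
#           add_vertex(v)        # build up the transaction
#       commit_transaction()     # one commit per 10k vertices
loaded = list(batched(range(25), 10))
print([len(c) for c in loaded])  # [10, 10, 5]
```

Batch size is the main tuning knob: too small and commit overhead dominates, too large and a single failure rolls back a lot of work.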
Exploring Graph Use Cases with JanusGraph by Jason Plurad
Graph databases are relative newcomers in the NoSQL database landscape. What are some graph model and design considerations when choosing a graph database for your architecture? Let's take a tour of a couple of graph use cases that we've collaborated on recently with our clients to help you better understand how and why a graph database can be integrated to help solve problems found in connected data. Presented at DataWorks Summit San Jose - IBM Meetup on June 18, 2018.
https://www.meetup.com/BigDataDevelopers/events/251307524/
The document provides information about Simon Su and his expertise in Google Dataflow. It includes Simon's contact information and links to his online profiles. It then discusses Simon's areas of specialization including data scientist, data engineer, and frontend engineer. The document proceeds to provide information about preparing for a Google Dataflow workshop, including documents and labs to review. It also discusses Google Cloud services for data processing and analysis like Dataflow, BigQuery, Pub/Sub, and Dataproc. Finally, it outlines the agenda for the workshop, which will include hands-on labs to deploy users' first Dataflow project and create a streaming Dataflow model.
This document provides an introduction to Google Cloud Platform (GCP). It discusses cloud computing and service models like IaaS, PaaS, SaaS. It describes GCP and its global network and regions. It outlines key GCP services like Compute, Storage, Databases, Data Analytics, AI/ML. It explains GCP projects and resource hierarchy. It also discusses different ways to interact with GCP like the Cloud Console, SDKs, APIs and provides a demo.
Google Cloud Platform Introduction - 2016Q3 by Simon Su
The document summarizes news and services from Google Cloud Platform, including free GCE machine types, preemptible VMs, IAM project management, and new APIs for Machine Learning, Vision, and Speech. It also provides an overview of various GCP computing, storage, database and analytics services like Compute Engine, App Engine, Cloud SQL, Cloud Storage, BigQuery, and Dataflow. Join the Google Cloud Platform User Group Taiwan Facebook group for more information on GCP services and events.
Presentation given as part of the Global Azure Bootcamp 2017, April 22, 2017. Subject: a one-day hands-on workshop about the Cortana Intelligence Suite.
This document summarizes several projects that use open data and CartoDB. It describes a climate data visualization project that extracts and visualizes data from Amazon EBS. It also summarizes a project called "De casa al cole" that calculates school routes in Spain using OSRM. Additionally, it outlines an "Antipodes map" project that maps locations to their antipodes using data from Spain and New Zealand. It encourages reusing and building on top of open data and examples on CartoDB.
Presto talk @ Global AI Conference 2018 Boston by kbajda
Presented at Global AI Conference in Boston 2018:
http://www.globalbigdataconference.com/boston/global-artificial-intelligence-conference-106/speaker-details/kamil-bajda-pawlikowski-62952.html
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, LinkedIn, Bloomberg, and FINRA, Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments over the last few years. Presto is truly a SQL-on-Anything engine: a single query can access data from Hadoop, S3-compatible object stores, RDBMS, NoSQL, and custom data stores. This talk will cover some of the best use cases for Presto and recent advancements in the project, such as the Cost-Based Optimizer and geospatial functions, and will discuss the roadmap going forward.
App Engine is Google's fully managed platform as a service that allows developers to build and run applications on Google's infrastructure. It provides several services including Cloud Datastore for scalable storage, Cloud SQL for relational databases, Cloud Storage for file storage, and Task Queues for background processing. Developers can build and deploy applications using App Engine's SDKs and APIs, and App Engine automatically scales applications up and down as traffic levels change.
Trying out the Go language with Google App Engine by Lynn Langit
This document provides an overview of using the Go programming language with Google App Engine. It introduces Google App Engine and its competitors as platforms for hosting web applications in the cloud. It then discusses why the Go language is well-suited for App Engine, how to set up a development environment for Go and App Engine, and demonstrates a "Hello World" application deployed to App Engine. Additional resources for learning more about the Go language and developing with Go on App Engine are also provided.
Intro to the Google Cloud for Developers by Lynn Langit
This document provides an introduction to developing applications on Google Cloud. It discusses Google's cloud infrastructure and services like Compute Engine and App Engine. It demonstrates how to use the Google Cloud SDK and APIs to manage resources and build applications using various languages and tools. Specifically, it shows how to create instances in Compute Engine and BigQuery, deploy applications to App Engine from Eclipse, and use command line tools to manage storage, databases and other services.
Cloud Run is a serverless compute platform that allows running stateless containers without managing infrastructure or clusters. It supports many languages and automatically scales applications. Cloud Run has several advantages over Google Kubernetes Engine (GKE), like automatic scaling, pay-per-use pricing, and a fully managed platform. However, GKE allows more control and supports additional GCP products and deployment strategies, at the cost of managing infrastructure.
Graph Computing with JanusGraph. Presented at Cleveland Big Data Mega Meetup on September 11, 2017. https://www.meetup.com/Cleveland-Hadoop/events/241553826/
Day 13 - Creating Data Processing Services | Train the Trainers Program by FIWARE
In this technical session for Local Experts in Data Sharing (LEBDs), we will explain how to create the data processing services that are key to i4Trust.
Session 8 - Creating Data Processing Services | Train the Trainers Program by FIWARE
In this technical session for Local Experts in Data Sharing (LEBDs), we will explain how to create the data processing services that are key to i4Trust.
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014) by Ido Green
What is new and hot on Google Cloud?
How can you work like a pro with some (or all) of the new APIs and services... Here are some good starting points to follow.
At Opendoor, we do a lot of big data processing, and use Spark and Dask clusters for the computations. Our machine learning platform is written in Dask, and we are actively moving data ingestion pipelines and geo computations to PySpark. The biggest challenge is that jobs vary in memory and CPU needs, and the load is not evenly distributed over time, which causes our workers and clusters to be over-provisioned. In addition, we need to enable data scientists and engineers to run their code without having to upgrade the cluster for every request or deal with dependency hell.
To solve all of these problems, we introduce a lightweight integration across some popular tools like Kubernetes, Docker, Airflow, and Spark. Using a combination of these tools, we are able to spin up on-demand Spark and Dask clusters for our computing jobs, bring down the cost using autoscaling and spot pricing, and unify DAGs across many teams with different stacks on a single Airflow instance, all at minimal cost.
Building Modern Data Pipelines on GCP via a FREE online Bootcamp by Data Con LA
Data Con LA 2020
Description
You just got hired by a large "tech startup". They're a hip travel agency like Kayak, "revolutionizing the airline industry" by developing an AI that negotiates the best airline deals on behalf of passengers. But in reality they are developing the AI to jack up ticket prices as it learns the passengers' preferences. They run their tech on the latest Google Cloud technologies, so you figured it's a great place to sharpen your skills as a Data Engineer despite the company's broken ethical compass. We teach Cloud Data Engineering to beginner/intermediate developers via a fun and engaging story. You will build a complete data-driven AI pipeline: ingest six years' worth of real flight records, profile 30M+ users, and process 100M+ live streaming events while learning tools such as BigQuery, Dataflow (Apache Beam), DataProc (Apache Spark), Pub/Sub (Kafka), BigTable, and Airflow (Cloud Composer). During our talk, we will:
*Discuss the latest Serverless Data Architecture on GCP
*Explore the architectural decisions behind our Data Pipeline
*Run a live demo from our course
Speaker
Parham Parvizi, Tura Labs, Founder / Data Engineer
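To give a flavor of the live-streaming step in the bootcamp's pipeline, here is a pure-Python sketch of the tumbling-window aggregation a streaming engine like Dataflow (Apache Beam) performs; the events and routes below are invented:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed-size, non-overlapping
    (tumbling) windows and count events per key in each window -- the basic
    aggregation a streaming engine applies to unbounded event streams."""
    counts = defaultdict(int)
    for ts, key in events:
        # Each event falls into exactly one window, identified by its start.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Invented flight-search events: (unix seconds, route searched).
events = [
    (0, "LAX-JFK"), (5, "LAX-JFK"), (12, "SFO-ORD"),
    (61, "LAX-JFK"), (65, "SFO-ORD"), (119, "SFO-ORD"),
]
print(tumbling_window_counts(events, 60))
# {(0, 'LAX-JFK'): 2, (0, 'SFO-ORD'): 1, (60, 'LAX-JFK'): 1, (60, 'SFO-ORD'): 2}
```

A real Beam pipeline expresses the same thing declaratively with `FixedWindows` plus a count transform, and additionally handles late and out-of-order data, which this sketch ignores.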
Machine learning at scale with Google Cloud Platform by Matthias Feys
Machine learning typically involves big datasets and lots of model iterations. This presentation shows how to use GCP to speed up that process with ML Engine and Dataflow. The focus of the presentation is on tooling, not on models or business cases.
End To End Machine Learning With Google Cloud by Tu Pham
This document discusses end-to-end machine learning with Google Cloud. It outlines an 8-step process for collecting raw data, converting it to Apache Parquet files, uploading it to Cloud Storage, exploring it in Datalab, developing models in TensorFlow/Scikit-learn, training models at scale on Cloud ML Engine, deploying models via APIs on Compute Engine, and exposing APIs with Load Balancing. Key principles discussed are keeping it simple, avoiding repetition, and focusing on scalability, performance, and cost optimization. The presenter encourages planning systems with single responsibilities, separating real-time and batch flows, and saving on networking, instance, and storage costs through monitoring.
Getting started with GCP (Google Cloud Platform) by bigdata trunk
This document provides an overview and introduction to Google Cloud Platform (GCP). It begins with introductions and an agenda. It then discusses cloud computing concepts like deployment models and service models. It provides details on specific GCP computing, storage, machine learning, and other services. It describes how to set up Qwiklabs to do hands-on labs with GCP. Finally, it discusses next steps like training and certification for expanding GCP knowledge.
Workshop on Google Cloud Data Platform by GoDataDriven
The document provides an agenda and information about a GoDataFest workshop on Google Cloud Platform for data. The agenda includes an introduction to GCP for data, a session on roles and tools on GCP for different data roles, and a session where participants will build projects on GCP in mixed workgroups. It outlines the goals and tools used by different roles like data engineer, analytics engineer, and Looker user. It also provides information on Google Cloud technologies like BigQuery, Dataform, Looker, and how they fit into the modern data lifecycle and platform. Participants are then divided into mixed workgroups based on their preferred role and given insights to explore in their projects.
Presentation given on the 15th July 2021 at the Airflow Summit 2021
Conference website: https://airflowsummit.org/sessions/2021/clearing-airflow-obstructions/
Recording: https://www.crowdcast.io/e/airflowsummit2021/40
This document provides an overview of using TensorFlow and Quarkus to build intelligent applications that serve machine learning models. It begins with an introduction and agenda. It then discusses TensorFlow and how it can be used to build and train machine learning models. It demonstrates how a TensorFlow model can be served using Quarkus and consumed via HTTP requests. The technical benefits of serving models with Quarkus are described. Finally, use cases, additional resources, and a Q&A section are outlined.
Kubeflow: portable and scalable machine learning using JupyterHub and Kubernetes by Akash Tandon
ML solutions in production start from data ingestion and extend up to the actual deployment step. We want this workflow to be scalable, portable, and simple. Containers and Kubernetes are great at the former two, but not the latter if you aren't a DevOps practitioner. We'll explore how you can leverage the Kubeflow project to deploy best-of-breed open-source systems for ML to diverse infrastructures.
This presentation is about tools and techniques used in the field of data sciences, data analytics and data engineering. it is a collection of graphics and tabular data for quick learning.
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
ML platform meetups are quarterly meetups, where we discuss and share advanced technology on machine learning infrastructure. Companies involved include Airbnb, Databricks, Facebook, Google, LinkedIn, Netflix, Pinterest, Twitter, and Uber.
These slides are made for the 2013 DevFest talks. It covers the main blocks of Google cloud platform: App engine, Compute Engine, storage options and more.
MLOps aims to increase the velocity of machine learning model development through an organizational and cultural movement that breaks down barriers between development and operations teams. It involves treating machine learning models and data as first-class citizens in a DevOps workflow. This allows for continuous integration, delivery, and monitoring of models through practices like code, model, and data versioning. Tools that support MLOps include platforms for data and model versioning like DVC and frameworks for workflows and experiment tracking like TensorFlow Extended. MLOps principles can improve the speed, reliability, scaling, and collaboration of machine learning systems.
The document provides an overview of tasks and skills to learn for a career in data science and analytics. It lists technologies like SQL Server, Linux, networking protocols, Python, TensorFlow, Kafka, Terraform, and tools like Tableau. It also mentions companies in Pakistan and Dubai to explore for work opportunities and lists top companies employing data scientists in Dubai. Finally, it provides some YouTube video links on related topics like Spark vs Hadoop, data center standards, and networking fundamentals.
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDatabricks
We started out processing big data using AWS S3, EMR clusters, and Athena to serve Analytics data extracts to Tableau BI.
However as our data and teams sizes increased, Avro schemas from source data evolved, and we attempted to serve analytics data through Web apps, we hit a number of limitations in the AWS EMR, Glue/Athena approach.
This is a story of how we scaled out our data processing and boosted team productivity to meet our current demand for insights from 20M+ Smart Homes and 500M+ devices across the globe, from numerous internal business teams and our 150+ CSP partners.
We will describe lessons learnt and best practices established as we enabled our teams with DataBricks autoscaling Job clusters and Notebooks and migrated our Avro/Parquet data to use MetaStore, SQL Endpoints and SQLA Console, while charting the path to the Delta lake…
Google Cloud Professional Data Engineer certification prepares machine learning engineers for running ML models in production. This includes DevOps tasks, such as monitoring and scaling.
Time series in financial domain: A deep learning approach
Agenda:
-----------
3:45pm - 4:00pm: Arrival & Networking
4:00pm - 4:15pm: News & Intro
4:15pm - 5:15pm: Time series in financial domain:
A deep learning approach
5:15pm - 5:30pm: Virtual Snack & Networking
About the main speaker:
---------------------------------
Christophe Pere, Chief Data Scientist (La Capitale Insurance)
"I have a Ph.D. in Astrophysics specializing in atmospheric characterization during which I was able to learn the basics of data mining. I then continued my journey through data science and AI in order to understand the principles of learning. I worked on the evaluation of automotive markets and the prediction of vehicle prices with time series and machine learning models. I then joined an institute specialized in autonomous vehicles, the objective was to be able to recreate the real world from sensor data on vehicles traveling in Europe. I finally moved to Canada in the insurance industry to develop an R&D team and work on research projects focused on NLP, time series, and graphs."
We see twice - or why the fly’s brain is better than our big brains when swapping is involved.
Vision came early in evolution; trilobites could see more than 500 million years ago. Yet, our understanding of visual processes is cortex-centric, a recent structure. Similarly, deep-neural network architectures are often loosely based on the visual cortex. One might ask: does a computerk really need to “consciously” see? In this presentation, I will share completely novel findings from brain implant data in rodents showing that there exists an evolutionary ancient brain region, the superior colliculus, that we share with mosquitos, fishes and birds, which can compute complex scenes and can even process categorical information. We will then discuss how we could use this fast acting, wide-reaching circuitry to come up with novel neural network architectures. These new networks could be useful for fast-acting machine-vision models with far reaching potential, such as for scene navigation.
Tools using AI will affect and, in many cases, redefine most areas of societal impact such as medical practice and intervention, autonomous transportation and law enforcement. While so far, most of the focus and time is invested into optimizing models’ performance, whenever a single wrong prediction has big implications in terms of value or life, accuracy becomes less important than explainability.
In this talk, we will learn about explainable AI and we will see how to apply some of the available tools to answer the question ‘’what did my system consider in order to output a specific prediction’.
How can reinforcement learning help us fly balloons in the stratosphere?
This talk describes the use of reinforcement learning to create a high-performing flight controller for Loon superpressure balloons. The Google algorithm uses data augmentation and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems.
Marc G. Bellemare, from the Google Brain team in Montreal, will present recent work, published in Nature, on using reinforcement learning to fly tennis-court-sized balloons in the stratosphere.
Agenda:
-----------
3:45pm - 4:00pm: Arrival & Networking
4:00pm - 4:15pm: News & Intro
4:15pm - 5:15pm: RL to fly balloons in the stratosphere
5:15pm - 5:30pm: Virtual Snack & Networking
Building a robust machine learning model is not an easy task. After all, most POCs don't make it into production. And even if they make it into production, you still need to monitor its performance.
How can you build performant, tolerant, stable, predictive models that have known and fair biases? How can you make sure your models yield their value over time and stay performant after your team has deployed them? What are the current practices of model validation (or lack of), how are they flawed, and how could we improve them?
Simon Dagenais from Snitch AI will go through the reasons behind using an efficient validation framework that goes beyond the common metrics used by ML practitioners and why these tests matter when building high-quality models.
Agenda:
-----------
3:45pm - 4:00pm: Arrival & Networking
4:00pm - 4:15pm: News & Intro
4:15pm - 5:15pm: How to QA your ML models
5:15pm - 5:30pm: Virtual Snack & Networking
About the main speaker:
---------------------------------
Simon Dagenais is the Lead Data Scientist at Snitch AI, a machine learning validation tool. Before working on Snitch AI, Simon was a data scientist consultant at Moov AI, the parent company of Snitch AI. During his time as a consultant, he built and deployed custom ML solutions to solve business needs at companies like DRW, Société de Transport de Montréal and Cogeco. He now aspires to solve problems that data science teams will encounter during the course of a ML project cycle. Simon obtained an M.Sc. in economics from HEC Montreal. He frequently speaks in conferences, panels and meetups.
Artificial intelligence (AI) is dramatically transforming the world of finance
Agenda:
6:15 - 6:45: Arrival, Snack & Networking
6:45 - 7:15 PM: News and DSDT Survey Results
7:15 – 8:00 PM: How AI is transforming the world of finance (Nick Vandewiele)
8:00 - 8:30 PM: Wrap-up, Snack & Networking
About the main speaker:
Nick Vandewiele, PhD is a data scientist in one of the big financial institutions in Montreal. His experience in AI in finance includes financial markets, investments, and credit risk modeling
The document summarizes a meetup on data streaming and machine learning with Google Cloud Platform. The meetup consisted of two presentations:
1. The first presentation discussed using Apache Beam (Dataflow) on Google Cloud Platform to parallelize machine learning training for improved performance. It showed how Dataflow was used to reduce training time from 12 hours to under 30 minutes.
2. The second presentation demonstrated building a streaming pipeline for sentiment analysis on Twitter data using Dataflow. It covered streaming patterns, batch vs streaming processing, and a demo that ingested tweets from PubSub and analyzed them using Cloud NLP API and BigQuery.
The Art of Data Visualization
Agenda:
6:00 - 6:15: Welcome
6:15 – 6:45: Guidelines for Data Visualization
6:45- 7:30 : Large-scale GPU-Accelerated Data Visualization with MapD
7:30 - 8:00: 1000+ Members Giveaway / Networking + Q&A
Special Event Meetup on Gamification
Agenda:
5:45 - 6:00: Welcome & Networking
6:00 - 6:15: News and Introduction
6:15 – 7:15: Studies in Gameful Interaction Design and Games User Research + Q&A
7:15 - 7:30: Networking
Innovation in Human-Machine Interaction
Agenda:
5:45 - 6:00: Welcome & Networking
6:00 - 6:15: News and Introduction
6:15 – 7:00: The value of applied research in human-system interaction in B2B environment
7:00 - 7:30: Panel: Review of new human-system interaction seen at CHI 2018
7:30 - 8:00: Networking
AI, Machine Learning, Deep Learning: Stop talking and start acting!
Agenda :
6:00 - 6:15 PM: Welcome & Networking
6:15 - 6:30 PM: News and Introduction
6:30 – 7:15 PM: How to Get Started With AI
7:15 - 8:00 PM: Commit Assistant – AI to predict and prevent bugs in Ubisoft’s code
8:00 - 8:30 PM: Networking
The document summarizes the agenda and announcements for a meetup of the Data Science, Design, Technology Montreal (DSDT) group. The meetup included an introduction to Dash, an open source Python framework for building analytical web applications, and a practical example of using Plotly and Dash. Additional announcements provided information on sponsorships and venues for future meetups, as well as job postings and upcoming events in the local data science and technology community.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
4. Technopolys
Movement to promote Québec’s technology industry:
• Increase knowledge of the innovation ecosystem
• Promote entrepreneurship, careers and education
• Increase international recognition
More than 550 contributing companies
@technopolys_qc
www.linkedin.com/showcase/technopolys
6. Ecosystem News
● Feb 27: Kubernetes Q1 Meetup 2019 – Cloud Native Computing Foundation update
https://www.meetup.com/Kubernetes-Montreal/events/258883214/
● Feb/March: Desjardins Labs:
https://www.meetup.com/DesjardinsLab/events/
● March 12: Data Driven Montreal - HEC Forecast
https://www.meetup.com/DataDrivenMTL/events/259082491/
● April 10-11: World Summit AI - Americas (Montreal):
https://americas.worldsummit.ai/
7. DSDT on Slideshare
● www.slideshare.net/DSDT_MTL
● All presentations since DSDT's creation
8. Keras and Deep Learning
Nicolas Feller: “Keras: From Core Concepts to Advanced Experimentation”
Florian Soudan: “Demo of Google Cloud ML Engine for Deep Learning”
10. Outline
- What is Keras
- How to use Keras
- Examples and Tutorials
- Advanced(ish) Example
- Upcoming Roadmap
11. Keras: API for specifying & training differentiable programs (“deep learning for humans”)
Keras API
Backend: TensorFlow, Theano, MXNet or CNTK
Hardware: CPU, GPU, TPU
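As a sketch of what “deep learning for humans” looks like in practice, here is a minimal tf.keras model trained on random toy data (the layer sizes and data here are illustrative, not taken from the slides):

```python
import numpy as np
import tensorflow as tf

# Keras describes the differentiable program; the backend (here TensorFlow)
# compiles and runs it on whatever hardware is available (CPU/GPU/TPU).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Random toy data, purely illustrative.
x = np.random.rand(64, 10).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

preds = model.predict(x[:2], verbose=0)
print(preds.shape)  # (2, 1)
```

The same three steps (define, compile, fit) carry over unchanged whether the model runs on a laptop CPU or a cloud TPU.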
12. Official high-level API of TensorFlow
● TensorFlow-specific functionality
○ tf.data pipelines
○ Estimators - conceptual abstractions that isolate training, evaluation and deployment as TensorFlow programs
○ Multiple GPUs
■ data parallelism - same model on each device
■ device parallelism - part of the model on each device
○ TPUs
○ TensorBoard - visualize learning
○ Data augmentation
○ Eager execution (currently limited, improvements on the way)
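To make the tf.data bullet concrete, here is a hedged sketch of a tf.data pipeline feeding model.fit directly (toy arrays stand in for a real dataset; none of this comes from the original demo):

```python
import numpy as np
import tensorflow as tf

# Toy arrays standing in for a real dataset.
x = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

# A tf.data pipeline: shuffle, batch, and prefetch the next batch
# while the current one is training.
ds = (tf.data.Dataset.from_tensor_slices((x, y))
        .shuffle(buffer_size=100)
        .batch(16)
        .prefetch(1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# model.fit consumes the Dataset directly; no manual batching needed.
history = model.fit(ds, epochs=1, verbose=0)
print(len(history.history["loss"]))  # 1
```

Because the pipeline and the model are decoupled, the same Dataset can later be swapped for one reading TFRecords from disk without touching the model code.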
14. Jeff Hale Power Score Criteria
● Online job listings
● KDnuggets usage survey
● Google search volume
● Medium articles
● Amazon books
● arXiv articles
● Github activity
15. Deep learning for real life
● Android TensorFlow runtime
● iOS Core ML
● Keras.js and WebDNN GPU-accelerated JS runtimes
● Google Cloud via TensorFlow Serving - ML Engine
● Web backend in Flask
● JVM via DL4J
● Raspberry Pi
16. Start Using Keras in Seconds
● Start a Jupyter Notebook from the TensorFlow Docker image
● Regular Python download (pip install)
● Access Google Colab from any Gmail address
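Whichever route you take (Docker image, local pip install, or Colab), a one-line sanity check confirms Keras is available:

```python
# After `pip install tensorflow` (or inside the TensorFlow Docker image
# or a Colab notebook), Keras ships as tf.keras.
import tensorflow as tf

print(hasattr(tf.keras, "Sequential"))  # True
```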
17. Demo: http://bit.ly/2tzaRWJ
● Model types
○ Sequential
○ Functional
○ Model Subclassing
● Visualize model
○ Summary
○ plot_model
● Extra features
○ Use model
○ tf.data
○ Custom layers
○ Callbacks
○ Saving and restoring model
○ Pretrained Models
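The Sequential and Functional model types listed above can be contrasted in a few lines; the layer sizes below are arbitrary, chosen only to show that both styles define the same network:

```python
import tensorflow as tf

# Sequential: a linear stack of layers.
seq = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Functional: layers are called on tensors, which also allows
# multi-input/multi-output models and non-linear topologies.
inputs = tf.keras.Input(shape=(8,))
h = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1)(h)
fun = tf.keras.Model(inputs, outputs)

# Both expose the same inspection tools, e.g. summary() and plot_model().
print(seq.count_params() == fun.count_params())  # True
```

Model subclassing (the third type) trades this declarative style for an imperative one: you subclass tf.keras.Model and write the forward pass yourself in call().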
18. Upcoming Features
● Eager execution
● Distributed training - TensorFlow-like performance
○ Parameter server strategies
● Tight integration to build and productionize
○ Export to TF Lite and TFX
● Better TensorBoard integration (profiler, displaying the graph correctly)
● Canned models
● Improved performance
20. A demo of Google Cloud ML Engine for Deep Learning
Data Science | Design | Technology
https://github.com/ivado-labs/meetup-googleml-keras
21. Typical Deep Learning Workflow (Iterative)
1. Data pipeline preparation
2. Model development/optimization
3. Training monitoring
4. Result analysis
[Diagram: a local development computer submits training jobs to Google Cloud ML Engine; data, images, logs and weights live in Cloud Storage buckets; TensorBoard and notebook servers are launched locally and read the logs back.]
22. Merci / Thank You
@DsdtMtl
Data Science | Design | Technology
(Check for next DSDT meetup at https://www.meetup.com/DSDTMTL)
http://bit.ly/dsdtmtl-in