The system detects faults in a Smart Lathe machine from data received from Industrial IoT devices, reducing decision and analysis latency. The trained model was saved with the Joblib Python library and used to predict on data entered through the frontend interface. The application was packaged with the Flask library, which exposes API endpoints that trigger the prediction function calls. The Streamlit library was used to build the frontend through which the user feeds in data and receives the required predictions.
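As an illustrative sketch of this pattern (not the project's actual code; the model file name and input schema are assumptions), a Joblib-saved model can be served behind a Flask endpoint like so:

```python
# Hypothetical sketch of the Joblib + Flask serving pattern described above.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("lathe_fault_model.joblib")  # assumed file name

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload of sensor readings, e.g. {"features": [0.1, 0.2, ...]}
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"fault": int(prediction)})

if __name__ == "__main__":
    app.run(port=5000)
```

A Streamlit frontend would then POST the user's input to this endpoint (for example with requests.post) and display the returned prediction.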
Driverless AI - Intro + Interactive Hands-on LabSri Ambati
Enjoy the webinar recording here: https://youtu.be/Lll1qwQJKVw.
Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling, and model deployment.
In this presentation, Arno Candel (CTO, H2O.ai) gives a quick overview and guides attendees through an interactive hands-on lab using Qwiklabs.
Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as under- or overfitting, data leakage, or improper model validation. Avoiding these pitfalls alone can save weeks or more per model, and is necessary to achieve high modeling accuracy.
With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API from a variety of languages such as Python, Java, C++, Go, C#, and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware.
For example, Driverless AI runs orders of magnitude faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud and on-premises. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization, and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts quickly validate the data and models.
Evaluation of TPC-H on Spark and Spark SQL in ALOJADataWorks Summit
The Evaluation of TPC-H on Spark and Spark SQL in ALOJA was conducted at the Big Data Lab as part of a master's degree in Management Information Systems at Johann Wolfgang Goethe University in Frankfurt, Germany. The analysis was partially carried out in collaboration and close coordination with the Barcelona Supercomputing Center.
The intention of this research was to integrate a TPC-H-on-Spark-Scala benchmark into ALOJA, an open-source, public platform for automated and cost-efficient benchmarking, and to compare the runtime of Spark Scala, with and without the Hive Metastore, against Spark SQL. The impact of alternative file formats, with different compressions applied to the underlying data, is also evaluated. The performance evaluation exposed diverse and interesting outcomes for both benchmarks. Further investigation attempts to detect possible bottlenecks and other irregularities, with the aim of deepening the understanding of Spark's engine by examining the physical plans. Our experiments show, inter alia, that: (1) Spark Scala performs better in the case of heavy expression calculation; (2) Spark SQL is the better choice in the case of strong data-access locality combined with heavyweight parallel execution. Overall, diverse results were observed, with the consequence that each API has its advantages and disadvantages.
Surprisingly, our findings are well spread between Spark SQL and Spark Scala: contrary to our expectations, Spark Scala did not outperform Spark SQL in all aspects. This supports the idea that the applied optimizations are implemented differently by Spark for its core and for its Spark SQL extension. The API on top of Spark provides extra information about the underlying structured data, which is probably used to perform additional optimizations.
In conclusion, our research demonstrates that there are differences in the generation of query execution plans, which go hand in hand with related observations such as inefficient joins, and it underlines the value of our benchmark for identifying disparities and bottlenecks.
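For readers unfamiliar with the two APIs compared above, here is a hedged PySpark illustration (the study itself used Scala) of a TPC-H-style aggregation expressed once through the DataFrame API and once through SQL; inspecting the physical plans, as the study does, is how differences between the two are identified:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api-vs-sql").getOrCreate()
lineitem = spark.read.parquet("lineitem.parquet")  # assumed TPC-H-style table

# 1) DataFrame (functional, "Scala-style") API
df_result = (lineitem
             .filter(F.col("l_shipdate") <= "1998-09-02")
             .groupBy("l_returnflag", "l_linestatus")
             .agg(F.sum("l_quantity").alias("sum_qty")))

# 2) The same query through Spark SQL
lineitem.createOrReplaceTempView("lineitem")
sql_result = spark.sql("""
    SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty
    FROM lineitem
    WHERE l_shipdate <= '1998-09-02'
    GROUP BY l_returnflag, l_linestatus
""")

# Compare the physical plans the optimizer generates for each
df_result.explain()
sql_result.explain()
```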
Speaker
Raphael Radowitz, Quality Specialist, SAP Labs Korea
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...Databricks
Intuit products increasingly rely on AI solutions to drive in-product experiences and customer outcomes (a realization of Intuit’s AI-driven expert platform strategy). In order to provide complete confidence to Intuit customers through reliable and predictable experiences, we need to ensure the health of all AI solutions by continuously monitoring, managing and understanding them within Intuit products.
At Intuit, we have deployed hundreds of machine learning models in production to solve a range of problems, including:
Cash Flow forecasting
Security, risk and fraud
Document understanding
Connecting customers to the right agents
With so many models in production, it becomes very important to monitor and manage them in a centralized manner. With very few open-source tools available to monitor and manage ML models, data scientists find it very difficult to properly track their models. Moreover, different personas in the organization look for different information about the models. For example, the DevOps team is interested in operational metrics, financial analysts are interested in determining the operational cost of a model, and the legal and compliance teams might want to know whether the models are explainable and privacy-compliant.
At Intuit, we have designed and developed a system that tracks and monitors ML models across the stages of the model development lifecycle. In this session, we will present the challenges in building such a central system, and we will also share its overall architecture and internals.
Applied Machine Learning for Ranking Products in an Ecommerce SettingDatabricks
As a leading e-commerce company in fashion in the Netherlands, Wehkamp dedicates itself to providing a better shopping experience for its customers. Using Spark, the data science team is able to develop various machine-learning projects for this purpose, based on large-scale data about products and customers. A major topic for the data science team is ranking products: if a visitor enters a search phrase, what are the best products for that phrase, and in what order should they be shown? Ranking products is also important when a visitor enters a product overview page, where hundreds or even thousands of products of a certain article type are displayed.
In this project, Spark is used across the whole pipeline: retrieving and processing the search phrases and their results, building click models, creating feature sets, training and evaluating ranking models, pushing the models to production using ElasticSearch, and creating Tableau dashboards. In this talk, we demonstrate how we use Spark to build the whole product-ranking pipeline and discuss the challenges we faced along the way.
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...Flink Forward
New use cases under the Industry 4.0 umbrella are playing a key role in improving factory operations, process optimization, cost reduction and quality improvement. We propose an event streaming architecture to streamline the information flow all the way from the factory to the main data center. Building such a streaming architecture enables a manufacturer to react faster to critical operational events. However, it presents two main challenges:
Data acquisition in real time: data should be collected regardless of where it is located or how hard it is to access. It is commonplace to ingest data from hundreds of heterogeneous data sources (ERP, MES, sensors, maintenance systems, etc.).
Event processing in real time: events collected from different parts of the organization should be combined into actionable insights in real time. This is extremely challenging in a context where events can be lost or delayed.
In this talk, we show how Apache NiFi and MiNiFi can be used to collect from a wide range of data sources in real time, connecting the industrial and information worlds. Then we show how Apache Flink's unique features enable us to make sense of this data. For instance, we explain how Flink's time handling, such as event-time mode, late-arrival handling, and the watermark mechanism, can be used to address the challenge of processing IoT data originating from geographically distributed plants, as sketched below. Finally, we demonstrate an end-to-end streaming architecture for Industry 4.0 based on the Cloudera DataFlow platform.
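As a hedged sketch of that event-time handling (shown in Python via PyFlink; the talk is not tied to a language, and the record layout here is an assumption), a bounded watermark lets the pipeline tolerate out-of-order sensor data from remote plants:

```python
from pyflink.common import Duration
from pyflink.common.watermark_strategy import TimestampAssigner, WatermarkStrategy
from pyflink.datastream import StreamExecutionEnvironment

class SensorTimestampAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        # value is assumed to be (plant_id, epoch_millis, reading)
        return value[1]

env = StreamExecutionEnvironment.get_execution_environment()
readings = env.from_collection([("plant-1", 1000, 0.5), ("plant-2", 800, 0.7)])

# Event time with up to 5 seconds of late, out-of-order data tolerated
watermarks = (WatermarkStrategy
              .for_bounded_out_of_orderness(Duration.of_seconds(5))
              .with_timestamp_assigner(SensorTimestampAssigner()))
stamped = readings.assign_timestamps_and_watermarks(watermarks)

stamped.print()
env.execute("iot-event-time-sketch")
```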
At the CodeTalks conference 2017 in Hamburg, LeanIX presented their lessons learned for GraphQL, a new alternative for building REST APIs which was introduced by Facebook.
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofGoDataDriven
During GoDataFest 2019, Rens Weijers, manager data & strategy, and Peter van 't Hof, data engineer, share the story of how Vattenfall develops smart applications on Azure. Vattenfall has the ambition to transition to fossil-free living within one generation. But what about decentralized energy solutions in the Customers & Solutions business unit? Data is key to helping customers reduce their CO2 footprint, and Azure enables Vattenfall to be personal and relevant towards customers.
Building, managing, and maintaining thousands of features across thousands of models is repetitive, tedious, and extremely challenging to scale. We will explore the 'Feature Factory' built at Databricks and implemented at several clients, and the processes that are imperative for the democratization of feature development and deployment. The Feature Factory gives consumers repeatable feature creation, simplifies scoring, and enables massive scalability through feature multiplication.
Massively Scalable Computational Finance with SciDBParadigm4Inc
Hedge funds, investment managers and prop shops need to keep pace with rapidly growing data volumes from many sources.
SciDB—an advanced computational database programmable from R and Python—scales out to petabyte volumes and facilitates rapid integration of diverse data sources. Open source and running on commodity hardware, SciDB is extensible and scales cost effectively.
Attend this webinar to learn how quants and system developers harness SciDB’s massively scalable complex analytics to solve hard problems faster. SciDB’s native array storage is optimized for time-series data, delivering fast windowed aggregates and complex analytics, without time-consuming data extraction.
Webinar presenters will demonstrate real world use cases, including the ability to quickly:
1. Generate aggregated order books across multiple exchanges
2. Create adjusted continuous futures contracts
3. Analyze complex financial networks to detect anomalous behavior
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
In recent years, one of the biggest trends in application development has been the rise of machine learning solutions, tools, and managed platforms. Vertex AI is a managed, unified ML platform for all your AI workloads. On the MLOps side, Vertex AI Pipelines lets you adopt experiment pipelining beyond the classic build-train-evaluate-deploy cycle. It is engineered for data scientists and data engineers, and it is a tremendous help for teams that don't have DevOps or sysadmin engineers, as infrastructure management overhead has been almost completely eliminated.
Based on practical examples, we will demonstrate how Vertex AI Pipelines scores high in terms of developer experience, how it fits custom ML needs, and how to analyze the results. It is a toolset for a fully fledged machine learning workflow: a sequence of steps in model development and the deployment cycle, such as data preparation/validation, model training, hyperparameter tuning, model validation, and model deployment. Vertex AI comes with all the standard resources plus an ML metadata store, a fully managed feature store, and a fully managed pipeline runner.
Vertex AI Pipelines is a managed serverless toolkit, which means you don't have to fiddle with infrastructure or back-end resources to run workflows.
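As a minimal sketch, assuming the Kubeflow Pipelines (kfp) v2 SDK that Vertex AI Pipelines executes, a pipeline definition can look like this; the component logic, project, region, and bucket names are placeholders:

```python
from kfp import compiler, dsl

@dsl.component
def train(learning_rate: float) -> str:
    # Placeholder training step; real logic would fit and persist a model.
    return f"model trained with lr={learning_rate}"

@dsl.pipeline(name="demo-pipeline")
def pipeline(learning_rate: float = 0.01):
    train(learning_rate=learning_rate)

# Compile to a spec that Vertex AI Pipelines can run serverlessly.
compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

# Submitting the compiled spec (assumed project/region/bucket):
# from google.cloud import aiplatform
# aiplatform.init(project="my-project", location="us-central1",
#                 staging_bucket="gs://my-bucket")
# aiplatform.PipelineJob(display_name="demo",
#                        template_path="pipeline.json").run()
```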
This session is a continuation of "Automated Production Ready ML at Scale" from the last Spark + AI Summit Europe. In this session you will learn how H&M has evolved its reference architecture, covering the entire MLOps stack and addressing a few common challenges in AI and machine learning products, such as development efficiency, end-to-end traceability, and speed to production.
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...PivotalOpenSourceHub
This talk introduces an open-source solution that integrates cloud-native apps running on Cloud Foundry with an open-source hybrid transactional + analytical real-time solution. The architecture is based on a fast, scalable, highly available, and fully consistent in-memory data grid (Apache Geode / GemFire), natively integrated with the first open-source massively parallel data warehouse (Greenplum Database) in a hybrid transactional and analytical architecture that is extremely fast, horizontally scalable, highly resilient, and open source. This session also features a live demo running on Cloud Foundry, showing a real case of real-time closed-loop analytics and machine learning using the featured solution.
Agile development of data science projects | Part 1 Anubhav Dhiman
Broadly, data science encompasses quantitative research, advanced analytics, predictive modelling, and machine learning.
How reliably and sustainably can a data science team deliver value for organizations?
Data Science Readiness Levels
How can we make collaboration easier across the organization?
Graph Analytics on Data from Meetup.comKarin Patenge
How to improve your Meetup experience by using Graph Analytics on data from Meetup.com. Slides from my session with "Women Who Code" group in Berlin on May 23, 2018.
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Databricks
GE Aviation has hundreds of data scientists and engineers developing algorithms. The majority of these people do not have the time to learn Apache Spark and continue to develop on local machines in Python or R. We also have lots of historical code that was not developed for Spark. However, the business wanted to deploy to a Spark environment for scalability, as quickly as possible. So how did we bridge the gap? A data scientist and software engineer will co-present to share how we approached the problem of building, unifying and scaling these algorithms.
Learn to Use Databricks for the Full ML LifecycleDatabricks
Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. In this talk, learn how to operationalize ML across the full lifecycle with Databricks Machine Learning.
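A minimal sketch of the experiment-tracking part of that lifecycle using MLflow, the open-source project that underpins Databricks' tracking features; the model and metric below are placeholders:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    # Log the parameters tried, so the experiment is reproducible
    mlflow.log_param("n_estimators", 100)
    model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    # Log the resulting metric and the model artifact itself
    mlflow.log_metric("accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")
```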
The Monitoring and Metric aspects of Eclipse MicroProfileHeiko Rupp
Slides of my presentation at EclipseCon Europe about Eclipse MicroProfile Metrics and Monitoring aspects.
A video recording of the talk is available at https://youtu.be/Ep4Bkx0_MAg
#GeodeSummit - Modern manufacturing powered by Spring XD and GeodePivotalOpenSourceHub
Wondering how to improve your production yield, increase asset life, and activate reliability-centered maintenance? TEKsystems has developed a "Golden Batch" recommendation engine to realize your goals of modern manufacturing. This is a predictive analytics framework built on top of a manufacturing data lake for analysis and training of machine learning algorithms, and for subsequent processing and detection of streaming data from sensors to detect or predict failures. We'll present a solution architecture featuring Spring XD for data pipelining, Apache Geode for in-memory processing, Hadoop as a data lake, and R for machine learning.
UKOUG - Implementing Enterprise API Management in the Oracle Cloudluisw19
API-led connectivity has become the main mechanism to integrate with SaaS applications; mobile applications, modern web applications, and the Internet of Things also need APIs. In the Oracle Cloud there are at least six cloud services offering a solution for APIs (Mobile Cloud Service, API Manager Cloud Service, API Platform Cloud Service, API Catalog Cloud Service, IoT Cloud Service, and Integration Cloud Service).
This presentation will first describe what an enterprise-wide API management solution looks like, then elaborate on a solid API taxonomy, and finally show how to position each of the mentioned cloud services to deliver an end-to-end API management solution in the Oracle Cloud that is also capable of handling hybrid-cloud use cases.
In addition, real-life use cases will be referenced to help contextualise the content presented.
Sftp Workflows for Data Lakes and Enterprise Applications STG221JonOstrander1
Sharing files using SFTP (Secure Shell File Transfer Protocol) is still important for many businesses, but running your own SFTP servers and infrastructure can burden IT operations. AWS Transfer for SFTP makes it easy to move your file exchange workloads to the cloud. Learn how the service supports common file transfer use cases for data lakes, analytics, and ERP and CRM applications. See a demonstration of key capabilities, including authentication and networking security options, and get your questions answered.
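Because AWS Transfer for SFTP exposes a standard SFTP endpoint, any SFTP client can push files toward the backing storage. A hedged Python sketch using paramiko, where the host name, user, key path, and remote path are assumptions:

```python
import os
import paramiko

HOST = "s-example.server.transfer.us-east-1.amazonaws.com"  # assumed endpoint
transport = paramiko.Transport((HOST, 22))
transport.connect(
    username="analytics-user",  # assumed SFTP user
    pkey=paramiko.RSAKey.from_private_key_file(
        os.path.expanduser("~/.ssh/id_rsa")))

sftp = paramiko.SFTPClient.from_transport(transport)
# Land a raw file where the S3-backed home directory expects it
sftp.put("orders.csv", "/landing/orders.csv")
sftp.close()
transport.close()
```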
CCT is a web app developed for an IT Project Management exam. It has been built using various tools, including Python, MySQL, Flask, Anaconda, and Visual Studio Code. The app is aimed at companies and manages the transfers of a worker within the company, taking into account all the costs the worker faces and allowing them to be updated through the web app. Finally, to make understanding easier, a graphical interface is dedicated to visualizing costs and transfers through plots.
CCT is a project for the IT Project and Management exam. It helps employees plan and organize transfers and receive a refund in the short term. The principal operations are: Add Transfer, Share on Telegram, View Details, and Import Costs, to simplify the PM's work. To develop this web app, I used Anaconda, Flask, Visual Studio, and MySQL.
CCT is a web app developed to give the project manager an overview of the transfers made by his team. It is developed entirely in Python, HTML, and CSS; I also used Flask to connect to the server.
CCT allows the user to: add new transfers, show charts related to the types and value of costs, produce a downloadable PDF document, automatically calculate the sum of the costs incurred, and look for new users on GitHub to cover missing skills.
Rapid Web Development with Python for Absolute BeginnersFatih Karatana
This slide deck covers Python basics, Python key features, web development basics, RESTful architecture key points, agile web development, and Python web framework fundamentals.
Using AWS to design and build your data architecture has never been easier to gain insights and uncover new opportunities to scale and grow your business. Join this workshop to learn how you can gain insights at scale with the right big data applications.
We'll look at how to architect and build a serverless platform and what makes something "serverless". We will dive into the design patterns for serverless applications and how container management solutions must be architected around user requirements.
We will dive deep into how existing cloud-based serverless platforms leverage containers, how they're scheduled, managed, and sandboxed. We'll also look at what improvements we might expect or desire of new and existing serverless platforms.
OSMC 2011 | Neues von Icinga by Icinga TeamNETWAYS
By the time of the conference, Icinga will have reached version 1.6, its fourth and final release of the year. Examples of the improved usability are the integrated module bindings for PNP4Nagios and the BP addon, as well as the many new features of Icinga Classic and Icinga Web. After a short summary of past changes, the talk will present the new Icinga version and its capabilities. Besides the new API concept, the focus is on service monitoring and availability analysis within the web interface and through the newly integrated Jasper Reports binding. In addition, attendees can expect a first look at the announced API and distribution components.
Building event-driven (Micro)Services with Apache KafkaGuido Schmutz
This talk begins with a short recap of how we created systems over the past 20 years, up to the current idea of building systems using a microservices architecture. What is a microservices architecture, and how does it differ from a service-oriented architecture? Should you use traditional REST APIs to integrate services with each other in a microservices architecture, or is it better to use a more loosely coupled protocol? Answers to these and many other questions are provided. The talk shows how a distributed log (event hub) can help create a central, persistent history of events and what benefits we achieve from doing so. Apache Kafka is a perfect match for building such an asynchronous, loosely coupled event-driven backbone. Events trigger processing logic, which can be implemented in a traditional as well as in a stream-processing fashion. The talk shows the difference between request-driven and event-driven communication and answers when to use which. It also highlights how a modern stream-processing system can hold state both internally and in a database, and how this state can be used to further increase the independence of services, the primary goal of a microservices architecture.
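A minimal sketch of that event-driven backbone using the kafka-python client; the topic name, broker address, and event payload are assumptions:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"  # assumed broker address

# A service publishes a domain event instead of calling another service directly
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send("orders", {"order_id": 42, "status": "created"})
producer.flush()

# Any interested service consumes the persistent event log independently
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")))
for event in consumer:
    print(event.value)  # e.g. {'order_id': 42, 'status': 'created'}
    break
```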
Data Ingestion in Big Data and IoT platformsGuido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical Big Data processing). In recent years, new tools have emerged that are especially capable of handling this process of integrating data from outside, often called data ingestion. From the outside they look very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use for message-driven and service-oriented systems. But there are important differences: they are typically easier to scale horizontally, offer a more distributed setup, can handle high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session presents and compares Apache NiFi, StreamSets, and the Kafka ecosystem, and shows how they handle data ingestion in a Big Data solution architecture.
Similar to Documentation and Deployment through Python Libraries (20)
Fundamental analysis primarily comprises analyzing a company from a long-term perspective, by looking at its various income streams and profit-generating capacity, and at ratios of profitability, operations, and so on. By contrast, a newer analysis technique, called technical analysis, deals with making short-term profits based on recent trends and market movements. It helps the trader identify entry and exit points in a trade.
International Conference | Artificial Intelligence & Machine LearningRishabh Garg
International Conference on Artificial Intelligence and Machine Learning | 23-24 July 2022 | Toronto, Canada.
The Conference aims to provide a platform for academia as well as industry to share cutting-edge developments in the fields of Artificial Intelligence and Machine Learning. Authors are solicited to contribute articles that illustrate research results, projects, survey works, and industrial experiences.
The presentation provides an overview of a two-layer machine learning model that classifies the type of biomolecule present in the medium (in the first layer) and predicts the concentration of the material (in the second layer). Bacteria have been used as the known biological material, using Electrical Impedance Spectroscopy (EIS) data.
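A minimal sketch of the two-layer idea, under stated assumptions (synthetic stand-ins for EIS feature vectors; the presentation's actual models are not specified): a first-layer classifier picks the biomolecule type and a per-class second-layer regressor predicts the concentration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))            # stand-in EIS feature vectors
y_type = rng.integers(0, 3, size=300)    # biomolecule class (3 types assumed)
y_conc = rng.uniform(0, 1, size=300)     # concentration labels

# Layer 1: classify the biomolecule type
clf = RandomForestClassifier(random_state=0).fit(X, y_type)

# Layer 2: one concentration regressor per class
regressors = {c: RandomForestRegressor(random_state=0)
                 .fit(X[y_type == c], y_conc[y_type == c])
              for c in np.unique(y_type)}

x_new = X[:1]
predicted_type = clf.predict(x_new)[0]
predicted_conc = regressors[predicted_type].predict(x_new)[0]
print(predicted_type, predicted_conc)
```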
Python Library using impedance processingRishabh Garg
The present method consists of using the impedance.py Python library to fit the circuit directly to the lab data. Accuracy metrics are yet to be improved by adjusting the circuit model.
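For reference, a hedged sketch of the impedance.py fitting pattern as documented by that library; the CSV file name, circuit string, and initial guesses are generic placeholders rather than the project's actual model:

```python
from impedance import preprocessing
from impedance.models.circuits import CustomCircuit

# Frequencies and complex impedances from a lab CSV (file name assumed)
frequencies, Z = preprocessing.readCSV("eis_data.csv")

# A simple Randles-style placeholder circuit; the real model may differ
circuit = CustomCircuit("R0-p(R1,C1)", initial_guess=[0.01, 0.005, 0.1])
circuit.fit(frequencies, Z)

print(circuit)                        # fitted parameter values
Z_fit = circuit.predict(frequencies)  # model impedance for comparison
```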
International Webinar - Global ID Through BlockchainRishabh Garg
Cumbersome documentation, unsolicited expenses, undue involvement of intermediaries, and frequent data hacks are some of the major roadblocks that deprive millions of individuals of an official identity. The present talk introduces a DLT-enabled, all-inclusive ID, "Self-Sovereign Identity", to ensure organized and sustainable change at the global level.
International Talk on Technical AnalysisRishabh Garg
What exactly are the various types of prices that we associate with a stock? In other words, what are the variables that make up the movement of a stock? These queries were addressed by Rishabh Garg, a core member of WSC, in his lecture on Technical Analysis on January 01, 2022 at 12:00:00 PM (IST | UTC+5:30).
The complete process of Assessment and Accreditation of Higher Education Institutions in India. Applicant HEIs are expected to be aware of all requirements and to submit all required information. Applicants are encouraged to be conversant with related topics before launching the application form.
An all-inclusive procedure of Assessment & Accreditation of Higher Education Institutions, including Universities and Autonomous, Affiliated and Constituent Colleges (all Government institutions, Grant-in-aid colleges, and Self-financed institutes) in India.
It explains the step-wise process of Registration; Online submission of IIQA (Institutional Information for Quality Assessment); SSR (Self-Study Report); DVV (Data Validation and Verification); SSS (Student Satisfaction Survey); PTV (Peer Team Visit); and Institutional Grading.
The word "clone" has been extensively used to indicate the product of recombinant DNA technology, which allows geneticists to create identical copies of a DNA fragment, more often referred to as a gene. In practice, the procedure is carried out by inserting a fragment of the desired DNA into another DNA molecule, a vector, and allowing this chimeric molecule to replicate inside a fast-replicating living cell such as a bacterium.
Multi purpose ID : A Digital Identity to 134 Crore IndiansRishabh Garg
Multipurpose ID is a combination of a techno smart card, carrying a twenty-digit universal identification number to record all purposeful information about an individual, and a touch-screen smart cell phone for electronic surveillance. The two units can work separately or together. Such a unique system would replace all possible documents procured by an individual during their lifetime.
Apart from saving human resources, time, money, and administrative complexity, the stacks of files and papers in offices would be reduced to a fraction. No photocopies, no documentation, no verification, and no long queues for day-to-day pursuits. Just one click, and the entire details of an individual would be available, and fully genuine at that.
The nation would have a red-letter day, as the change would shape a billion lives and bring respite to the administrative machinery and a public that has crumbled under red tape.
Techno Smart Card : Digital ID for Every IndianRishabh Garg
Digital ID with an Electronic Surveillance System is a combination of a multipurpose ID card, carrying a twenty-digit unique identification number to record the entire lifetime data of a citizen, and a smart mobile phone for electronic surveillance. The two units can work separately or together. Such a unique techno-smart device would replace all possible documents: birth certificates, Aadhar, passport, driving license, PAN, insurance, bank account numbers ................
Thus, the present innovation would make the life of every individual on Earth free from redundant documentation and would serve as a rescue from the practice of forged identity, deception, and corruption.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks, and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks, by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversary training.
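The paper's exact volumization algorithm is not reproduced here; as a loose, hypothetical illustration of the general idea of lifting a 2D image into a 3D volume, one could stack binary intensity slices along a new depth axis:

```python
import numpy as np

def volumize(image: np.ndarray, depth: int = 8) -> np.ndarray:
    """Hypothetical 2D -> 3D lifting: slice the intensity range into
    `depth` bins and stack one binary occupancy plane per bin.
    This is an illustration only, not the paper's algorithm."""
    levels = np.linspace(0.0, 1.0, depth + 1)
    planes = [(image >= levels[i]) & (image < levels[i + 1])
              for i in range(depth)]
    return np.stack(planes).astype(np.float32)  # shape: (depth, H, W)

img = np.random.rand(32, 32)   # stand-in grayscale image in [0, 1)
volume = volumize(img)
print(volume.shape)            # (8, 32, 32), ready for 3D convolution
```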
An overview of the fundamental roles in hydropower generation and the components involved in wider electrical engineering.
This paper presents the design and construction of hydroelectric dams, from the hydrologist's survey of the valley before construction through all the disciplines involved (fluid dynamics, structural engineering, generation, and mains-frequency regulation) to the transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's hard to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The system includes various function programs to carry out the tasks mentioned above.
Data file handling has been used effectively in the program.
The automated cosmetic shop management system deals with the automation of the general workflow and administration process of the shop. The main processes of the system focus on customer requests, where the system is able to search for the most appropriate products and deliver them to the customers. It helps employees quickly identify cosmetic products that have reached their minimum quantity, keeps track of the expiry date of each product, and helps employees find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The calculation HTML code is included.
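In its general form for a strip footing, Terzaghi's equation is q_ult = c·Nc + γ·Df·Nq + 0.5·γ·B·Nγ. A small sketch of that calculation (the bearing-capacity factors Nc, Nq, Nγ are read from published tables for the soil's friction angle; the HTML code mentioned above is not reproduced here):

```python
def terzaghi_qult(c, gamma, Df, B, Nc, Nq, Ngamma):
    """Ultimate bearing capacity of a strip footing (Terzaghi's general form).
    c      : soil cohesion (kPa)
    gamma  : soil unit weight (kN/m^3)
    Df     : foundation depth (m)
    B      : footing width (m)
    Nc, Nq, Ngamma : bearing-capacity factors from tables for the friction angle
    Returns q_ult in kPa."""
    return c * Nc + gamma * Df * Nq + 0.5 * gamma * B * Ngamma

# Example: phi = 20 degrees (table values Nc=17.7, Nq=7.4, Ngamma=5.0 assumed)
print(terzaghi_qult(c=10, gamma=18, Df=1.0, B=1.5, Nc=17.7, Nq=7.4, Ngamma=5.0))
```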
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
About
Indigenized remote-control interface card suitable for MAFI-system CCR equipment. Compatible with the IDM8000 CCR. Backplane-mounted serial and TCP/Ethernet communication module for CCR remote access; IDM8000 CCR remote control over serial and TCP protocols.
Key Features
• Remote control: parallel or serial interface.
• Compatible with the MAFI CCR system.
• Compatible with the IDM8000 CCR.
• Compatible with backplane-mounted serial communication.
• Compatible with commercial and defence aviation CCR systems.
• Remote-control system for accessing the CCR and allied systems over serial or TCP.
• Indigenized local support/presence in India.
• Easy to configure using DIP switches.
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
Documentation and Deployment through Python Libraries
1. SWIGGY SOFTWARE DEVELOPMENT BANGALORE
EFFICIENT DOCUMENTATION AND DEPLOYMENT
THROUGH PYTHON LIBRARIES
RISHABH GARG
BITS-PILANI | GOA
PRACTICE SCHOOL
2. RISHABH GARG
BITS-PILANI | GOA
ABOUT SWIGGY
• Established in 2014 by two alumni of BITS Pilani.
• Based on a hyperlocal, on-demand food-delivery business operation.
• Now serves 300+ cities across India.
• Business segments include Swiggy Access, Swiggy Super, Swiggy Pop, Swiggy Daily, Swiggy Stores, and Swiggy Go. Also expanded to provide beverage services.
• Recently raised $800 million from various investors.
• Current valuation is $5 billion.
• Valuation now exceeds that of Zomato, which provides similar services.
3. RISHABH GARG
BITS-PILANI | GOA
ABOUT PROJECT
The developer documents needed to be categorized and ported to a web-view documentation website, for better search over the required commands and a clearer representation of the document hierarchy (the TOC tree in Sphinx), which had become cluttered in the Shuttle docs.
The first step was scraping the code from the Shuttle docs using the Beautiful Soup Python library, which required login through a Confluence ticket and cookies for security purposes. After getting the HTML code of the documents, the html2rest and pandoc Python libraries were used for automated containerization of the HTML docs and their eventual conversion to RST files.
The Sphinx Python library was then used to create the boilerplate code and TOC structure of the base documents, to which the converted RST files were linked. After conversion, a shell script was written to automate the running of the commands used in the above process. The documents were then deployed to Amazon S3.
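A hedged sketch of that scrape-and-convert step, assuming a Confluence session cookie and using pypandoc for the HTML-to-RST conversion; the URL, cookie value, container id, and file names are placeholders, not the actual Shuttle values:

```python
import pypandoc
import requests
from bs4 import BeautifulSoup

# Authenticated session: the Confluence ticket/cookie values are placeholders
session = requests.Session()
session.cookies.set("JSESSIONID", "<confluence-ticket>")

html = session.get("https://wiki.example.com/display/SHUTTLE/Docs").text

# Keep only the main document body, dropping page chrome
soup = BeautifulSoup(html, "html.parser")
body = soup.find("div", {"id": "main-content"})  # assumed container id

# Convert the extracted HTML to RST for Sphinx to render
rst = pypandoc.convert_text(str(body), "rst", format="html")
with open("shuttle_docs.rst", "w") as f:
    f.write(rst)
```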
5. RISHABH GARG
BITS-PILANI | GOA
PROJECT DETAILS
EXPECTED PROJECT OUTCOMES
• Easier search functionality and management of services
• Better documentation of services, with access through web-view platforms like Sphinx
• Conversion of HTML documents into RST files for rendering through Sphinx
• Automation of the commands used for installing the required Python libraries and converting documents into RST
• Static deployment to Amazon S3 through Shuttle and UAT accounts
6. RISHABH GARG
BITS-PILANI | GOA
PROJECT DETAILS
MORE ABOUT SHUTTLE AND SPHINX
Since there are multiple tasks to be done for the production and deployment of any service, such as writing the infra in Bitbucket, creating env variables, etc., we can create an app.yaml file that stores all the information related to the configuration of the system, from metadata to cl setups.
We can also create business alerts, which can be manually migrated and created using coast. To conclude, Shuttle makes the deployment process easier by committing to the codebase instead of consul.
Sphinx uses RST (reStructuredText) files for rendering content internally, unlike the conventional HTML, CSS, and vanilla JavaScript framework.
7. RISHABH GARG
BITS-PILANI | GOA
PROJECT DETAILS
MORE ABOUT SHUTTLE AND SPHINX
Sphinx has a hierarchical structure, which enables easy definition of a document tree with automatic links to siblings, parents, and children. Code highlighting is handled automatically using the Pygments highlighter.
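For illustration, a minimal index.rst showing the toctree directive that defines such a document tree; the document names are placeholders:

```rst
Shuttle Documentation
=====================

.. toctree::
   :maxdepth: 2
   :caption: Contents

   getting_started
   deployment
   api_reference
```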
8. RISHABH GARG
BITS-PILANI | GOA
PROJECT DETAILS
MORE ABOUT SHELL SCRIPTS AND AMAZON S3
A shell script is a computer program designed to be run by the Unix shell, a command-line interpreter. The various dialects of shell scripts are considered to be scripting languages. Typical operations performed by shell scripts include file manipulation, program execution, and printing text.
9. RISHABH GARG
BITS-PILANI | GOA
PROJECT DETAILS
MORE ABOUT SHELL SCRIPTS AND AMAZON S3
In this project, after conversion, a shell script was written to automate the running of the commands used in the conversion process. It consisted of a simple .sh file that contained some basic if-else statements and regex expressions for checking the file types and moving them to the required TOC folders.
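A hedged sketch of what such a .sh file could look like; the folder names and file-type patterns are assumptions based on the description above:

```bash
#!/bin/sh
# Convert each scraped HTML doc to RST, then sort files into TOC folders.
for f in docs/*.html; do
    pandoc -f html -t rst "$f" -o "${f%.html}.rst"
done

for f in docs/*.rst; do
    # Route files into assumed TOC folders by a filename pattern
    if echo "$f" | grep -q "api_"; then
        mv "$f" toc/api/
    else
        mv "$f" toc/guides/
    fi
done
```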
10. RISHABH GARG
BITS-PILANI | GOA
PROJECT DETAILS
MORE ABOUT SHELL SCRIPTS AND AMAZON S3
Amazon S3, or Amazon Simple Storage Service, is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
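For the static deployment step, a hedged boto3 sketch of uploading a built Sphinx site to an S3 bucket; the bucket name and paths are placeholders:

```python
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "shuttle-docs-static"  # assumed bucket name

# Upload every file in the built Sphinx site, preserving relative paths
for root, _, files in os.walk("_build/html"):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, "_build/html")
        ctype = "text/html" if name.endswith(".html") else "binary/octet-stream"
        s3.upload_file(path, BUCKET, key, ExtraArgs={"ContentType": ctype})
```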
11. RISHABH GARG
BITS-PILANI | GOA
PROJECTS MADE
01 Sphinx-documented website made using RST files
02 Shell script for automation of commands
03 Markdown-rendered RST file for service handling
12. RISHABH GARG
BITS-PILANI | GOA
WORK DONE
Week 1: Read about the business model of Swiggy, the technological solutions it provides, and its newly launched services in various domains.
Week 2: Learnt about software engineering.
Week 3: Met with the reporting manager and industry mentor. Started learning about Shuttle and Sphinx from the material recommended by the mentor.
Week 4: Continued learning about the tools; scheduled a meeting with the mentor.
Week 5: Used Python libraries to scrape the documentation websites of the company. Copied the docstrings into Markdown and converted them to RST files.
Week 6: Built a shell script of all the commands required for conversion of HTML docs into RST files. Studied DVO and Shuttle tickets for Amazon S3 static deployment.
Week 7: Sent the DVO and SHUTTL_ tickets for approval on Jira, to create sandbox and Shuttle accounts respectively for deployment to Amazon S3.
13. RISHABH GARG
BITS-PILANI | GOA
ACHIEVEMENTS
PROJECT MILESTONES
WEEK 2: Learnt about Sphinx, GitBook & DataBricks
WEEK 3: Built the HTML code scraper using the Beautiful Soup Python library
WEEK 4: HTML converted to Markdown files and RST files
WEEK 5: Built the shell script for automation of commands
WEEK 6: Amazon S3 credentials received and bucket created