Moven is a framework for distributing machine learning models using Maven. It addresses the problems of managing model distribution and versioning as well as automating testing. Models are published as Maven artifacts and can then be retrieved and used from Java and Python. Moven provides model-agnostic publishing and retrieval of compressed models through Maven repositories and dependencies. It is currently being used in production at Redlink and the SSIX project.
Building Applications with Streams and SnapshotsJ On The Beach
Stream processing has been traditionally associated with realtime analytics. Modern stream processors, like Apache Flink, however, go far beyond that and give us a new approach to build applications and services as a whole.
This talk shows how to build applications on *data streams*, *state*, and *snaphots* (point-in-time views of application state) using Apache Flink. Rather than separating computation (application) and state (database), Flink manages the application logic and state as a tight pair and uses snapshots for consistent view onto the application and its state. With features like Flink's queryable state, the stream processor and database effectively become one.
This application pattern has many interesting properties: Aside from having fewer moving parts, it supports very high event rates because of its tight integration between computation and state, and its simple concurrency and recovery model. At the same time, it exposes a powerful consistency model, allows for seamless forking/updating/rollback of online applications, generalizes across historic and real-time data, and easily incorporates event time semantics and handling of late data. Finally, it allows applications to be defined in an easy way via streaming SQL.
Realizing the promise of portability with Apache BeamJ On The Beach
The world of big data involves an ever changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam (incubating) aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms.
In this talk, I will:
Cover briefly the capabilities of the Beam model for data processing and integration with IOs, as well as the current state of the Beam ecosystem.
Discuss the benefits Beam provides regarding portability and ease-of-use.
Demo the same Beam pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Flink on Google Cloud, Apache Spark on AWS, Apache Apex on-premise).
Give a glimpse at some of the challenges Beam aims to address in the future.
AI is a scoring machine to automate data processing to get an inference.
Quality of the data defines the quality of the AI
Quality of feature engineering either does
No domain knowledge - no AI
Data Scientist works on data and models, to build an application Engineers and DevOps are required
In the real world a model will encounter a case it was never trained for. Continuous Retraining is a must.
Hydrosphere.io Platform for AI/ML Operations AutomationRustem Zakiev
Simple and robust ML models deployment
Automated versioning
Easy models and versions management
Score the model from your app or microservice via REST, gRPC or Kafka stream API.
A/B and Canary testing on production traffic.
Hot-wing bumpless model replacement in production pipeline
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowDatabricks
As Atlassian continues to scale to more and more customers, the demand for our legendary support continues to grow. Atlassian needs to maintain balance between the staffing levels needed to service this increasing support ticket volume with the budgetary constraints needed to keep the business healthy – automated ticket volume forecasting is at the centre of this delicate balance
Building Applications with Streams and SnapshotsJ On The Beach
Stream processing has been traditionally associated with realtime analytics. Modern stream processors, like Apache Flink, however, go far beyond that and give us a new approach to build applications and services as a whole.
This talk shows how to build applications on *data streams*, *state*, and *snaphots* (point-in-time views of application state) using Apache Flink. Rather than separating computation (application) and state (database), Flink manages the application logic and state as a tight pair and uses snapshots for consistent view onto the application and its state. With features like Flink's queryable state, the stream processor and database effectively become one.
This application pattern has many interesting properties: Aside from having fewer moving parts, it supports very high event rates because of its tight integration between computation and state, and its simple concurrency and recovery model. At the same time, it exposes a powerful consistency model, allows for seamless forking/updating/rollback of online applications, generalizes across historic and real-time data, and easily incorporates event time semantics and handling of late data. Finally, it allows applications to be defined in an easy way via streaming SQL.
Realizing the promise of portability with Apache BeamJ On The Beach
The world of big data involves an ever changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam (incubating) aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms.
In this talk, I will:
Cover briefly the capabilities of the Beam model for data processing and integration with IOs, as well as the current state of the Beam ecosystem.
Discuss the benefits Beam provides regarding portability and ease-of-use.
Demo the same Beam pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Flink on Google Cloud, Apache Spark on AWS, Apache Apex on-premise).
Give a glimpse at some of the challenges Beam aims to address in the future.
AI is a scoring machine to automate data processing to get an inference.
Quality of the data defines the quality of the AI
Quality of feature engineering either does
No domain knowledge - no AI
Data Scientist works on data and models, to build an application Engineers and DevOps are required
In the real world a model will encounter a case it was never trained for. Continuous Retraining is a must.
Hydrosphere.io Platform for AI/ML Operations AutomationRustem Zakiev
Simple and robust ML models deployment
Automated versioning
Easy models and versions management
Score the model from your app or microservice via REST, gRPC or Kafka stream API.
A/B and Canary testing on production traffic.
Hot-wing bumpless model replacement in production pipeline
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowDatabricks
As Atlassian continues to scale to more and more customers, the demand for our legendary support continues to grow. Atlassian needs to maintain balance between the staffing levels needed to service this increasing support ticket volume with the budgetary constraints needed to keep the business healthy – automated ticket volume forecasting is at the centre of this delicate balance
In the session, we discussed the End-to-end working of Apache Airflow that mainly focused on "Why What and How" factors. It includes the DAG creation/implementation, Architecture, pros & cons. It also includes how the DAG is created for scheduling the Job and what all steps are required to create the DAG using python script & finally with the working demo.
Quick introduction about Apache Spark and how it fits in the cognitive world, how can we use it to help cognitive solutions as well as create distributed algorithms to predict and perform other machine learning tasks.
Presentation given at Coolblue B.V. demonstrating Apache Airflow (incubating), what we learned from the underlying design principles and how an implementation of these principles reduce the amount of ETL effort. Why choose Airflow? Because it makes your engineering life easier, more people can contribute to how data flows through the organization, so that you can spend more time applying your brain to more difficult problems like Machine Learning, Deep Learning and higher level analysis.
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
The ‘feature store’ is an emerging concept in data architecture that is motivated by the challenge of productionizing ML applications. The rapid iteration in experimental, data driven research applications creates new challenges for data management and application deployment.
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
This is the slide I presented at PyCon SG 2019. I talked about overview of Airflow and how we can use Airflow and the other data engineering services on AWS and GCP to build data pipelines.
In the big data world, it's not always easy for Python users to move huge amounts of data around. Apache Arrow defines a common format for data interchange, while Arrow Flight introduced in version 0.11.0, provides a means to move that data efficiently between systems. Arrow Flight is a framework for Arrow-based messaging built with gRPC. It enables data microservices where clients can produce and consume streams of Arrow data to share it over the wire. In this session, I'll give a brief overview of Arrow Flight from a Python perspective, and show that it's easy to build high performance connections when systems can talk Arrow. I'll also cover some ongoing work in using Arrow Flight to connect PySpark with TensorFlow - two systems with great Python APIs but very different underlying internal data.
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...Khai Tran
Computation convergence problems, auto-generate Beam API code from Pig scripts, convergences at LinkedIn with AORA (Author Once Run Anywhere) principle.
Blog post:
https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite
Code example:
Pig script: https://gist.github.com/khaitranq/1d06c27832f15fa52a4a7e2fa7bec340
Beam autogen code: https://gist.github.com/khaitranq/785dbb8495cd382788f3ca8200231d8
Slide deck for the fourth data engineering lunch, presented by guest speaker Will Angel. It covered the topic of using Airflow for data engineering. Airflow is a scheduling tool for managing data pipelines.
Productionizing Machine Learning Pipelines with Databricks and Azure MLDatabricks
Deployment of modern machine learning applications can require a significant amount of time, resources, and experience to design and implement – thus introducing overhead for small-scale machine learning projects.
Presentation given on the 15th July 2021 at the Airflow Summit 2021
Conference website: https://airflowsummit.org/sessions/2021/clearing-airflow-obstructions/
Recording: https://www.crowdcast.io/e/airflowsummit2021/40
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAnimesh Singh
Model Inferencing use cases are becoming a requirement for models moving into the next phase of production deployments. More and more users are now encountering use cases around canary deployments, scale-to-zero or serverless characteristics. And then there are also advanced use cases coming around model explainability, including A/B tests, ensemble models, multi-armed bandits, etc.
In this talk, the speakers are going to detail how to handle these use cases using Kubeflow Serving and the native Kubernetes stack which is Istio and Knative. Knative and Istio help with autoscaling, scale-to-zero, canary deployments to be implemented, and scenarios where traffic is optimized to the best performing models. This can be combined with KNative eventing, Istio observability stack, KFServing Transformer to handle pre/post-processing and payload logging which consequentially can enable drift and outlier detection to be deployed. We will demonstrate where currently KFServing is, and where it's heading towards.
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...Databricks
The freedom of fast iterations of distributed deep learning tasks is crucial for smaller companies to gain competitive advantages and market shares from big tech giants. Horovod Runner brings this process to relatively accessible spark clusters.
Overview of Apache Spark 2.3: What’s New? with Sameer AgarwalDatabricks
Apache Spark 2.0 set the architectural foundations of Structure in Spark, Unified high-level APIs, Structured Streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community contributors have continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Continuing forward in that spirit, Apache Spark 2.3 has made similar strides too, introducing new features and resolving over 1300 JIRA issues. In this talk, we want to share with the community some salient aspects of Spark 2.3 features:
Kubernetes Scheduler Backend
PySpark Performance and Enhancements
Continuous Structured Streaming Processing
DataSource v2 APIs
Spark History Server Performance Enhancements
Deep dive into Kubeflow Pipelines, and details about Tekton backend implementation for KFP, including compiler, logging, artifacts and lineage tracking
In the session, we discussed the End-to-end working of Apache Airflow that mainly focused on "Why What and How" factors. It includes the DAG creation/implementation, Architecture, pros & cons. It also includes how the DAG is created for scheduling the Job and what all steps are required to create the DAG using python script & finally with the working demo.
Quick introduction about Apache Spark and how it fits in the cognitive world, how can we use it to help cognitive solutions as well as create distributed algorithms to predict and perform other machine learning tasks.
Presentation given at Coolblue B.V. demonstrating Apache Airflow (incubating), what we learned from the underlying design principles and how an implementation of these principles reduce the amount of ETL effort. Why choose Airflow? Because it makes your engineering life easier, more people can contribute to how data flows through the organization, so that you can spend more time applying your brain to more difficult problems like Machine Learning, Deep Learning and higher level analysis.
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
The ‘feature store’ is an emerging concept in data architecture that is motivated by the challenge of productionizing ML applications. The rapid iteration in experimental, data driven research applications creates new challenges for data management and application deployment.
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
This is the slide I presented at PyCon SG 2019. I talked about overview of Airflow and how we can use Airflow and the other data engineering services on AWS and GCP to build data pipelines.
In the big data world, it's not always easy for Python users to move huge amounts of data around. Apache Arrow defines a common format for data interchange, while Arrow Flight introduced in version 0.11.0, provides a means to move that data efficiently between systems. Arrow Flight is a framework for Arrow-based messaging built with gRPC. It enables data microservices where clients can produce and consume streams of Arrow data to share it over the wire. In this session, I'll give a brief overview of Arrow Flight from a Python perspective, and show that it's easy to build high performance connections when systems can talk Arrow. I'll also cover some ongoing work in using Arrow Flight to connect PySpark with TensorFlow - two systems with great Python APIs but very different underlying internal data.
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...Khai Tran
Computation convergence problems, auto-generate Beam API code from Pig scripts, convergences at LinkedIn with AORA (Author Once Run Anywhere) principle.
Blog post:
https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite
Code example:
Pig script: https://gist.github.com/khaitranq/1d06c27832f15fa52a4a7e2fa7bec340
Beam autogen code: https://gist.github.com/khaitranq/785dbb8495cd382788f3ca8200231d8
Slide deck for the fourth data engineering lunch, presented by guest speaker Will Angel. It covered the topic of using Airflow for data engineering. Airflow is a scheduling tool for managing data pipelines.
Productionizing Machine Learning Pipelines with Databricks and Azure MLDatabricks
Deployment of modern machine learning applications can require a significant amount of time, resources, and experience to design and implement – thus introducing overhead for small-scale machine learning projects.
Presentation given on the 15th July 2021 at the Airflow Summit 2021
Conference website: https://airflowsummit.org/sessions/2021/clearing-airflow-obstructions/
Recording: https://www.crowdcast.io/e/airflowsummit2021/40
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAnimesh Singh
Model Inferencing use cases are becoming a requirement for models moving into the next phase of production deployments. More and more users are now encountering use cases around canary deployments, scale-to-zero or serverless characteristics. And then there are also advanced use cases coming around model explainability, including A/B tests, ensemble models, multi-armed bandits, etc.
In this talk, the speakers are going to detail how to handle these use cases using Kubeflow Serving and the native Kubernetes stack which is Istio and Knative. Knative and Istio help with autoscaling, scale-to-zero, canary deployments to be implemented, and scenarios where traffic is optimized to the best performing models. This can be combined with KNative eventing, Istio observability stack, KFServing Transformer to handle pre/post-processing and payload logging which consequentially can enable drift and outlier detection to be deployed. We will demonstrate where currently KFServing is, and where it's heading towards.
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...Databricks
The freedom of fast iterations of distributed deep learning tasks is crucial for smaller companies to gain competitive advantages and market shares from big tech giants. Horovod Runner brings this process to relatively accessible spark clusters.
Overview of Apache Spark 2.3: What’s New? with Sameer AgarwalDatabricks
Apache Spark 2.0 set the architectural foundations of Structure in Spark, Unified high-level APIs, Structured Streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community contributors have continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Continuing forward in that spirit, Apache Spark 2.3 has made similar strides too, introducing new features and resolving over 1300 JIRA issues. In this talk, we want to share with the community some salient aspects of Spark 2.3 features:
Kubernetes Scheduler Backend
PySpark Performance and Enhancements
Continuous Structured Streaming Processing
DataSource v2 APIs
Spark History Server Performance Enhancements
Deep dive into Kubeflow Pipelines, and details about Tekton backend implementation for KFP, including compiler, logging, artifacts and lineage tracking
Sedes de internado para realizar las prácticas pre profesionales por parte de los alumnos de la Universidad Inca Garcilaso de la Vega Facultad de Psicología.
A panel of XPages experts - Mike McGarel, David Leedy, and Nathan Freeman - each give a short presentation, then answer XPages questions from attendees. For the recording, please visit: https://youtu.be/jBaRSM9Ng_o
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware
Presentation title: Model and pilot all cloud layers with OCCIware, from IoT to Big Data
Abstract: Who uses multi cloud today ? Everybody. Alas, this leads to a lot of "technical glue". Enter OCCIware's Studio and Runtime : manage all layers and domains of the Cloud (XaaS) in a uniform, standard, extensible way - the Cloud consumer platform.presentation.
This talk presents how the OCCIware Studio - currently being contributed to the Eclipse Foundation by Inria and Obeo - takes advantage of Eclipse Modeling and SIrius in order to support a metamodel for the generic Open Cloud Computing Interface (OCCI) REST API and build a "studio factory", while providing feedback and lessons learned on various other Eclipse components.
It concludes on a live demonstration of using it to model and pilot an IoT (nodeMCU/ESP8266), Linked & Big Data (JSON-LD, Spark), containerized Cloud solution to let electricity consumption be monitored across territories by all actors - individuals, utility providers, up to regional public bodies.
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Marc Dutoo
Who uses multi cloud today ? Everybody. Alas, this leads to a lot of "technical glue". Enter OCCIware's Studio and Runtime : manage all layers and domains of the Cloud (XaaS) in a uniform, standard, extensible way - the Cloud consumer platform.
This talk presents how the OCCIware Studio - currently being contributed to the Eclipse Foundation by Inria and Obeo - takes advantage of Eclipse Modeling and SIrius in order to support a metamodel for the generic Open Cloud Computing Interface (OCCI) REST API and build a "studio factory", while providing feedback and lessons learned on various other Eclipse components.
It concludes on a live demonstration of using it to model and pilot an IoT (nodeMCU/ESP8266), Linked & Big Data (JSON-LD, Spark), containerized Cloud solution to let electricity consumption be monitored across territories by all actors - individuals, utility providers, up to regional public bodies.
Microservices and containers for the unitiatedKevin Lee
In this presentation I provide a high level explanation of why applications are now being developed using in a Microservice architecture. I look at how Microservice applications are typically developed and deployed using container technology and look at some of the challenges of using container technology for applications in production.
Check out the talk to the slides:
http://bit.ly/1ReY8uJ
Talk Abstract:
Using Swarm, you can select “just enough app server” to support each of your microservices.
In this session, we’ll outline how WildFly Swarm works and get you started writing your first microservices using Java EE technologies you’re already familiar with.
You’ll learn how to setup your build system (Maven, Gradle, or your IDE of choice) to run and test WildFly Swarm-based services and produce runnable jars. We will walk from the simple case of wrapping a normal WAR application to the more advanced case of configuring the container using your own main(…) method.
Choosing PaaS: Cisco and Open Source Options: an overviewCisco DevNet
A session in the DevNet Zone at Cisco Live, Berlin. Confused by all the open source PaaS options out there? What criteria should you use to evaluate them? We seek to answer these questions in a systematic manner and will explore top technologies such as Mesos, Apprenda, Cloud Foundry and Kubernetes along with Cisco's Project Shipped and open source Mantl. The aim of this session will be to shed light on which platforms add value to your needs, applications and workloads.
This presentation discusses how to achieve continuous delivery, leveraging on docker containers, here used as universal application artifacts. It has been presented at Voxxed '15 Bucharest.
To facilitate a variety of usage scenarios and gradually scale to larger number of users, Galaxy supports deployment on systems ranging from a laptop to a supercomputer to clouds. In this talk, real-world examples of two different models for harnessing a variety of resources will be presented: (1) a centralized Galaxy utilizing a set of geographically distributed resources in support of a large user base, and (2) a model of easily deploying multiple standalone instances of Galaxy to support high resource demands or customizations by a smaller groups. Together, these models showcase the capacity of Galaxy to support a variety of usage scenarios and a variable number of users with a variety of needs.
This talk covered the OpenStack basics that VMware Administrators need to be aware of to be successful in their deployments. We also had the Tesora team join us on stage to discuss the importance of Database-as-a-Service with the Trove project!
Similar to Moven - Apache Big Data Europe 2016 - SSIX Project (20)
Trends in software architecture: a professional (des)orientationSergio Fernández
Invited talk to the Software Architecture course at the Bachelor Degree in Computer Science of the University of Oviedo. Further details at https://ingenieriainformatica.uniovi.es/actualidad/eventos/-/asset_publisher/uS6D/content/deteccion-de-sesgo-algoritmico-1
Redfine aims to extend OpenRefine to make use of the Redlink APIs for a more advance and performance solution for publishing Linked Data.
A proposal by Redlink GmbH to the Fusepool Open Call for Developers at the Data|Hack|Award 2014.
Development Infrastructure for MICO, Media in Context FP7 Project, presented in the project kick-off meeting, Salzburg , Nov 14, 2013.
Further details at http://www.mico-project.eu
Barra libre en proyectos de software... pero sólo hasta media noche Sergio Fernández
Ponencia invitada para la asigantura de Dirección y gestión de proyectos web de 2º del Máster en Ingeniería Web de la Universidad de Oviedo.
Oviedo, 26 de octubre de 2011.
TRIOO, Keeping the Semantics of Data Safe and Sound into Object-Oriented Soft...Sergio Fernández
Talk of the paper with the same title at the 5th International Conference on Software and Data Technologies (ICSOFT 2010), Athens, Greece, July 22-24, 2010.
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesSanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/ CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/ deploying these solutions.
This 7-second Brain Wave Ritual Attracts Money To You.!nirahealhty
Discover the power of a simple 7-second brain wave ritual that can attract wealth and abundance into your life. By tapping into specific brain frequencies, this technique helps you manifest financial success effortlessly. Ready to transform your financial future? Try this powerful ritual and start attracting money today!
# Internet Security: Safeguarding Your Digital World
In the contemporary digital age, the internet is a cornerstone of our daily lives. It connects us to vast amounts of information, provides platforms for communication, enables commerce, and offers endless entertainment. However, with these conveniences come significant security challenges. Internet security is essential to protect our digital identities, sensitive data, and overall online experience. This comprehensive guide explores the multifaceted world of internet security, providing insights into its importance, common threats, and effective strategies to safeguard your digital world.
## Understanding Internet Security
Internet security encompasses the measures and protocols used to protect information, devices, and networks from unauthorized access, attacks, and damage. It involves a wide range of practices designed to safeguard data confidentiality, integrity, and availability. Effective internet security is crucial for individuals, businesses, and governments alike, as cyber threats continue to evolve in complexity and scale.
### Key Components of Internet Security
1. **Confidentiality**: Ensuring that information is accessible only to those authorized to access it.
2. **Integrity**: Protecting information from being altered or tampered with by unauthorized parties.
3. **Availability**: Ensuring that authorized users have reliable access to information and resources when needed.
## Common Internet Security Threats
Cyber threats are numerous and constantly evolving. Understanding these threats is the first step in protecting against them. Some of the most common internet security threats include:
### Malware
Malware, or malicious software, is designed to harm, exploit, or otherwise compromise a device, network, or service. Common types of malware include:
- **Viruses**: Programs that attach themselves to legitimate software and replicate, spreading to other programs and files.
- **Worms**: Standalone malware that replicates itself to spread to other computers.
- **Trojan Horses**: Malicious software disguised as legitimate software.
- **Ransomware**: Malware that encrypts a user's files and demands a ransom for the decryption key.
- **Spyware**: Software that secretly monitors and collects user information.
### Phishing
Phishing is a social engineering attack that aims to steal sensitive information such as usernames, passwords, and credit card details. Attackers often masquerade as trusted entities in email or other communication channels, tricking victims into providing their information.
### Man-in-the-Middle (MitM) Attacks
MitM attacks occur when an attacker intercepts and potentially alters communication between two parties without their knowledge. This can lead to the unauthorized acquisition of sensitive information.
### Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) Attacks
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
1.Wireless Communication System_Wireless communication is a broad term that i...JeyaPerumal1
Wireless communication involves the transmission of information over a distance without the help of wires, cables or any other forms of electrical conductors.
Wireless communication is a broad term that incorporates all procedures and forms of connecting and communicating between two or more devices using a wireless signal through wireless communication technologies and devices.
Features of Wireless Communication
The evolution of wireless technology has brought many advancements with its effective features.
The transmitted distance can be anywhere between a few meters (for example, a television's remote control) and thousands of kilometers (for example, radio communication).
Wireless communication can be used for cellular telephony, wireless access to the internet, wireless home networking, and so on.
4. Models in SSIX and Redlink
In Redlink, particularly in the SSIX project, we deal with quite
deep neural networks that produce very large models (several
Gigabytes).
Therefore we thought how to address two problems:
1. How to properly manage its distribution and versioning?
2. How to automatizate its testing?
5. Moven
moven = models + maven
https://bitbucket.org/ssix-project/moven
The thesis why we started to work on Moven was the lack of proper state-of-the-art
technology for addressing the two needs described before (distributing and testability).
Some examples:
● TensorFlow public models use a regular git repository
● In Spark ML most of the people use a shared storage (e.g., HDFS)
● OpenNLP also bundle them as JARs
● Freeling uses a share folder from the native installation packages
● Some other proprietary methods...
As Maven does a great work for software artifact, we decided to reuse that
infrastructure for models too.
6. Moven key features
● Model agnostic
● Publication based on a regular Maven plugin
● Distribution relying on the existing Maven infrastructure
○ benefiting of all the features provided by existing tooling (access control,
mirroring, etc)
● Retrieval current supported in:
○ Java (Maven of course)
○ Python (relying on jip)
● Built-in gzip-based compression
7. Related work
There are some interesting work related with our goals:
● StandordNLP has recently (>3.5.2) changed to bundle the modules as Maven
artifacts
● TensorFlow Serving helps to deploy new algorithms for TensorFlow models
● PipelineIO combines several technologies (Spark, NetflixOSS, etc; they call it the
PANCAKE STACK) to provide models distribution, including incremental training,
among many other features (more details).
9. Using your Moven models
From Java:
● Declare the dependency to your
models at your pom.xml
● Then models will be available in
your classpath:
● Also exposed via HTTP as static
resources when the JAR is deployed
in any Servlet >=3.0 container
(inspired by James Ward and the
WebJars project).
From Python:
● Install it: pip install moven
● Declare at models.txt your
models in your project (as we do
with requirements.txt) with a
syntax similar to Groovy's Grape:
● Execute moven models.txt to
retrieve all models to ./moven
organized by artifactId.
● Thought-out specifically for
container deployments
this.getClass().getClassLoader()
.getResourceAsStream("META-INF/resources/models/foo.ex")
io.redlink.ssix.moven:moven-syntaxnet-example:1.0-SNAPSHOT
11. Current status and future
Moven is still in a very early stage, but already being used in
production in SSIX and other Redlink projects.
We will keep exploring such approaches to find a way to better
manage the lifecycle of the models that drive our information
extraction (Natural Language Processing, Machine Learning, Deep
Learning, etc) stack.
For example, we want to target more specific needs in some
concrete environments, such as Apache Spark and/or Apache
Beam Runners API.