CraftCamp for Students - Introduction to git - craftworkz
Git is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was initially designed and developed by Linus Torvalds for Linux kernel development in 2005, and has since become the most widely adopted version control system for software development.
Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.
Using Containers for Continuous Integration and Continuous Delivery - Carlos Sanchez
Building and testing is a great use case for containers, thanks to their dynamic provisioning and isolation properties, but complexity increases when scaling to multiple nodes and clusters.
However, the Kubernetes project provides a container orchestration solution that greatly simplifies application deployment in large clusters and allows any containerized workload to be run dynamically. Jenkins is an example of an application that can take advantage of this technology to run Continuous Integration and Continuous Delivery workloads.
The Jenkins Kubernetes plugin can transparently use on-demand containers to run build agents and jobs and to isolate job execution. It also supports CI/CD-as-code using Jenkins Pipelines.
The presentation gives a better understanding of Kubernetes and of how to run Jenkins on Kubernetes for container-based builds at large scale, and also shows the challenges of running distributed applications (particularly JVM apps).
GLV OnAir, October 2019
In this introduction to GitHub Actions we will look at the basic building blocks, what is possible, what turns out to be complicated or impossible to do, and how to find information and examples.
Introduction to Project Atomic (CentOS Dojo Bangalore) - Lalatendu Mohanty
The talk was given at CentOS Dojo Bangalore on 29 April 2015.
http://wiki.centos.org/Events/Dojo/Bangalore2015
These slides contain an introduction to Project Atomic and the CentOS Atomic SIG.
Composer is the de facto standard PHP dependency management tool. An ever-increasing number of useful open-source libraries are available for easy use via Packagist, the standard package repository for Composer. As more and more Drupal contrib modules begin to depend on external libraries from Packagist, the motivation to manage dependencies with Composer grows stronger; since Drupal 8 core and Drush 7 also use Composer to manage dependencies, the best way to ensure that all requirements are resolved correctly is to manage everything from a top-level project composer.json file.
This deck examines the different ways that Composer can be used to manage your project code, and how these new practices will influence how you use Drush and deploy code.
Watch the session video: https://www.youtube.com/watch?v=WNS3d_wzZ2Y
Introducing containers into your infrastructure brings new capabilities, but also new challenges, in particular around configuration. This talk will take a look under the hood at some of those operational challenges including:
* The difference between runtime and build-time configuration, and the importance of relating the two together.
* Configuration drift, immutable mental models and mutable container file systems.
* Who configures the orchestrators?
* Emergent vs. model driven configuration.
In the process we will identify some common problems and talk about potential solutions.
Talk from PuppetConf 2016
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ - Erica Windisch
An introduction to using Docker for development, testing, and deployment, covering best practices for image building, advice for simple and complicated CI configurations, and orchestrating and running images in production.
A tutorial presentation based on hadoop.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as a teaching assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.
"Big Data using Hadoop" focuses on the use of Hadoop in the context of Big Data. Hadoop is a popular open-source framework for distributed storage and processing of large datasets; the document covers various aspects of working with big data, emphasizing Hadoop's role in managing and analyzing vast amounts of information.
Lean Drupal Repositories with Composer and Drush - Pantheon
Composer is the industry-standard PHP dependency manager that is now in use in Drupal 8 core. This session will show the current best practices for using Composer, drupal-composer, drupal-scaffold, Drush, Drupal Console and Drush site-local aliases to streamline your Drupal 7 and Drupal 8 site repositories for optimal use on teams.
Hadoop installation on Windows using VirtualBox, and Hadoop installation on Ubuntu:
http://logicallearn2.blogspot.in/2018/01/hadoop-installation-on-ubuntu.html
Data Science for Beginners, by Chetan Khatri and Deptt. of Computer Science, Ka... - Chetan Khatri
What is Data Science?
What is Machine Learning, Deep Learning, and AI?
Motivation
Philosophy of Artificial Intelligence (AI)
Role of AI in Daily life
Use cases/Applications
Tools & Technologies
Challenges: Bias, Fake Content, Digital Psychography, Security
Detect Fake Content with “AI”
Learning Path
Career Path
Demystify Information Security & Threats for Data-Driven Platforms With Cheta... - Chetan Khatri
Pragmatic presentation on Penetration testing for Data-Driven Platforms.
Agenda:
- Motivation
- Information Security - Ethics.
- Encryption
- Authentication
- Information Security & Potential threats with Open Source World.
- Find vulnerabilities.
- Checklist before using any Open Source library.
- Vulnerabilities report.
- Penetration Testing for Data-Driven Developments.
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production - Chetan Khatri
Scala Toronto July 2019 event at 500px.
Pure Functional API Integration
Apache Spark Internals tuning
Performance tuning
Query execution plan optimisation
Cats Effect for switching the execution model at runtime.
Discovery / experience with Monix and Scala Futures.
No more struggles with Apache Spark workloads in production - Chetan Khatri
Paris Scala Group Event May 2019, No more struggles with Apache Spark workloads in production.
Apache Spark
Primary data structures (RDD, DataSet, Dataframe)
Pragmatic explanation of executors, cores, containers, stages, jobs, and tasks in Spark.
Parallel read from JDBC: Challenges and best practices.
Bulk Load API vs JDBC write
An optimization strategy for Joins: SortMergeJoin vs BroadcastHashJoin
Avoid unnecessary shuffle
Alternative to spark default sort
Why dropDuplicates() doesn't guarantee consistent results, and what the alternative is
Optimize Spark stage generation plan
Predicate pushdown with partitioning and bucketing
Why not to use Scala Concurrent ‘Future’ explicitly!
No more struggles with Apache Spark (PySpark) workloads in production, Chetan Khatri, Data Science Practice Leader.
Accionlabs India. PyconLT’19, May 26 - Vilnius Lithuania
FOSSASIA AI-ML technologies and applications for product development - Chetan Khatri
Train on GPUs, run inference on mobile: Artificial Intelligence / Machine Learning technologies and applications for AI-driven product development. Talk at FOSSASIA 2018, Singapore.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Notes on graph algorithms such as PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
1. Data Science Bootcamp, Day 2
Presented By:
Chetan Khatri, Volunteer Teaching assistant, Data Science Lab.
Guidance By:
Prof. Devji D. Chhanga, University of Kachchh.
2. Agenda
Understanding Git.
Understanding Apache Maven.
Hello World Java Program with Apache Maven.
Understanding of Hadoop Administrative Commands.
WordCount Hadoop Program on Hadoop Cluster with Maven.
3. Git with Github
● GitHub: repository hosting where you can store your source code and share
work interactively with team members.
● Installation: sudo apt-get install git
● Steps TODO:
1. Create Repository
2. Clone - copy someone else's repository
3. Commit - record your changes, ready to submit to the repository
4. Let’s have Demo with Git
● Create Repository at Github named hadoopdemo
● Cloning Repository: git clone https://github.com/dskskv/hadoopdemo.git
● Configure github with your credentials:
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
Add an individual file: git add README.md
To add all files: git add .
5. Let’s have Demo with Git (Conti…)
Commit command: git commit -m "Comment anything"
Submit whatever has been committed to the GitHub repository:
git push origin master
pull - is to get latest updated code from repository
Example : git pull https://github.com/dskskv/hadoopdemo.git
Git branches:
Branches are different lines of work within the repository, such as the development, test,
and production phases of software development.
The master branch always has the updated code.
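The clone, add, commit, and push steps from the two slides above can be chained into one local walkthrough. As a sketch, a local bare repository stands in for the GitHub remote (so no account or network is needed), and the file name and commit message are just examples; note that git runs as your own user, without sudo.

```shell
# A local bare repository plays the role of the GitHub remote.
workdir=$(mktemp -d)
git init --bare "$workdir/hadoopdemo.git"

# Step 2: clone the repository.
git clone "$workdir/hadoopdemo.git" "$workdir/hadoopdemo"
cd "$workdir/hadoopdemo"

# Configure your identity for this repository only.
git config user.email "you@example.com"
git config user.name "Your Name"

# Step 3: stage a file and commit it.
echo "# hadoopdemo" > README.md
git add README.md
git commit -m "Add README"

# Make sure the branch is named master, then publish to the remote.
git branch -M master
git push origin master
```

With a real GitHub remote the only change is the clone URL, e.g. https://github.com/dskskv/hadoopdemo.git.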
6. Understanding Apache Maven
Apache Maven is a build tool for Java: you can use other artifacts (JAR files
written by someone else) and build your own JAR file, which contains all the
others added before.
Maven Life Cycle:
Create Maven Project
Update Maven Project
Write Java Code
Maven Clean
Maven Build (For building your Jar file)
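For the WordCount build used later in the deck, a minimal pom.xml might look like the sketch below. The artifactId matches the JAR name used on a later slide, but the groupId and the Hadoop version are illustrative assumptions, not taken from the course material.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- groupId is a hypothetical example -->
  <groupId>com.example.hadoopdemo</groupId>
  <artifactId>WordCountDSKSKV</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <dependencies>
    <!-- Hadoop client API; pick the version matching your cluster -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.0</version>
    </dependency>
  </dependencies>
</project>
```

Running mvn clean package then produces target/WordCountDSKSKV-0.0.1-SNAPSHOT.jar.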
7. Understanding Hadoop Administrative Commands
1. Cloning github cccs936 repository
git clone https://github.com/dskskv/CCCS936.git
2. Start Hadoop Cluster
sbin/start-dfs.sh
sbin/start-yarn.sh
3. Check Hadoop Version
hadoop version
4. Check all the options under hadoop command
hadoop
5. Create a directory named "dskskv" in HDFS
hadoop fs -mkdir /dskskv
8. Understanding Hadoop Administrative Commands
6. List the contents of the dskskv directory in HDFS
hadoop fs -ls /dskskv
7. Create a text file
gedit inputfile.txt
8. Put the text file into HDFS
hadoop fs -put inputfile.txt /dskskv
9. Read the contents of the text file in HDFS
hadoop fs -cat /dskskv/inputfile.txt
9. Understanding Hadoop Administrative Commands
10. The hadoop dfs form is deprecated; hdfs dfs performs the same operations.
hdfs dfs -mkdir /chetan
hdfs dfs -put inputfile.txt /chetan
hdfs dfs -cat /chetan/inputfile.txt
11. Deleting file from HDFS
hadoop fs -rm /dskskv/inputfile.txt
12. Deleting Directory from HDFS
hadoop fs -rm -r /dskskv
10. WordCount Hadoop Program on Hadoop Cluster with Maven
1) Login as a Hadoop User:
su hduser
2) Start Hadoop daemon services
sbin/start-dfs.sh
sbin/start-yarn.sh
3) Check whether all daemon services are up or not
jps
4) Create a directory in HDFS. Note: wherever you are in the console, the Hadoop user should
have privileges to access it.
hadoop fs -mkdir /input
5) Transfer textfile to HDFS
hadoop fs -put inputfile.txt /input
11. WordCount Hadoop Program on Hadoop Cluster with Maven
6) Check whether the file was transferred successfully
hadoop fs -ls /input
7) Execute the Hadoop job by providing the program's executable JAR file, the input directory
path where the text file is, and the output directory path where the processed data should be stored.
hadoop jar WordCountDSKSKV-0.0.1-SNAPSHOT.jar /input /output
8) Check that the output directory contains the processed files
hadoop fs -ls /output
9) Read your desired output from the Hadoop job
hadoop fs -cat /output/part-r-00000
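As a sanity check of what the job computes, the same word count can be sketched locally with coreutils. The sample text below is made up; the pipeline mirrors the map (split into words), shuffle/sort (group identical words), and reduce (count each group) phases that produce the part-r-00000 output.

```shell
# A tiny stand-in for inputfile.txt.
printf 'hello hadoop\nhello world\n' > inputfile.txt

# map: one word per line; sort: group identical words;
# uniq -c: count each group; awk: word<TAB>count, like part-r-00000.
tr -s ' ' '\n' < inputfile.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
# prints:
#   hadoop  1
#   hello   2
#   world   1
```

The cluster run computes the same result, only distributed across nodes and read back with hadoop fs -cat.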