Brief introduction to the GNU/Linux OS: basic concepts, commands, and a few tips for administration tasks.
Prepared in the context of the course Data Science: Applications to Biology and Medicine with Python and R (2020 edition), a postgraduate course at the University of Barcelona.
Biology, medicine, physics, astrophysics, chemistry: all these scientific domains need to process large amounts of data with increasingly complex software systems. Achieving reproducible science poses several challenges involving multidisciplinary collaboration and socio-technical innovation, with software at the center of the problem. Despite the availability of data and code, several studies report that the same data analyzed with different software can lead to different results. I see this problem as a manifestation of deep software variability: many factors (operating system, third-party libraries, versions, workloads, compile-time options and flags, etc.), themselves subject to variability, can alter the results, to the point that they can dramatically change the conclusions of some scientific studies. In this keynote, I argue that deep software variability is both a threat and an opportunity for reproducible science. I first outline some works on (deep) software variability, reporting preliminary evidence of complex interactions between variability layers. I then link the ongoing work on variability modelling and deep software variability to the quest for reproducible science.
These slides were prepared to introduce the public to an IT technology that has gained a lot of attention from both small and big companies. It has not only gained attention but is also used by big companies such as Google, Twitter, Facebook and Amazon. The technology is called Free Software, also known as Open Source Software. The concept behind this technology is SHARING, and it has been around for nearly 40 years. The Internet is one of the examples that use this technology. The main concept is about FREEDOM.
Introduction to GNU/Linux and Free/Libre Open Source Software, comparing the OS with Mac OSX and Microsoft Windows, plus a few other notes and pointers.
Content partially reused from a Master's degree at VHIR, where I had to teach this introduction prior to using GNU/Linux tools for bioinformatics in that master's in translational medicine: https://www.vhir.org
The OpenChain Project has launched a series of bi-weekly free webinars that provide access to people and knowledge that we would otherwise obtain at events. We held our fifth meeting on Monday the 1st of June at 9am Pacific with two guest speakers.
This time we explored Software Heritage, an initiative whose goal is to collect, preserve, and share software code, and continued our discussion of containers from the perspective of scalable compliance.
Roberto Di Cosmo, Director at Software Heritage, explained why this initiative collects and preserves software in source code form with the understanding that software embodies key technical and scientific knowledge that humanity cannot afford to risk losing. His presentation helped provide insight into how such initiatives can link into activities like compliance automation in open source compliance, an area of immediate interest to the OpenChain community.
Michael Weber - Rechenkraft.net - From Volunteers to Scientists - CitizenCyberlab
Michael Weber presenting Rechenkraft.net - From Volunteers to Scientists, at the Citizen Cyberlab Summit, 17-18 September 2015, University of Geneva (UNIGE).
Lawrence Berkeley National Laboratory, Sep 2015 - Jupyter Talk
Scientific facilities are increasingly generating large data sets. Next-generation scientific productivity relies on user-friendly tools and efficient, effective and seamless access to resources and data. Traditional approaches to research and software development for science focus on the hardware and software of the machine and do not consider the user. In this talk, I will highlight a different approach to building software for scientific users by including user knowledge in the process. I will illustrate a few example projects where this has been used to date.
GitHub repository: https://github.com/Carreau/talks/tree/master/labtech-2015
Digital Tools' First session covers the following topics:
Types of software (according to OS, purpose or license)
Types of file formats
Licenses
Free Libre Open Source Software (FLOSS)
Keeping your files and folders in order
"SEMINAR: Análisis de Big data con Tidyverse y Spark: uso en estadística pública"
By Xavier de Pedro Puente, Ph.D.
Senior Technician at the Barcelona City Council.
Wednesday, March 29th, 2023. 16h-19:00h + questions
Within the context of the postgraduate course on
"Data Science. Applications to Biology and Medicine with Python and R"
at University of Barcelona (IL3). 2023.
Workshop: Free your computer with Linux in Catalan - Xavier de Pedro
WEDNESDAY 25/05/2022 at 18:30 - TRAINING CAPSULE: FREE YOUR COMPUTER
Casal de Barri Can Carol
http://cancarolvallcarca.cat
This capsule will have an introductory part reviewing the main problems that usually arise when using computers with proprietary operating systems that restrict freedoms. Ways to install a free operating system, in Catalan and up to date (and easily updatable over the network on a regular basis), will be shown step by step.
During the session, a free operating system will be installed on a computer alongside the operating system it already has (without damaging it). Another similar computer, already set up, will also be shown.
Bringing your own computer is recommended.
Seminar "Análisis de Big Data con Tidyverse y Spark: uso en estadística pública", within the postgraduate course "Data Science. Applications to Biology and Medicine with Python and R". University of Barcelona. 2020
Session on GNU Linux - Introduction and Administration, within the postgraduate course "Data Science. Applications to Biology and Medicine with Python and R". University of Barcelona. 2020
Challenges and opportunities in Artificial Intelligence - Xavier de Pedro
Short talk in a workshop discussing the use of Artificial Intelligence among cities, universities and other partners for potential projects in this area. Held at the Barcelona Supercomputing Center.
Enhance your Team Work with Distributed Version Control Systems - DVCS - Xavier de Pedro
Distributed revision control takes a peer-to-peer approach to version control, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a complete repository. Distributed revision control synchronizes repositories by exchanging patches (sets of changes) from peer to peer. This results in some important differences from a centralized system:
No canonical, reference copy of the codebase exists by default; only working copies.
Common operations (such as commits, viewing history, and reverting changes) are fast, because there is no need to communicate with a central server.
Communication is only necessary when sharing changes among other peers.
Each working copy effectively functions as a remote backup of the codebase and of its change-history, protecting against data loss.
Other differences include:
Multiple "central" repositories.
Code from disparate repositories is merged based on a web of trust, i.e., historical merit or quality of changes.
Numerous different development models are possible, such as development / release branches or a Commander / Lieutenant model, allowing for efficient delegation of topical developments in very large projects.[3] Lieutenants are project members who have the power to dynamically decide which branches to merge.
The network is not involved in common operations.
A separate set of "sync" operations is available for committing or receiving changes with remote repositories.
DVCS proponents point to several advantages of distributed version control systems over the traditional centralised model:
Allows users to work productively when not connected to a network.
Makes most operations much faster.
Allows participation in projects without requiring permissions from project authorities, and thus arguably better fosters a culture of meritocracy instead of requiring "committer" status.
Allows private work, so users can use their changes even for early drafts they do not want to publish.
Avoids relying on one physical machine as a single point of failure.
Permits centralized control of the "release version" of the project.
On FLOSS software projects it is much easier to create a project fork from a project that is stalled because of leadership conflicts or design disagreements.
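The peer-to-peer model described above can be illustrated with a toy sketch. This is not how any real DVCS is implemented: the `Repo` class, its methods and the commit-id scheme are hypothetical names invented for illustration, history is assumed to be linear, and merging of diverged histories is left out. It only shows that each peer holds a complete history, that commits need no network, and that syncing means copying the commits the other peer lacks.

```python
import hashlib


class Repo:
    """Toy peer repository (hypothetical): every peer stores the full history."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id or None, message)
        self.head = None   # id of the latest local commit

    def commit(self, message):
        """Record a change locally; no other peer or server is contacted."""
        parent = self.head
        cid = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
        self.commits[cid] = (parent, message)
        self.head = cid
        return cid

    def history(self):
        """Return commit messages from oldest to newest, read locally."""
        messages, cid = [], self.head
        while cid is not None:
            parent, message = self.commits[cid]
            messages.append(message)
            cid = parent
        return messages[::-1]

    def _is_ancestor(self, ancestor, cid):
        """True if `ancestor` is reachable from `cid` (None is the empty root)."""
        while cid is not None:
            if cid == ancestor:
                return True
            cid = self.commits[cid][0]
        return ancestor is None

    def pull(self, other):
        """Sync step: copy missing commits, then fast-forward when possible."""
        missing = {c: v for c, v in other.commits.items() if c not in self.commits}
        self.commits.update(missing)
        if other.head is not None and self._is_ancestor(self.head, other.head):
            self.head = other.head  # fast-forward only; real merges are out of scope
        return len(missing)


alice, bob = Repo(), Repo()
alice.commit("initial import")
bob.pull(alice)             # bob now holds a complete copy of the history
bob.commit("add feature")   # local operation, works offline
alice.pull(bob)             # changes travel only when peers sync
print(alice.history())
```

After the two `pull` calls both peers hold identical histories, so either one could serve as a backup of the other.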
150511 programari lliure_i_taller_de_linux_v2 - Xavier de Pedro
Seminar on Free Software and GNU/Linux, with a guided workshop to install Ubuntu on your computer in dual boot alongside your other operating system.
Hacktivism with freed tools and content: take advantage of it! - Xavier de Pedro
Talk at the Fòrum Social Català 2014. http://2014.forumsocialcatala.cat/item11
The workshop will reflect on the advantages of using free knowledge and of freeing the knowledge we generate ourselves. It will explain what must be done to guarantee that future generations can use and keep improving the knowledge we have generated individually or collectively.
In the workshop we will divide knowledge into three types:
1. information in itself, such as written documents, construction plans for objects/houses/machines, or also videos, documentaries, music or other documents in audio formats (stories told for children, etc.).
2. computer programs (for computers, tablets and smartphones, but also for cars, household appliances, drones, etc.)
3. physical tools (3D printers, phones, tractors, workshops for making bricks for construction, ovens, etc.)
The type of copyright license chosen is important: it determines whether we have legally protected other people's ability to use and modify our knowledge, if we so wish, and it also protects us against other people taking exclusive control of our knowledge without first asking our permission in order to reach an agreement that satisfies us.
We can give an example from the case of computer programs. Most people use the programs that came with the computer, and those that friends or relatives "passed on to them" to install. In most cases this means using and pirating proprietary software: MS Windows, MS Office, Photoshop, Illustrator, ... This is illegal and unethical, and it makes us miss the fantastic opportunity to use free software for the same tasks. This software is called free because it comes with legal freedoms regarding what you can do with the program. The aim of this workshop is to prompt reflection on which software we tend to use out of tradition and convenience, what alternative we have, and the positive consequences, for us and for the community, of choosing free software over proprietary software whenever sufficiently mature alternatives exist, as they already do in many fields.
The workshop will present real examples from the fields of documentation, drawing up building plans, building what is needed to sustain a civilization, photos and videos (which ones yes and which ones no), and computer/tablet and mobile phone programs, and will show how we can benefit individually and collectively from their use.
Hacktivism with freed content and tools: take advantage of it! - Xavier de Pedro
Talk on how the hacker ethic can help us in our activism for a better world if we use free content (with copyleft, "Creative Commons" type copyright licenses) and free tools, such as free software (immaterial tools) and free hardware (material tools). It also indicates how we can benefit from these practices, both individually and collectively.
Talk given on 5 April 2014 as part of the Jornades de Cultura Lliure at the Centre Cívic Can Basté in Nou Barris, Barcelona (Catalonia, Spain).
More information (and slides with clickable links, etc.):
http://llavorspac.org/CanBaste
V Jornadas de Software Libre - UPC: TikiWiki in educational contexts (I) and (II) - Xavier de Pedro
V Jornades de Programari Lliure
ETSEIB (UPC), 5 to 8 July 2006
http://jornadespl.org
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Spain license.
De Pedro, X. and Reyes, J. 2006a. “TikiWiki en contextos educativos (I): las comunidades abiertas de aprendizaje cooperativo y reflexivo”. V Jornadas de Software Libre, Universidad Politécnica de Cataluña. Full text: http://www.ub.edu/gclub/dl52 (530 Kb).
De Pedro, X. and Reyes, J. 2006b. “TikiWiki en contextos educativos (II): los sistemas de evaluación de los aprendizajes”. V Jornadas de Software Libre, Universidad Politécnica de Cataluña. Full text: http://www.ub.edu/gclub/dl53 (698 Kb).
II Jornadas de Usuarios R: Using R securely via the web with Tiki - Xavier de Pedro
Using R securely via the web with Tiki.
Xavier de Pedro*, Àlex Sanchez
Department of Statistics
University of Barcelona
http://estbioinfo.stat.ub.es
Xavier.dePedro@ub.edu
1) Our needs
2) Web GUIs for R
3) Tiki and the new PluginR
4) Examples and use cases
5) Future work
Teaching and research staff (PDI) at research centres often use proprietary software to manage their content on web pages and to facilitate collaboration with others (department, research group, project, foreign colleagues...). Web 2.0-style applications tend to be sought, so whereas tools such as bscw or MS Sharepoint used to be common, it is now not unusual to see Google Sites, Google Docs, Wikispaces, ... in use. The software used is usually not complete or versatile enough to allow the desired usage scenario to be configured according to the needs of each case. As a result, a multitude of different programs, each with its own username and password, frequently ends up being used for each and every online publishing and collaboration need (news, files, forums, wiki, spreadsheets, trackers or databases, ...). In many cases, moreover, control of the data is lost because it sits with external companies that impose their own terms of use.
There are currently several free-software web 2.0 applications that could, in theory, facilitate these usage scenarios (Tiki, Plone, Drupal, Joomla, Twiki, ...). This communication addresses the case of the recently released version 3.0 of Tiki (Tikiwiki CMS/Groupware - http://tikiwiki.org). It will analyse the many features that are already functional as soon as the application is installed [powerful, cross-lingual wiki engine (CLWE), visual (Wysiwyg) or quick (Wiki) editing, forums, blogs, trackers, folksonomy / free tags, maps and free GIS, semantic web, multiple pre-installed plugins for advanced functionality, web services, granular permission system and hierarchical categorization of content, search that respects read permissions...]. In addition, specific cases of use in Catalan universities will be cited, illustrated with a brief demonstration of installing Tiki and configuring it to the look and feel of the University of Barcelona in a few clicks.
Large-scale use cases:
* International Mozilla Firefox support site (http://support.mozilla.com),
* KDE wiki (http://wiki.kde.org).
The communication is aimed at students and teaching staff who are largely unfamiliar with Tiki 3.0 and are looking for a unified, up-to-date system of web tools to facilitate collaboration with their colleagues without resorting to proprietary software or to a multitude of separate applications.
Collaboration among PDI (2): Bibliography management with Bibus - Xavier de Pedro
Teaching and research staff (PDI) at research centres often use proprietary software to manage their bibliographic references (Endnote, Reference Manager, Procite ...). For a long time there were no free cross-platform alternatives with similar features, but for some years now Bibus has been establishing itself as an increasingly stable and usable solution on GNU/Linux, MS Windows and Mac alike. In addition, web services for sharing bibliography built on closed-source software tools are increasingly being promoted, even through Spanish public institutions (EndnoteWeb, RefWorks, ...), as if this were the only possible way to let PDI maintain their own individual or shared online database of publications and reuse the bibliographic citations in their articles.
This communication therefore briefly addresses some advantages and drawbacks of using Bibus to manage the bibliography of an individual or a research group, based on the specific experience of the author of the communication. A specific example of using Bibus will be shown, starting from the export of the CV from the Curricul@ application (part of the WebGREC suite of applications that most Catalan universities use to manage the publications of their PDI). Some of the current free alternatives will also be briefly cited, in comparison with their proprietary counterparts.
The communication is aimed at students and teaching staff who are still largely unfamiliar with Bibus and who use proprietary alternatives for managing references and creating the bibliography of their publications in the desired format.
Collaboration among PDI (1): Statistics and Scientific Graphics with R. - Xavier de Pedro
Teaching and research staff (PDI) at research centres often use proprietary software for their models, statistical calculations and visualization of scientific data (Matlab, Mathematica, Statistica, SPSS...). In the medium or long term, collaboration is often sought with other teachers and researchers in related areas of work, or students need to carry out statistical calculations and produce graphics outside the classroom or once they have finished their degree, which usually means that software licenses have to be bought (or are sometimes pirated?) in order to use it. One of the free alternatives with enormous advantages in the medium and long term, as well as enormous potential and growth over the last decade, has been the R software (http://www.r-project.org).
This communication therefore briefly addresses some advantages and drawbacks of using R to facilitate collaboration among PDI (current and future), based on the specific experience of the author of the communication. Some graphical applications for interacting with R are shown, and some of the current free alternatives are cited. The communication is aimed at students and teaching staff who are still largely unfamiliar with R and who use proprietary alternatives for their research and teaching tasks.
More at http://www.ub.edu/gclub/jornadespl2009
0903 Miquel's summary of the Jornades Internacionals sobre Organismes Modi... - Xavier de Pedro
6 and 7 March 2009. Summary of the International Conference on Genetically Modified Organisms (Jornades Internacionals sobre Organismes Modificats Genèticament), held at the CSIC Residència d'Investigadors in Barcelona.
Speakers:
Dr. Marcello Buiatti, Professor of Genetics at the University of Florence (Italy), “who benefits from GMOs?”
Dr. Brian John, PhD in Geography, "GM Free Cymru" (GM-free Wales).
Dr. Armin Spök, PhD in Molecular Genetics. Professor at the universities of Graz and Klagenfurt (Austria). Member of the expert groups of the European Food Safety Authority (EFSA) and of the OECD, “what Austria has done on this issue”.
Dr. Henk Hobbelink, Grain, “GMOs, food crisis and climate change”.
Fabio Boscareli, GM-free Tuscany, “GM-free regions and a specific case”.
Dr. Gilles-Eric Séralini, Professor of Molecular Biology at the University of Caen (France), President of the Scientific Council of CRIIGEN, “GMOs and health”.
Dr. Ricarda A. Steinbrecher, “the impacts of GMOs on agriculture and the environment”
Round table.
Presentation of the Introductory Course on Open Source Software (Software de Fuentes Abiertas), organized by CENATIC (http://www.cenatic.es), the Conference of Rectors of Spanish Universities (CRUE) and gclUB (http://gclub.ub.es) at the University of Barcelona, on 6 March 2009
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Global Situational Awareness of A.I. and where it's headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
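One of the ideas above, skipping in-identical vertices, can be sketched in a few lines. This is a minimal illustration and not the STICD algorithm itself: the function name `pagerank_in_identical` and the dict-of-lists graph format are assumptions made for the example, and the graph is assumed to have no dangling nodes and no parallel edges. Vertices with exactly the same set of in-links receive the same rank, so the rank is computed once per group and copied to every member.

```python
from collections import defaultdict


def pagerank_in_identical(out_links, damping=0.85, tol=1e-10, max_iter=500):
    """Power-iteration PageRank that computes each in-identical group once.

    out_links maps every vertex to the list of vertices it links to.
    Assumes every vertex has at least one out-link and no duplicate edges.
    """
    vertices = list(out_links)
    n = len(vertices)

    # Build in-link lists from the out-link lists.
    in_links = {v: [] for v in vertices}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].append(u)

    # Group vertices whose in-link sets are identical.
    groups = defaultdict(list)
    for v in vertices:
        groups[frozenset(in_links[v])].append(v)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        new_rank = {}
        for sources, members in groups.items():
            # One computation per group instead of one per vertex.
            contribution = sum(rank[u] / len(out_links[u]) for u in sources)
            value = (1.0 - damping) / n + damping * contribution
            for v in members:
                new_rank[v] = value
        error = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if error < tol:
            break
    return rank, len(groups)


# "b" and "c" are both linked only from "a", so they form one group.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
ranks, n_groups = pagerank_in_identical(graph)
print(n_groups, ranks)
```

The saving per iteration grows with the number of vertices that share in-link sets; on a graph with no such vertices the grouping only adds overhead.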
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
1. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Data Science:
Applications to Biology and Medicine with Python and R
(2nd edition, 2020)
University Expert Course
University of Barcelona (UB)
With the support of: https://www.ub.edu
2. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux OS
Introduction
Administration
Xavier de Pedro, PhD.
July 6, 2020
Course «Data Science» UB
Image source: http://tekopsglobal.com
3. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Xavier de Pedro Puente, Ph.D.
xdepedro@bcn.cat
Academics:
Degree in Biology
(University of Barcelona - UB)
Ph.D. in Ecology
(University of Barcelona - UB)
Postgraduate in Bioinformatics
(Open University of Catalonia – UOC)
Current Work:
Senior technician at Municipal Data Office
(Barcelona City Council)
Past (related) Work:
Bioinformatics technician (UEB, VHIR)
Systems administrator (UEB, VHIR)
4. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Session Outline
(1) GNU/Linux Introduction
– Concepts, History, "Distributions", Ubuntu
– Basic Differences compared with:
● MS Windows
● Mac OSX
– Command line runs
– Aliases
(2) GNU/Linux Administration
– Installation
– Package Management
– User Management
– Permission Management
– Device Management
– Backup Management
– Security
– Computer Client Management
5. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Tool: Virtual Machine (VM) with Lubuntu GNU/Linux
See: https://www.virtualbox.org
2 options:
(1) Import Virtual Box VM (.ova file) locally
(2) Connect to remote VM with program X2Go.
6. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Option 1: Import VirtualBox VM (.ova file)
● Download from:
– 64-bit computers (5.7 GB):
http://cloud.seeds4c.org/lubuntu_1804_64bit_v02.ova
– 32-bit computers, if any (7.3 GB):
http://cloud.seeds4c.org/lubuntu_1604_32bit_v05.ova
See: https://www.virtualbox.org
7. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Option 1: VirtualBox VM with Lubuntu GNU/Linux
See: https://www.virtualbox.org
● Import VM locally
– Follow the instructions around slide 47, in the Linux administration section: "Importing VirtualBox (.OVA)"
8. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Option 2: Connect to remote VM with X2Go
See: https://wiki.x2go.org/doku.php/doc:installation:x2goclient
● Install X2Go Client:
https://wiki.x2go.org
● Add new session prefs:
– Host: datascience.seeds4c.org
– Login & pass:
● Generic: datascience / datascience
● Private: (sent to your email accounts)
– Session type: LXDE
9. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Option 2: Connect to remote VM with X2Go
10. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
What is GNU/Linux?
It is ...
An Open Source (Free/Libre) operating system
The sum of the GNU tools and environment plus the Linux kernel (core)
Compatible with UNIX systems
GNU/Linux
(a.k.a «Linux»)
11. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Operating System (in context)
Desktop Environments
Operating System
Hardware
End User Applications
Aqua Luna (XP)
Aero (Vista +)
12. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux
Some basic concepts
GNU Project
Hardware vs software
Open Source & Free software (as in «free» seats)
● FLOSS: Free/Libre Open Source Software
Free Software Foundation (FSF)
Free Operating Systems
● GNU/Linux, GNU/Hurd, [Open,Free,Net]-BSD, OpenSolaris, ReactOS...
Open Source Hardware
13. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Historical Evolution
Source: A. G. Stankevicius. Departamento de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. http://cs.uns.edu.ar/~ags/linux/
In the origins of software...
Software was born free.
In the 1960s, buying hardware granted access to the manufacturer's software catalogue.
All software was distributed together with its source code.
At the end of the 1970s, IBM announced its intention to sell parts of its software separately.
From then on, proprietary (non-free) software became common.
14. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Historical Evolution
Source: A. G. Stankevicius. Departamento de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. http://cs.uns.edu.ar/~ags/linux/
GNU/Linux History
1983. Richard M. Stallman (RMS): GNU project
1985. Free Software Foundation (FSF).
First components of the GNU system, all written by RMS:
● a C compiler (gcc)
● a text editor (emacs)
● and a debugger (gdb)
15. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Historical Evolution
Derived from: A. G. Stankevicius. Departamento de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. http://cs.uns.edu.ar/~ags/linux/
GNU/Linux History
To guarantee the four freedoms, RMS invented the concept of copyleft (the reverse of copyright).
1990: The GNU system was almost complete; only an ambitious kernel (core) remained unfinished.
1991: Linus Torvalds wrote a monolithic kernel.
GNU + Linux kernel (by Linus T.): GNU/Linux
1996: A penguin bit Linus. Logo: the penguin Tux.
16. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux History
Source: https://www.elprocus.com/linux-operating-system/
17. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Desktop Environments (in context)
Desktop Environments
Operating System
Hardware
Applications
Aqua Luna (XP)
Aero (Vista +)
18. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
History of Desktop Environments
Source: https://en.wikipedia.org/wiki/Desktop_environment
19. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Desktop Environments (DE)
Desktop Environments
...
KDE
Source: https://en.wikipedia.org/wiki/Desktop_environment
20. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux DE:
Desktop Environments
...
21. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux DE:
Desktop Environments
...
KDE
22. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux DE:
23. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux DE:
24. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux DE:
25. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
What is a GNU/Linux «distribution»?
A collection of free software
Core + drivers (modules)
Desktop Environment
Extra + programs + utilities
Support? + Documentation?
KDE
Desktop
Environment
26. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Distributions of GNU/Linux
Ubuntu, Debian, Slackware, Gentoo
Red Hat, Fedora, Mandriva, SUSE
27. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Ubuntu (GNU/Linux)
Main characteristics
Distribution based on Debian
Developed by Canonical Ltd. (founded by the South African Mark Shuttleworth)
Ubuntu:
Zulu philosophy: "Humanity to others", "I am because we are"
Ubuntu slogan: "Linux for human beings" (or "beans"?)
Arguably one of the easiest distributions to install and use
29. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux vs. Windows
● Technical differences
– User: root vs. administrator
– Case sensitive vs. case insensitive
● "MyFile != myfile" vs. "MyFile = myfile"
– "Symbolic or Hard Links" vs. "Shortcuts"
– Paths: Slash ("/") vs. Backslash ("\")
– Main hard drive/partition: "/" vs. "C:"
– Partition formats (file systems): ext2, ext3, ext4, ... vs. FAT16/FAT32/NTFS
– User default folder: /home/username vs. "C:\Documents and Settings" or "C:\Users\username\MyDocuments"
– USB disk default folder: /media/username/usbdiskname vs. "X:\usbdiskname"
– Secure (Viruses???) vs. Insecure (viruses, bots, worms, trojans, backdoors, ...)
– It can extend computer useful life vs. Planned & perceived obsolescence
– Performance, with same hardware: Faster vs. Slower (Antivirus, antispyware, ...)
– Run as admin: "sudo program" in console vs. "Run program as administrator"
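Two of the technical differences listed for this comparison (case sensitivity and symbolic vs. hard links) can be tried safely in a throwaway folder. The following is only a minimal sketch, assuming a GNU/Linux shell on a case-sensitive file system such as ext4; the file names are made up for illustration.

```shell
# Work in a temporary folder so nothing important is touched
demo=$(mktemp -d)
cd "$demo" || exit 1

# Case sensitivity: on ext4 these are two different files
echo "upper" > MyFile
echo "lower" > myfile
file_count=$(ls | wc -l)
echo "Files created: $file_count"

# Hard link: a second name for the same data
ln MyFile hardlink
# Symbolic link: a pointer to a path (the closest thing to a Windows shortcut)
ln -s MyFile symlink

# Delete the original name: the hard link keeps the data,
# while the symbolic link is left dangling
rm MyFile
hard_content=$(cat hardlink)
if [ -e symlink ]; then sym_state="valid"; else sym_state="broken"; fi
echo "hardlink still contains: $hard_content"
echo "symlink is now: $sym_state"

# Leave the demo folder and remove it
cd - >/dev/null
rm -rf "$demo"
```

On a case-insensitive file system the two `echo` lines would write to the same file, which is the behaviour the slide attributes to Windows.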
30. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux vs. Windows
● Philosophical differences
– Freedom: "Free/Libre Open Source Software (FLOSS)" vs. "Closed Source/Proprietary software"
● Software Sustainability: High vs. Low
– Usually: product given for free (at no cost) vs. product for (excessive?) profit
– Money comes through: customizations and training vs. selling software
– FLOSS fosters local economies as well as big companies vs. growth of big & remote corporations.
31. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux vs. Mac OS X
● Technical differences
– Users:
– Case:
– Links:
– Paths:
– Main hard drive/partition:
– Partition formats: ext2, ext3, ext4, ... vs. HFS+
– User default folder: /home/username vs. "/Users/Username"
– USB disk default folder:
– Secure (Viruses???) vs. Secure (Viruses?)
– It can extend computer useful life vs. Planned & perceived obsolescence
– Performance, with same hardware: Faster??? vs. Extremely Fast
– Run as admin:
32. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
GNU/Linux vs. Mac OS X
● Philosophical differences
– Freedom: "Free/Libre Open Source Software (FLOSS)" vs. "Closed Source/Proprietary software"
● Software Sustainability: High vs. Low
– Usually: product given for free (at no cost) vs. product for (excessive?) profit
– Money comes through: customizations and training vs. selling software & hardware
– FLOSS fosters local economies as well as big companies vs. growth of big & remote corporations.
33. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Ubuntu (GNU/Linux)
See it in action ...
34. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Command line runs (i)
● Shell = Terminal = Console ("black" text screen to run commands)
– Secure Shell = ssh = a safe (encrypted) way to connect to remote computers or servers
– ftp: File Transfer Protocol
● to transfer files between computers or servers
● you can NOT run commands (other than listings)
– sftp: file transfer like FTP, but over an encrypted ssh connection
● Local terminal window
– > whoami
35. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Command line runs (ii)
● Example of simple local commands:
> ps -e
> ls -l
> df -h
> top
> tree . | head
> du -h . | grep G
● Example of simple editors:
> nano (simple editor: press "Ctrl + X" to eXit)
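Filtering `du` output with `grep G` finds gigabyte-sized entries, but it also matches any path that merely contains a capital G. One possible alternative, sketched here under the assumption that GNU coreutils are available (for `sort -h`), is to sort the sizes instead. The folder names are invented for the demo.

```shell
# Build a small throwaway tree to have something to measure
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
head -c 1024 /dev/urandom > "$demo/small/file"      # about 1 KiB
head -c 2097152 /dev/urandom > "$demo/big/file"     # about 2 MiB

# Human-readable size of each subfolder, largest first (GNU sort -h)
du -sh "$demo"/* | sort -rh

# Same ranking with plain numeric sizes in KiB; keep only the top entry.
# du separates size and path with a TAB, which is cut's default delimiter.
largest=$(du -sk "$demo"/* | sort -rn | head -n 1 | cut -f2)
echo "Largest subfolder: $largest"

rm -rf "$demo"
```

In everyday use the same pattern is typically run on a real folder, for example `du -sh ~/* | sort -rh | head`.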
36. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Command line runs (iii)
● Example of network-related commands:
> ping google.com
> ifconfig (on recent systems: ip addr)
● Example of system administration (sysadmin) commands:
> sudo apt update && sudo apt install tree
> sudo adduser foo
> sudo passwd foo
> sudo service apache2 restart
> sudo pkill -9 java
● Example of simple editors:
> nano (simple editor: press "Ctrl + X" to eXit)
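Two details matter when chaining and stopping commands: `&&` runs the next command only if the previous one succeeded, whereas a single `&` sends a command to the background; and `kill` expects process IDs, whereas `pkill` and `pgrep` match process names. The sketch below uses a harmless `sleep` process as a stand-in for a real program, so it needs no `sudo`.

```shell
# '&&' runs the second command only if the first one succeeds
false && echo "never printed"
true && chained="second command ran"
echo "$chained"

# '&' sends a command to the background; $! holds its process ID (PID)
sleep 300 &
pid=$!
echo "Started sleep with PID $pid"

# kill takes PIDs, not names. Try the default TERM signal first
# and keep -9 (KILL) as a last resort.
kill "$pid"
status=0
wait "$pid" 2>/dev/null || status=$?
echo "sleep ended with status $status"   # 128 + 15 (SIGTERM)

# To match by name instead of PID, pgrep/pkill can be used, e.g.:
#   pgrep -a java      # list matching processes first
#   sudo pkill java    # then signal them
```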
37. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Command line runs (iv)
● Example of other shell-based programs:
> htop (type "q" to quit/exit)
> mc (Midnight Commander). Click on F10 (with mouse or trackpad) to quit
> R (type "q()" to quit)
● Serial vs. Parallel tasks
– https://www.datascienceatthecommandline.com/chapter-8-parallel-pipelines.html
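The serial vs. parallel distinction can be illustrated with `xargs`, whose `-P` option (available in GNU findutils) runs several tasks at once. This is a sketch only: the "task" is a one-second `sleep` standing in for real work, and the input names `a b c d` are placeholders.

```shell
# A stand-in "task" that takes about one second per input
task='sleep 1; echo "done $1"'

# Serial: one task at a time (about 4 seconds for 4 inputs)
start=$(date +%s)
printf '%s\n' a b c d | xargs -n 1 sh -c "$task" _
serial_secs=$(( $(date +%s) - start ))

# Parallel: up to 4 tasks at once with -P (about 1 second in total).
# Output order is not guaranteed, so it is sorted afterwards.
start=$(date +%s)
parallel_out=$(printf '%s\n' a b c d | xargs -n 1 -P 4 sh -c "$task" _ | sort)
parallel_secs=$(( $(date +%s) - start ))

echo "serial: ${serial_secs}s, parallel: ${parallel_secs}s"
```

The speed-up only appears when the tasks are independent of each other and the machine has enough free CPU cores (or the tasks mostly wait, as `sleep` does).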
38. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Midnight Commander (mc)
● Powerful dual-pane file manager for terminals. A lifesaver for human beans when servers have no X Window System (no graphical environment)
39. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
More information...
● Linux intro (short version):
– http://applied-r.com/linux-intro/
● Linux OS Basics:
– http://applied-r.com/linux-os-basics/
● Linux File Management:
– http://applied-r.com/linux-file-management/
● Linux help:
– http://applied-r.com/linux-help/
● Linux aliases:
– http://applied-r.com/linux-aliases/
● Linux Utilities:
– http://applied-r.com/linux-utilities/
● Package management (console based):
– http://applied-r.com/linux-apt/
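Aliases appear in the session outline and in the links above. As a quick illustration, an alias is a user-defined shortcut for a longer command. The alias names below (`ll`, `update`) are just common conventions chosen for this sketch, not something defined by the course VM.

```shell
# Define shortcuts for commands that are typed often
alias ll='ls -lh'
alias update='sudo apt update && sudo apt upgrade'

# Show the definition of one alias
alias ll

# Aliases last only for the current session. To keep them, append the
# alias lines to ~/.bashrc (or ~/.bash_aliases on Ubuntu) and then
# open a new terminal or run:  source ~/.bashrc
```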
40. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Pause...
● Coffee / soda time?
Source: http://wallpaperpicture--photo.blogspot.com.es/2015/01/funny-ads-pictures-0.html
41. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
(2) GNU/Linux Administration
Installation
Package Management
User Management
Permission Management
Device Management
Backup Management
Security
Computer Client Management
Computer Clusters
42. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Image Source: https://www.linux.com/blog/sysadmin-ebook/2017/9/future-proof-your-sysadmin-career-advancing-open-source
Evolution in Technology
Image source: http://ars.userfriendly.org/cartoons/?id=19990718
43. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Installation
How to use or install GNU/Linux?
LiveCD/LiveUSB: without installing it on the hard disk
Install on the hard disk
Uses your hardware more efficiently, and can be installed next to your other installed OS. You choose the OS at boot time.
Install on a USB disk or pen drive
Install in a local virtual machine (or in the cloud)
From scratch (from an .iso file), or from a previously exported virtual machine (.ova file or equivalent)
Windows + andLinux/CoLinux
Win10 + «WSL» (Windows Subsystem for Linux)
44. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Exporting VirtualBox (.OVA)
With current parameters
See: https://www.virtualbox.org
45. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Willing to export to USB? Set it up
...
See: https://www.virtualbox.org
46. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Want to copy & paste between the host and the guest VM?
...
See: https://www.virtualbox.org
47. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Importing VirtualBox (.OVA)
VT-x option disabled in BIOS allows only 1 cpu
See: https://www.virtualbox.org
48. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Importing VirtualBox (.OVA)
VT-x option enabled in BIOS allows 2+ CPUs
See: https://www.virtualbox.org
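The export and import steps shown in the VirtualBox screenshots also have a command-line counterpart, `VBoxManage`, which ships with VirtualBox. The helper functions below are a hedged sketch: `import_ova` and `export_vm` are names invented for this example, and the VM name and paths in the usage comments are placeholders.

```shell
# Import an exported appliance (.ova) from the command line.
# Usage: import_ova FILE.ova [CPUS]
import_ova() {
    ova_file=$1
    cpus=${2:-1}        # more than 1 CPU needs VT-x enabled in the BIOS
    if ! command -v VBoxManage >/dev/null 2>&1; then
        echo "VBoxManage not found: is VirtualBox installed?" >&2
        return 1
    fi
    if [ ! -f "$ova_file" ]; then
        echo "File not found: $ova_file" >&2
        return 1
    fi
    # --vsys 0 selects the first (usually the only) VM inside the appliance
    VBoxManage import "$ova_file" --vsys 0 --cpus "$cpus"
}

# Export a powered-off VM back to an .ova file, e.g. onto a USB disk.
# Usage: export_vm VM_NAME OUTPUT.ova
export_vm() {
    if ! command -v VBoxManage >/dev/null 2>&1; then
        echo "VBoxManage not found: is VirtualBox installed?" >&2
        return 1
    fi
    VBoxManage export "$1" --output "$2"
}

# Example (VM name and paths are placeholders):
#   import_ova lubuntu_1804_64bit_v02.ova 2
#   VBoxManage list vms
#   export_vm "my-lubuntu-vm" /media/username/usbdiskname/my-lubuntu-vm.ova
```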
49. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Where to enable/disable it?
VirtualBox VM config for VT-x setting in BIOS
Lubuntu: Lightweight Desktop on a Ubuntu GNU/Linux Distribution. http://lubuntu.net
See: https://www.virtualbox.org
50. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Other Options to use VirtualBox VM
Connection to remote VBox using X2Go (ssh)
X2Go: Program to connect to a remote computer through GUI (Graphical User Interface)
51. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Other Options to use VirtualBox VM
Connection to remote VBox using X2Go (ssh)
X2Go: http://wiki.x2go.org
52. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Other Options
LiveUSB (example: LXLE, Lubuntu-based; set the BIOS to boot from USB)
LXLE: A Lubuntu-based GNU/Linux distribution with an improved desktop http://www.lxle.net
53. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Other Options
andLinux (side by side with Windows - http://andlinux.sf.net )
54. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Other Options
WSL (Windows Subsystem for Linux)
● Info:
– https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
● Installation Guide for Windows 10
– https://docs.microsoft.com/en-us/windows/wsl/install-win10
55. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Installation on Hard Disk (native, not WSL)
● GNU/Linux Installation
– Dual/Multi-boot
● Ubuntu (GNU/Linux) + Windows (or Mac OSX, ...)
● From USB or CD/DVD
– Requirements
● Defragment the hard drive (in Windows), if needed
● Back up data to an external device (USB, network drive, ...)
● Identify the partitions in your hard drive
– With an MBR partition table, you may have 4 primary partitions (maximum)
– You need 1 free partition (minimum)
● To make (preferably) 3 logical partitions inside
56. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Using Live USB/CD/DVD
Start from a Live USB or CD/DVD for inspection & safer partition management
Required: set up the computer BIOS to allow booting from USB or CD/DVD
Press a key to enter configuration
Usually: DEL, F2, ESC, ...
Once there, look for the boot sequence (Boot, Boot device, ...)
57. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Hard Drive Partition Management before Installing GNU/Linux.
Challenging example:
● HP Laptop from 2015 with MS Windows 7
58. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Package Management
(installing «programs»)
Standard Package Management
● Console based:
– APT (Debian/Ubuntu/Mint/LXLE)
– YUM (Red Hat/CentOS/Fedora Core)
– YAST2 (SUSE)
– Portage (Gentoo)
● GUI based:
– Synaptic (Debian-based)
– Many others (distro-specific)
59. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
sudo apt install foo
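Around that single install command, a typical APT session also checks whether a package is already present and refreshes the package index first. The sketch below assumes a Debian/Ubuntu system; `pkg_installed` and `install_if_missing` are helper names invented for this example, and `tree` is just a sample package.

```shell
# Return success if a Debian/Ubuntu package is installed
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

# Install a package only if it is missing (requires sudo rights)
install_if_missing() {
    if pkg_installed "$1"; then
        echo "$1 is already installed"
    else
        # Refresh the package index, then install; '&&' stops if the update fails
        sudo apt update && sudo apt install -y "$1"
    fi
}

# Example:
#   install_if_missing tree
# Other everyday commands:
#   apt search tree        # look for packages by keyword
#   apt show tree          # details of one package
#   sudo apt remove tree   # uninstall
```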
60. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Package Management
From extra repositories
add-apt-repository, then apt update, then apt install
● Oracle Java
– sudo add-apt-repository -y ppa:webupd8team/java
● R (updated version)
– sudo add-apt-repository -y ppa:marutter/rrutter
● Dependencies for R GIS packages (such as tmap)
– sudo add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable
– sudo add-apt-repository -y ppa:opencpu/jq
Source: https://seeds4c.org/16.04
61. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Oracle Java
62. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Add missing gpg repo keys
With a helper: launchpad-getkeys
datascience@dspc ~> sudo add-apt-repository -y ppa:nilarimogard/webupd8
datascience@dspc ~> sudo apt install launchpad-getkeys
datascience@dspc ~> sudo launchpad-getkeys
datascience@dspc ~> sudo apt update
Source: https://seeds4c.org/16.04
63. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Add all required ubuntu packages (dependencies) & dev helpers for some R packages
For Ubuntu 16.04 LTS with some extra repos:
datascience@dspc ~> sudo add-apt-repository -y ppa:marutter/rrutter
datascience@dspc ~> sudo add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable
datascience@dspc ~> sudo add-apt-repository -y ppa:opencpu/jq
datascience@dspc ~> sudo apt update
datascience@dspc ~> sudo apt install -y r-recommended r-cran-xml libgraphviz-dev \
    libcairo2-dev r-cran-cairodevice freeglut3 freeglut3-dev r-cran-rglpk r-cran-rgl r-cran-misc3d \
    libx11-dev libxt-dev libcurl4-gnutls-dev libxml2-dev \
    bwidget tk-table libv8-dev r-cran-rjava libmpfr-dev libc6 libssl-dev texlive-latex-extra \
    texlive-lang-spanish libxml2:i386 subversion git tk-dev \
    unaccent xvfb libgdal1-dev libproj-dev r-cran-rmysql libmagick++-dev \
    r-cran-rcolorbrewer r-cran-doparallel libssh2-1-dev libudunits2-dev libgdal-dev libgeos-dev \
    libv8-3.14-dev libjq-dev libprotobuf-dev protobuf-compiler
Fix permissions
sudo chmod 777 /usr/lib/R/site-library /usr/lib/R/site-library/* -R
sudo chmod 777 /usr/lib/R/library /usr/lib/R/library/* -R
sudo chmod 777 /usr/share/R/doc/html/* -R
Source: http://seeds4c.org/R
64. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Package Management
From External sources
● Example: Adobe Acrobat Reader 9.x
wget http://ardownload.adobe.com/pub/adobe/reader/unix/9.x/9.5.5/enu/AdbeRdr9.5.5-1_i386linux_enu.deb
sudo dpkg -i AdbeRdr9.5.5-1_i386linux_enu.deb; sudo apt-get -f install #For 32 bits
sudo dpkg -i --force-architecture AdbeRdr9.5.5-1_i386linux_enu.deb; sudo apt-get -f install #For 64 bits
● Example: Portable Signer:
To digitally sign PDF with certificates like FNMT (.p12)
http://portablesigner.sf.net
– download, uncompress, make executable and run (java app)
Source: https://seeds4c.org/16.04
65. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Package Management (GUI)
Synaptic
66. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
User (& Group) Management
GUI based
Console based: adduser, passwd, ...
67. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
datascience@dspc ~> sudo su #become root user («Switch User»)
[sudo] password for datascience:
root@dspc:/home/datascience# adduser foo
Adding user `foo' ...
Adding new group `foo' (1001) ...
Adding new user `foo' (1001) with group `foo' ...
Creating home directory `/home/foo' ...
Copying files from `/etc/skel' ...
Enter new UNIX password: (new-user password typed here)
Retype new UNIX password: (new-user password typed again)
passwd: password updated successfully
Changing the user information for foo
Enter the new value, or press ENTER for the default
Full Name []: Foo Bar
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
root@dspc:/home/datascience# exit
exit
datascience@dspc ~>
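The interactive session above can also be scripted, which helps when several accounts must be created. This is a sketch assuming Debian/Ubuntu's `adduser`; `user_exists` and `create_user` are helper names made up for the example, and `somegroup` is a placeholder group.

```shell
# Does a user account exist?
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Create the account only if it is missing, without interactive questions.
# --gecos fills the "Full Name" field; --disabled-password creates the
# account with no usable password until an administrator runs:
#   sudo passwd USER
create_user() {
    if user_exists "$1"; then
        echo "User $1 already exists"
    else
        sudo adduser --disabled-password --gecos "$2" "$1"
    fi
}

# Example (needs sudo rights; 'foo' and 'Foo Bar' as in the session above):
#   create_user foo "Foo Bar"
#   sudo usermod -aG somegroup foo   # add foo to an extra group
#   id foo                           # show uid, gid and groups
#   sudo deluser --remove-home foo   # remove the account and its home
```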
68. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
User (& Group) Management
69. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Permission Management
Console based
● chmod – change permissions. chown – change ownership
u – user g – group o – other
r – read w – write x – execute
sudo chmod -R ug+rw /DATA/SHARE
sudo chmod -R 660 /DATA/SHARE
● -R → it modifies the permissions of the parent folder & all child objects within
● ug+rw → it adds read and write access for User & Group, leaving all other permission bits unchanged.
● 660 → it sets exactly read and write for User & Group and nothing for Others. Note that it also removes execute access, which folders need in order to be entered.
● See:
https://www.linux.com/learn/understanding-linux-file-permissions &
https://www.linux.com/learn/how-manage-file-and-folder-permissions-linux
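Symbolic modes such as `ug+rw` and octal modes such as `660` do not behave identically: the first adds bits to whatever is already set, the second replaces the whole permission set. The sketch below shows the difference on a throwaway file instead of `/DATA/SHARE`, assuming GNU `stat` (for `-c '%a'`); the file and folder names are illustrative.

```shell
demo=$(mktemp -d)
touch "$demo/data.txt"

# Start from a known state: rwxr-xr-x
chmod 755 "$demo/data.txt"

# Symbolic mode: ADD read/write for user and group, keep everything else
chmod ug+rw "$demo/data.txt"
after_symbolic=$(stat -c '%a' "$demo/data.txt")
echo "after ug+rw: $after_symbolic"     # 775

# Octal mode: SET the permissions exactly
chmod 660 "$demo/data.txt"
after_octal=$(stat -c '%a' "$demo/data.txt")
echo "after 660:   $after_octal"        # 660

# For folder trees, a capital X adds execute only to folders (and to
# files that are already executable), so folders stay enterable
mkdir "$demo/share"
chmod -R u=rwX,g=rwX,o= "$demo/share"
share_mode=$(stat -c '%a' "$demo/share")
echo "share folder: $share_mode"        # 770

rm -rf "$demo"
```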
70. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Permission Management
71. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Permission Management (GUI)
72. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Device Management
Gnome Disks: https://wiki.gnome.org/Apps/Disks
73. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Device Management
Gparted: https://gparted.org
74. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Backup Management: tips
Keep copies somewhere other than on the same computer
«Elsewhere»: different hardware, different device, different room, different building, ...
Automatic (regular) backups
«Smart remove» (automatic pruning of old backups) is even more important
Efficiency vs. resilience
Simple enough so that some teammates can restore them?
Complex RAID disk setups vs. redundant external hard drives elsewhere?
Encrypted vs. unencrypted?
Advice: rely on your team's collective wisdom; stay away from overly technical solutions that only one person understands
75. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Backup Management: tools
Déjà Dup (Duplicity GUI)
https://wiki.gnome.org/Apps/DejaDup/
Backintime (GUI & console; rsync & hard link based)
https://github.com/bit-team/backintime
Luckybackup (rsync-based GUI)
http://luckybackup.sourceforge.net/
Backup Ninja (console-based)
https://0xacab.org/riseuplabs/backupninja
Custom Bash Scripts
Some examples at: https://github.com/xavidp/bashscripts
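To give an idea of what a custom backup script can look like, here is a minimal sketch (not taken from the repository above): it writes a timestamped `tar` archive and then prunes all but the newest few, a very simple stand-in for the "smart remove" idea. The function name `backup_dir`, the archive naming scheme and the demo folders are all assumptions of this example.

```shell
# Usage: backup_dir SOURCE_DIR BACKUP_DIR [KEEP]
backup_dir() {
    bk_src=$1
    bk_dest=$2
    bk_keep=${3:-3}                      # how many archives to keep
    mkdir -p "$bk_dest" || return 1
    bk_stamp=$(date +%Y%m%d-%H%M%S)
    # -C changes into the parent folder so the archive holds relative paths
    tar -czf "$bk_dest/backup-$bk_stamp.tar.gz" \
        -C "$(dirname "$bk_src")" "$(basename "$bk_src")" || return 1
    # Timestamped names sort chronologically, so the glob lists oldest first;
    # delete from the front until only $bk_keep archives remain
    set -- "$bk_dest"/backup-*.tar.gz
    while [ "$#" -gt "$bk_keep" ]; do
        rm -f -- "$1"
        shift
    done
}

# Demo in throwaway folders
src=$(mktemp -d)
dest=$(mktemp -d)
echo "important results" > "$src/results.txt"
# Pretend that three older backups already exist
touch "$dest/backup-20200101-000000.tar.gz" \
      "$dest/backup-20200102-000000.tar.gz" \
      "$dest/backup-20200103-000000.tar.gz"
backup_dir "$src" "$dest" 2
remaining=$(ls "$dest" | wc -l)
echo "Archives kept: $remaining"
# rm -rf "$src" "$dest"   # clean up the demo folders when done
```

For real use, the destination would be a different device or machine, in line with the tips on keeping copies elsewhere, and the script would be scheduled (for example with cron) so that backups run automatically.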
76. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Backintime & Smart remove
78. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Computer Client Management
Epoptes - http://www.epoptes.org
79. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Computer Clusters, ...
Grid, program modules, job queue management ...
● For newbies: Rocks Clusters distro (CentOS based)
http://www.rocksclusters.org
Source: http://ueb.vhir.org/ClusterSeminar
80. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Computer Clusters
Source: http://ueb.vhir.org/ClusterSeminar
81. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
Did you Export your VM (.OVA) to your USB?
Otherwise, you'll lose your changes when a computer is rebooted in a UB computer classroom or equivalent (frozen images at work)
See: https://www.virtualbox.org
82. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
If (1): Shut down the local Lubuntu VM within VirtualBox
See: https://www.virtualbox.org
83. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
If (2): Log out of the X2Go session to the remote host datascience.seeds4c.org
See: https://www.virtualbox.org
84. Data Science. 2020, July 6. | GNU/Linux: Introduction and Administration | Xavier de Pedro Puente
More information
● Ubuntu GNU/Linux:
– http://www.ubuntu.com
● Data Science Virtual Machine for Linux (Ubuntu)
– https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu
● Linkat (Ubuntu):
– Manual Installation with custom partitions (advanced, or for servers):
http://linkat.xtec.cat/portal_linkat/wikilinkat/index.php/Wiki_Linkat_edu_14.04
● Forums for help and support:
– Ubuntu-es Forums: https://www.ubuntu-es.org/forum
– Ubuntu (Catalan LoCo Team): http://ubuntuforums.org/forumdisplay.php?f=206
● Data Science Toolbox (to run locally or in the cloud with AWS):
– http://datasciencetoolbox.org/
– https://www.datascienceatthecommandline.com