Horacio Gonzalez - Rediscover the known Universe with NASA datasets - Codemot...Codemotion
For many years humanity has been exploring the skies, dreaming of interstellar voyages and new planetary colonies. And you, do you want to go with us to discover the universe? It turns out that NASA has a great public dataset. One of then, the Kepler project dataset, is used by scientists to search for exoplanets, that is, planets outside our solar system. What if you could search for exoplanets by your own, using IT tools and a developer approach? In this talk we introduce you to the HelloExoWorld project that does just that. Join us, and help us to discover exoplanets!
How many times a day do you switch from mouse to keyboard and viceversa? Do you still use BASH? Are you taking advantage of GIT or is GIT fooling you daily? In this talk I’ll show your how to dramatically boost your productivity just by enhancing your development toolset. You will learn many OSX tweaks that get rid of stupid-proof features and give you an extra performance boost, alternative shells that will aid you and enrich your experience when working with the console, application launchers/window managers that will make you forget about your mouse and, finally, how to automate many of your GIT/GitHub development workflows.
The talk is primarely focused on the development of Rails apps on OSX but many of the tricks are also applicable to other OSes and languages.
Introduction to Singularity and Data ContainersVanessa S
A presentation for the encapsulation section at the Dataverse 2020 Community Meeting. Includes an overview of Singularity and dummy implementation / idea for data containers.
A talk to introduce Singularity Registry HPC, which allows you to install Singularity, Podman, or Docker containers (and others) as modules on an HPC system (e.g., LMOD or environment modules). Presented 2021.
Horacio Gonzalez - Rediscover the known Universe with NASA datasets - Codemot...Codemotion
For many years humanity has been exploring the skies, dreaming of interstellar voyages and new planetary colonies. And you, do you want to go with us to discover the universe? It turns out that NASA has a great public dataset. One of then, the Kepler project dataset, is used by scientists to search for exoplanets, that is, planets outside our solar system. What if you could search for exoplanets by your own, using IT tools and a developer approach? In this talk we introduce you to the HelloExoWorld project that does just that. Join us, and help us to discover exoplanets!
How many times a day do you switch from mouse to keyboard and viceversa? Do you still use BASH? Are you taking advantage of GIT or is GIT fooling you daily? In this talk I’ll show your how to dramatically boost your productivity just by enhancing your development toolset. You will learn many OSX tweaks that get rid of stupid-proof features and give you an extra performance boost, alternative shells that will aid you and enrich your experience when working with the console, application launchers/window managers that will make you forget about your mouse and, finally, how to automate many of your GIT/GitHub development workflows.
The talk is primarely focused on the development of Rails apps on OSX but many of the tricks are also applicable to other OSes and languages.
Introduction to Singularity and Data ContainersVanessa S
A presentation for the encapsulation section at the Dataverse 2020 Community Meeting. Includes an overview of Singularity and dummy implementation / idea for data containers.
A talk to introduce Singularity Registry HPC, which allows you to install Singularity, Podman, or Docker containers (and others) as modules on an HPC system (e.g., LMOD or environment modules). Presented 2021.
Gamify Your Life – My Epic Quest of Awesome - Eric BoydFITC
Overview
Wouldn’t it be great if your real life was as compelling as the video games that you play? Several years ago Eric created a Dungeons & Dragons-type experience system for his life. He gains experience points for completing noteworthy projects and goals, and every so often he levels up! Originally the idea was that it would motivate him to complete more goals, but in practice Eric found that it was better as a record of his accomplishments – i.e. it’s backward looking.
So, this year he added a new piece to the quest, a “dashboard” inspired by games like Farmville and Tribez & Castlez. It’s a prominently-displayed set of goals, and his progress toward them that is updated at least weekly. Numbers keep him honest. Come learn about Eric’s epic quest of awesome, and get inspired to make your own!
Objective
Inspire other people to create their own epic quests of awesome, to live their lives like a hero in their own video game.
Target Audience
Geeks who wish their own lives were as compelling as the video games they play.
Assumed Audience Knowledge
Helpful to have exposure to role playing games like dungeons and dragons, or other games which use an experience-level-quest system.
Audience Members Will Learn
How to gamify your life to increase your productivity and feel more accomplished
The Benefits of Warehouse Management for an eCommerce BusinessNet at Work
Explore how dynamic Warehouse Management System (WMS) capabilities can help you grow the eCommerce component of your business and turbocharge your operations.
Fulfillment Replenishment
eFulfillment Picking
E Automated processing - picking, label printing, packslip printing
Maximizing the productivity of your warehouse is an essential part of any business. Read these tips on warehouse shelving and layout to increase your productivity.
인간관계에서 오는 스트레스에는 3가지 유형이 있습니다.
팔로우(친구 맺기)할 때의 스트레스, 언팔(거절)당했을 때의 스트레스, 급기야 차단당하거나 차단해야 하는 인간관계가 생겼을 때의 스트레스.명상을 하러 오는 사람들 중 많은 분들은 '인간관계에서 오는 스트레스'를 해소하려는 의지를 가진 분들이 많습니다.
가족, 배우자, 직장, 사회생활하면서 사람들에게 느꼈던 마음들을 들고 명상을 시작하십니다.그리고 마음을 돌아보고 나면 스트레스의 원인이 '내 마음'에 있었음을 알게 된다고 합니다.
살아왔던 삶을 돌아보면서 어릴 때부터 친했던 친구부터, 최근에 사이가 소원해진 인간관계까지.
차근차근 내면을 되짚어가다 보면 어떤 마음 때문에 인간관계가 불편해졌는지 혹은 좋아질 수 있었는지 통찰이 생깁니다.그래서 이렇게 자신을 돌아보고 있는 명상 피플, 마음수련 논산 메인센터에서 계신 분들 100명에 물었습니다.
[원문보기]마음수련 공식블로그 http://meditationlife.blog.me/220686850173
행복한 명상 마음수련 http://www.MeditationLife.org
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...PyData
In the past two years, there has been incredible progress in Python data visualization libraries, particularly those built on client-side JavaScript tools such as D3 and Leaflet. This talk will give a brief demonstration of many of the newest charting libs: mpld3 (using Seaborn/ggplot), nvd3-python, ggplot, Vincent, Bearcart, Folium,and Kartograph will be used to visualize a newly-released USGS/FAA wind energy dataset (with an assist from Pandas and the IPython Notebook). After a demo of the current state of Python and web viz, it will discuss the future of how the Python data stack can have seamless interoperability and interactivity with JavaScript visualization libraries.
Gamify Your Life – My Epic Quest of Awesome - Eric BoydFITC
Overview
Wouldn’t it be great if your real life was as compelling as the video games that you play? Several years ago Eric created a Dungeons & Dragons-type experience system for his life. He gains experience points for completing noteworthy projects and goals, and every so often he levels up! Originally the idea was that it would motivate him to complete more goals, but in practice Eric found that it was better as a record of his accomplishments – i.e. it’s backward looking.
So, this year he added a new piece to the quest, a “dashboard” inspired by games like Farmville and Tribez & Castlez. It’s a prominently-displayed set of goals, and his progress toward them that is updated at least weekly. Numbers keep him honest. Come learn about Eric’s epic quest of awesome, and get inspired to make your own!
Objective
Inspire other people to create their own epic quests of awesome, to live their lives like a hero in their own video game.
Target Audience
Geeks who wish their own lives were as compelling as the video games they play.
Assumed Audience Knowledge
Helpful to have exposure to role playing games like dungeons and dragons, or other games which use an experience-level-quest system.
Audience Members Will Learn
How to gamify your life to increase your productivity and feel more accomplished
The Benefits of Warehouse Management for an eCommerce BusinessNet at Work
Explore how dynamic Warehouse Management System (WMS) capabilities can help you grow the eCommerce component of your business and turbocharge your operations.
Fulfillment Replenishment
eFulfillment Picking
E Automated processing - picking, label printing, packslip printing
Maximizing the productivity of your warehouse is an essential part of any business. Read these tips on warehouse shelving and layout to increase your productivity.
인간관계에서 오는 스트레스에는 3가지 유형이 있습니다.
팔로우(친구 맺기)할 때의 스트레스, 언팔(거절)당했을 때의 스트레스, 급기야 차단당하거나 차단해야 하는 인간관계가 생겼을 때의 스트레스.명상을 하러 오는 사람들 중 많은 분들은 '인간관계에서 오는 스트레스'를 해소하려는 의지를 가진 분들이 많습니다.
가족, 배우자, 직장, 사회생활하면서 사람들에게 느꼈던 마음들을 들고 명상을 시작하십니다.그리고 마음을 돌아보고 나면 스트레스의 원인이 '내 마음'에 있었음을 알게 된다고 합니다.
살아왔던 삶을 돌아보면서 어릴 때부터 친했던 친구부터, 최근에 사이가 소원해진 인간관계까지.
차근차근 내면을 되짚어가다 보면 어떤 마음 때문에 인간관계가 불편해졌는지 혹은 좋아질 수 있었는지 통찰이 생깁니다.그래서 이렇게 자신을 돌아보고 있는 명상 피플, 마음수련 논산 메인센터에서 계신 분들 100명에 물었습니다.
[원문보기]마음수련 공식블로그 http://meditationlife.blog.me/220686850173
행복한 명상 마음수련 http://www.MeditationLife.org
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...PyData
In the past two years, there has been incredible progress in Python data visualization libraries, particularly those built on client-side JavaScript tools such as D3 and Leaflet. This talk will give a brief demonstration of many of the newest charting libs: mpld3 (using Seaborn/ggplot), nvd3-python, ggplot, Vincent, Bearcart, Folium,and Kartograph will be used to visualize a newly-released USGS/FAA wind energy dataset (with an assist from Pandas and the IPython Notebook). After a demo of the current state of Python and web viz, it will discuss the future of how the Python data stack can have seamless interoperability and interactivity with JavaScript visualization libraries.
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsUwe Korn
As a Data Scientist/Engineer in Python, we focus in our work to solve problems with large amounts of data but still stay in Python. This is where we are the most effective and feel comfortable. Libraries like Pandas and NumPy provide us with efficient interfaces to deal with this data while still getting optimal performance. The main problem appears when we have to deal with systems outside of our comfort ecosystem. We need to write cumbersome and mostly slow conversion code that ingests data from there into our pipeline until we can work efficiently. Using Apache Arrow and Parquet as base technologies, we get a set of tools that eases this interaction and also brings us a huge performance improvement. As part of the talk we will show a basic problem where we take data coming from a Java application through Python into using these tools.
Collaborations in the Extreme: The rise of open code development in the scie...Kelle Cruz
Video: https://www.simonsfoundation.org/event/collaborations-in-the-extreme-the-rise-of-open-code-development-in-the-scientific-community/
The internet is changing the scientific landscape by fostering international, interdisciplinary and collaborative software development. More than ever before, software is a crucial component of any scientific result. The ability to easily share code is reshaping expectations about reproducibility -- a fundamental tenet of the scientific process. In this lecture, Kelle Cruz will briefly provide the backstory of how these shifts have come about, describe some of the most impactful open source projects, and discuss efforts currently underway aimed at ensuring these community-led projects are sustainable and receive support.
Making NumPy-style and Pandas-style code faster and run in parallel. Continuum has been working on scaled versions of NumPy and Pandas for 4 years. This talk describes how Numba and Dask provide scaled Python today.
A presentation on mashing up Twitter Annotations with the Semantic Web. June 24, 2010 at the Semantic Technology Conference, San Francisco (SemTech 2010).
PyData Global 2022 presentation by Jeff Hale where you will learn how Prefect Blocks give you the power to share configuration and code with technical and non-technical collaborators.
Jupyter notebooks are transforming the way we look at computing, coding, and problem-solving. But is this the only “data scientist experience” that this technology can provide? In this talk, Natalino will sketch how you could use Jupyter to create interactive and compelling data science web applications and provide new ways of data exploration and analysis. In the background, these apps are still powered by well understood and documented Jupyter notebooks.
When Privacy Scales - Intelligent Product Design under GDPRAmanda Casari
Data-driven companies making intelligent products must design for security and privacy to be competitive globally. The EU General Data Protection Regulation (GDPR), implemented May 2018, is the benchmark that global data privacy will be measured against.
This presentation outlines the basic tenets of personal data and details the high-level changes that GDPR-compliant businesses face. It translates the current and near-future impact to teams designing products driven by machine learning and artificial intelligence and shares use cases of how SAP Concur is designing to meet this challenge while still delivering services to its end users that are driven by advanced algorithms.
Presented at The AI Conference, San Francisco, September 2018
Feature Engineering for Machine Learning at QConSPAmanda Casari
Machine learning fits mathematical models to date to derive insights or make predictions. Engineering the features that sit between data and models is a crucial step in the machine learning pipeline, because the right features can ease the difficulty of modeling and enable results of higher quality. In this talk, we will dive deeper into the mechanisms behind popular feature engineering techniques and walk through use of where these techniques are most useful. You will be able to better identify which methods to use based on your data and the problem you are working to solve.
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...Amanda Casari
When describing a product as "data-driven" or "fueled by machine learning," it can be difficult to parse a common, fundamental definition of what makes an application "intelligent." In this talk, we will cover how you can peel back the buzzwords from the space of data science, machine learning, and artificial intelligence. You will be able to:
- Start sketching your own frameworks for understanding and evaluating these products
- Better understand how things can go wrong
- Know what questions to ask product vendors, and have a better understanding of their answers
- Learn more about data science as a process, as people organizations, as a product and as a service
Design for X: Exploring Product Design with Apache Spark and GraphLabAmanda Casari
Ideas for designing data science products for: bots, knowledge gaps, IOT + fairness. Combining elements from Apache Spark and Turi's GraphLab products.
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
3. data science via random walks
senior data scientist
@ Concur
control systems
engineering +
robotics + legos
officer in USN
operations research
analyst
wandering dirtbag +
conservation volunteer
EE +
applied math
+ complex systems
underwater robotics
consultant
extraordinaire
SAHM
12. “because i can…ethics. bias.
data != numbers.
call to think critically about the politics
+ ethics of visualization - catherine
d’ignazio
the point of collection - mimi onuoha
nasa
13. HOW DO WE (THOUGHTFULLY) SHOW OUR (DATA) WORLD?
periscopic: global gender gap in phds
nytimes: what does a marriage license cost you
2016 primary results + calendar
xkcdsmartmine whale tracker
14. HOW DO WE (THOUGHTFULLY) SHOW OUR (DATA) WORLD?
all figs from papers here
show + tell != show + experience
19. + +
➤ choose your install: conda, pip
➤ seaborn> python viz library based on matplotlib
➤ ipywidgets> add interactive HTML widgets to Jupyter notebooks
➤ which takes advantage of matplotlib’s interactive backend
connectors….
➤ and allows you to layer seaborn for pretty, interactive, lightweight vis
➤ extensible stack for any Python visualization library!
➤ Jupyter + ipywidgets + {insert your fav here}
➤ deploy using Jupyter nbviewer
➤ easy to piece things together for reproducible analysis: reusable code +
interactive plots + deployable notebooks
20. # note to self: boh-kay, not boh-kah
➤ choose your adventure: python, julia, scala, r
➤ choose your install: conda, pip
➤ targets web browsers for presentation
➤ uses tornado to create a bokeh server
output_file()
➤ can also plot in jupyter notebook w/o bokeh server
output_notebook()
➤ uses mplexporter to help convert matplotlib(!) plots in Bokeh plots
➤ plays well with others: seaborn, ggplot.py, pandas
➤ excellent documentation
➤ recent updates are making way for “BIG DATA” vis
➤ brilliant deep dive into bokeh with christine doig of continuum analytics
21. + }
➤ choose your install: conda, pip
➤ uses HTML’s SVG…so not really designed for “BIG DATA” vis
➤ still developing! noted missing features include:
➤ tick locations + tick formatting
➤ plt.xkcd()
➤ plt.annotate()
➤ excellent + growing documentation, including faq’s
➤ extensible beyond matplotlib > client-side interface is pure
javascript library (n.b. current JSON specification designed for
matplotlib)
n.b. this is a jake vanderplas project
23. UNDERSTANDING YOUR INTERNET WORLD
export
bookmarks
(.html)
BeautifulSoup
my_links
my_dirs
my_names
bag o’words
still getting
clever
str, regex
http_v_https
counts
all the vis
DEMO!?!
24. HOW DO WE SHOW OUR (DATA) WORLD WITH PYTHON?
➤ Embed in your favorite Python web framework
➤ Django
➤ Flask
➤ Tornado
➤ Pyramid
➤ Jupyter Notebooks
➤ nbviewer: standalone server for hosting notebooks
➤ Jupyter + github :)
➤ n.b. github renders only static plots (non-html) on jupyter notebooks
➤ Pay us to host your vis!
➤ plot.ly
25. ➤ graphlab create is based on a python data science library
developed + (some) os’d by dato
➤ graphlab canvas: interactive visualization for exploratory
data analysis
NOT ALL BITS ARE OS….BUT….
26. ➤ python library developed + os’d by stitchfix
➤ pyxleyJS React to create Flask-based web apps
“Through the use of the PyReact library, we can use Jinja templating to construct and transform a single React component. The
specific UI components are passed as props to the parent component. A simpler interface is provide through the use of specific
wrappers for each of the component types.”
AND EVEN SHINIER?
27. {THANKS MUCH}
➤ thank you to everyone in the open
source community for giving me
such lovely tools to talk about
➤ thank you @PyLadiesSEA for
listening
➤ thank you again to @Concur for
hosting, snacks + being a
fantastic place to be a PyLady
➤ slides + git repo links will be
posted to meetup page
amanda.casari@concur.com
@amcasari