Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
Open-Source AI: Community is the Way
1. Alexy Khrabrov, PhD
Open-Source Science Director
IBM Research, Accelerated Discovery
Chair, Generative AI Commons
LF AI & Data, Linux Foundation
@chiefscientist (X/LinkedIn/Telegram)
alexy@chiefscientist.org
2. Why do we need community around LLMs?
• Claims of trust, safety, performance, transparency and openness cannot be unilateral announcements by one or even a few companies
• Need an established community vehicle like LF, ML Commons, NumFOCUS
Generative AI needs Community
3. • PyData, Hamburg, Germany
• LLM Avalanche, San Francisco, CA
• ChiPy & PyData, Chicago, IL
• PyData, Accra, Ghana
• UCSC OSPO, Santa Cruz, CA
• PyData, Berlin, Germany
• OSPOs for Good @ United Nations
• ACS Off-Site, Almaden
• SciPy 2023, Austin, TX
• ACS, San Francisco, CA
• NumFOCUS Donation to Data Science Education
4. • Data
• Models
• Applications
• Community Validation
What is OSS Generative AI?
• Data: training, lakehouses, retraining
• Models: OSS models, serving, inference
• Applications:
• frameworks, prompt engineering, DSP, Open Interpreter
• Enterprise Integration
• Community Validation: benchmarks, openness metrics, measurable broad societal consensus
5. Generative AI Commons at LF AI & Data
The LF AI & Data Generative AI Commons is dedicated to fostering the
democratization, advancement and adoption of efficient, secure, reliable, and
ethical Generative AI open source innovations through neutral governance, open
and transparent collaboration and education.
6. Alexy Khrabrov and Peter Staar
CZI HQ, Redwood City 10/26/23
DeepSearch used to identify software mentions in arXiv at the CZI hackathon
Mapping the Impact of Research Software in Science
(Chan Zuckerberg Initiative Hackathon, Oct 24–27, 2023)
• Which sciences grow faster with OSS
• Which software is most used, by discipline
• Which organizations support OSS
• How to extract software mentions from papers
• Grants, Authors, Organizations
• Software citation intent
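As a rough illustration of the "software mentions by discipline" question above, here is a minimal sketch. It is not how DeepSearch works: it assumes a fixed, hypothetical list of tool names (`KNOWN_SOFTWARE`) and plain regular-expression matching, whereas a real pipeline would need a trained extractor to find software it has never seen. The sample papers are invented for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical dictionary of tool names (an assumption for this sketch).
KNOWN_SOFTWARE = ["NumPy", "SciPy", "pandas", "PyTorch", "TensorFlow"]

# Map lowercase spellings back to one canonical name.
CANONICAL = {name.lower(): name for name in KNOWN_SOFTWARE}

# Case-insensitive pattern with word boundaries around each known name.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in KNOWN_SOFTWARE) + r")\b",
    re.IGNORECASE,
)


def software_mentions(text):
    """Return the canonical name of every known tool mentioned, with repeats."""
    return [CANONICAL[m.group(1).lower()] for m in PATTERN.finditer(text)]


def mentions_by_discipline(papers):
    """Count, per discipline, how many papers mention each tool.

    `papers` is an iterable of (discipline, text) pairs.
    """
    counts = defaultdict(Counter)
    for discipline, text in papers:
        # A tool counts at most once per paper, however often it is named.
        counts[discipline].update(set(software_mentions(text)))
    return counts


# Invented sample data, for illustration only.
papers = [
    ("physics", "We fit the spectra with SciPy and NumPy; numpy arrays hold the data."),
    ("biology", "Models were trained in PyTorch. Tables were processed with pandas."),
    ("physics", "Simulations used NumPy."),
]

counts = mentions_by_discipline(papers)
print(counts["physics"].most_common())  # [('NumPy', 2), ('SciPy', 1)]
```

Counting each tool once per paper keeps a single paper that names a library many times from dominating the per-discipline ranking.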
7. • Digital Transformation 1.0 was low-level automation, a gas-powered horse
• Clerks replaced by PDF flows
• Middle management still in place to operate PDF-enabled clerk teams
• SSAs will replace clerks and middle management workflow (data+instruction)
• Human in the Loop creators will translate strategy to SSAs
• Actual organizational restructuring
• SAP, Oracle, legal integrations
• Industrial infrastructure, machinery, networks, grids subject to DT2
Digital Transformation 2.0: DT2
9. • Ownership
With open-source, organizations can secure AI sovereignty and protect their IP encapsulated in the models. This
empowers them to freely create, modify, and deploy their agents within their own industrial environments, without
vendor lock-in. Factory setups also require high-bandwidth local networks.
– Small is Beautiful (Unix => and Efficient!)
The OSS model can be specialized and compressed, fitting in the environments where it should be deployed. It can
be reasoned about and proven correct for the specific domain, preserving ownership and expertise.
– Do One Thing, Do It Well! (GM)
Specialist models can be fused with company knowledge.
Why is OSS AI needed for Industrial AI?
11. • Generative AI Commons at Linux Foundation
• ML Commons – MLPerf benchmark, AI safety group
• Foundation models in Climate, Chemistry, Biology, IBM+NASA+…
• Partnership on AI
• Frontier Model Forum
• OECD/WEF working groups
Multiple AI Bodies need to Collaborate