Finitio is a language to validate, coerce and document data in configuration files, data exchanges, data APIs, and SQL and NoSQL databases.
It comes with a dedicated type system for data, and an interoperability layer with programming languages.
5. Coupling issue known for ages
• Niklaus Wirth, 1976
– The father of Pascal, Modula-2 and a lot more
• Main message
– Data structures and algorithms are highly related
• Yields another coupling issue
– Between software components
– Hurts evolution
6. A solution used for ages
• Information Hiding
– Parnas, 1971
• Abstract Data Types
– Liskov, 1974
Avoid coupling between software components
• Encapsulate data
• Access it only via behavioral interfaces
• e.g. the Stack ADT and its axiomatic contract
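As an illustration of the Stack example, here is a minimal Python sketch of an ADT whose data is reachable only through a behavioral interface, together with the classic stack axioms as an executable check. The class, the method names and `check_axioms` are assumptions made for this sketch; they are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Stack(Generic[T]):
    """Immutable stack. The tuple of items is an implementation detail:
    callers are expected to use only push, pop, top and is_empty."""
    _items: Tuple[T, ...] = ()

    def push(self, item: T) -> "Stack[T]":
        return Stack(self._items + (item,))

    def pop(self) -> "Stack[T]":
        if not self._items:
            raise IndexError("pop from empty stack")
        return Stack(self._items[:-1])

    def top(self) -> T:
        if not self._items:
            raise IndexError("top of empty stack")
        return self._items[-1]

    def is_empty(self) -> bool:
        return not self._items


def check_axioms(s: Stack, x) -> bool:
    """Check the usual stack axioms for one stack s and one item x."""
    return (
        Stack().is_empty()                 # a new stack is empty
        and not s.push(x).is_empty()       # pushing makes it non-empty
        and s.push(x).top() == x           # top returns the last pushed item
        and s.push(x).pop() == s           # pop undoes push
    )
```

The contract speaks only about behavior (how push, pop and top relate to each other) and says nothing about how the items are stored.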
8. A Possible Root Cause
• Programming languages & practices have strong biases towards Behavior
– Type systems & Type checking algorithms
– APIs and documentation
– Testing
• They mostly ignore the Data perspective of engineering
– To Be is too often sacrificed in favor of To Behave
9. To Be is to be a Value, that is, a member of a Type
• An interesting question is
– Are you, value v, a member of type T?
• Examples
– Are you, 13, an Integer between 0 and 45?
– Are you, {…}, a Member information with a valid Password, that is, a String of at least 8 characters?
• Useful only if we can capture interesting Types in the first place
– Sets of values, Arbitrary Subsets, Supersets
– Weak or no support in conventional programming languages
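The membership question can be sketched in plain Python by treating a type as a set of values, represented here by a predicate. This is only an illustration of the idea, not Finitio syntax or the API of any Finitio binding; the helper names and the `name` field of `Member` are hypothetical.

```python
from typing import Any, Callable, Dict

# Assumption of this sketch: a type is a membership predicate over values.
Type = Callable[[Any], bool]


def integer(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)


def string(v: Any) -> bool:
    return isinstance(v, str)


def subset(base: Type, constraint: Callable[[Any], bool]) -> Type:
    """Arbitrary subset of a base type: members of base that satisfy constraint."""
    return lambda v: base(v) and constraint(v)


def record(fields: Dict[str, Type]) -> Type:
    """Dicts with exactly the given keys, each value a member of its field type."""
    return lambda v: (
        isinstance(v, dict)
        and v.keys() == fields.keys()
        and all(t(v[k]) for k, t in fields.items())
    )


# The two examples from the slide
SmallInt = subset(integer, lambda i: 0 <= i <= 45)
Password = subset(string, lambda s: len(s) >= 8)
Member = record({"name": string, "password": Password})  # "name" is hypothetical
```

With these definitions, `SmallInt(13)` answers "are you, 13, an Integer between 0 and 45?", and `Member({...})` rejects any record whose password is shorter than 8 characters.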
10. Data Deserves a Language Too
What would a language strongly biased towards data look like?
11. http://www.finitio.io/try
• Finitio is a language for …
– Enforcing
– Validating
– Documenting
– Coercing
• … Datatypes in
– Files
– APIs
– Exchanges
– Databases
• + an interoperability layer: Information Contracts
– A proposed dual to ADTs’ axiomatic contracts
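One possible reading of an information contract, sketched in Python: where an axiomatic contract relates an ADT's operations to each other, an information contract relates an ADT value to a plain data representation, in both directions. The class name and the `dress`/`undress` method names are assumptions of this sketch and should not be read as the API of a Finitio binding.

```python
from datetime import date


class IsoDateContract:
    """Hypothetical information contract between the date ADT and its
    representation as an ISO 8601 string such as "2014-11-07"."""

    @staticmethod
    def dress(info: str) -> date:
        # Information -> ADT value; raises ValueError on invalid input
        return date.fromisoformat(info)

    @staticmethod
    def undress(value: date) -> str:
        # ADT value -> information
        return value.isoformat()


def round_trips(info: str) -> bool:
    """The contract holds for info if dressing then undressing returns it unchanged."""
    return IsoDateContract.undress(IsoDateContract.dress(info)) == info
```

The program keeps working with `date` objects through their behavioral interface, while files, APIs and databases only ever see the string representation.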
12. Meet Finitio & Contribute
• http://www.finitio.io/
– The best starting point
• github.com/blambeau/finitio
– Language specification, e2e tests, doc source
• github.com/blambeau/finitio-rb
– Ruby binding
• github.com/llambeau/finitio.js
– JavaScript binding