GraphQL is a query language for your API. This allows you to interact with your existing web services and databases in a new way. Instead of relying on a predetermined output structure from your API, you can “query” it and choose only the fields that you’re interested in. In this talk, learn what GraphQL is all about and how you can take advantage of it in your applications.
A few key GraphQL terms we'll cover are:
* Fields/Types
* Variables
* Queries/Mutations
We’re going to explore how you can create a GraphQL Server with a stack written entirely in Kotlin. Then we'll take a look at how you can integrate the Apollo Client library inside of a Kotlin-powered Android application. GraphQL isn't going away, so here's your chance to get a grip on this exciting technology!
Beyond PHP - it's not (just) about the code, by Wim Godden
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013), by Kai Chan
These are the slides for the session I presented at SoCal Code Camp Los Angeles on November 10, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=8cdfd955-2cd4-44a2-ad08-5353e079685a
String Comparison Surprises: Did Postgres lose my data?, by Jeremy Schneider
Comparisons are fundamental to computing - and comparing strings is not nearly as straightforward as you might think. Come learn about the history, nuance and surprises of “putting words in order” that you never knew existed in computer science, and how that nuance impacts both general programming and SQL programming. Next, walk through a few actual scenarios and demonstrations using PostgreSQL as a user and administrator, which you can re-run yourself later for further study, including one way you could easily corrupt your self-managed PostgreSQL database if you aren't prepared. Finally we’ll dive into an explanation of the surprising behaviors we saw in PostgreSQL, and learn more about user and administrative features PostgreSQL provides related to localized string comparison.
PostgreSQL as seen by Rubyists (Kaigi on Rails 2022), by Андрей Новиков
PostgreSQL has become the most popular RDBMS in the Ruby ecosystem in the last decade. It has a great set of built-in features, including a variety of versatile data types, both common and very specific.
But when we load data from the database to our application code, we're working with Ruby data types: classes from the standard library, Rails, or other gems. So while they can seem to be the same as their PostgreSQL counterparts, they are not absolutely identical, and sometimes that could lead to surprising behavior.
In this talk, I would like to explore the power of data types in PostgreSQL and Ruby and how to work with them properly to use both Ruby and PostgreSQL at 100% of their power!
Introduces important facts and tools to help you get started with performance improvement.
Learn to monitor and analyze important metrics, then you can start digging and improving.
Includes useful munin probes, predefined SQL queries to investigate your database's performance, and a top 5 of the most common performance problems in custom Apps.
By Olivier Dony - Lead Developer & Community Manager, OpenERP
Columnar processing for SQL-on-Hadoop: The best is yet to come, by Wang Zuo
Apache Parquet is an open-source file-format which arranges all of its data into columns – this is distinct from the traditional row-oriented layout, which stores entire rows consecutively. Columnar data offers lots of advantages to modern data engines – like Impala, Apache Spark, and Apache Flink – in terms of IO efficiency, but the full benefits of the format are yet to be realized.
We have been working with Intel to apply modern CPU instruction sets to the common programming tasks associated with querying data in Parquet format: decompression, predicate evaluation, and row-reconstruction. Our work has yielded significant speedups in standard query benchmarks running on Cloudera’s Impala SQL query engine, and very high speedups in targeted microbenchmarks.
In this talk we’ll describe the symbiosis between modern CPU architectures and the requirements of columnar data processing. We’ll show how vectorization – processing many items with a single instruction – is a widely applicable technique that can provide real performance benefits to all application frameworks that use columnar formats. We’ll present the changes that we have made to Impala’s ‘scanner,’ which reads Parquet data, and map out even more future enhancements.
This talk will be of interest to audiences interested in the internals of big data processing engines, or the impact of recent advances in modern CPU architectures.
NSD, your travel companion when Domino crashes, by Fabio Pignatti
How to read and extract useful information from the analysis of an NSD in the event of a Domino server crash or hang. Some practical cases and a useful tool for the analysis phase. - Dominopoint Day 2008
Machine learning in PHP (PHPCon Poland), by Damien Seguy
Machine learning is teaching the computer how to learn by itself. It is far easier to do, especially when you have a small data set and a good level of expertise in your field. Classifying objects, predicting who will buy, and spotting comments in code are achieved with grassy algorithms like neural networks, genetic algorithms or ant herding. PHP is in a good position to make use of such teachings and take advantage of related technologies like fann. By the end of the session, you'll know where you want to try it.
Most of today's software is highly static, even if it is written in a dynamic language like Smalltalk. Developers are not encouraged to extend the frameworks they are using, and end-users are unable to change the features of their software without initiating a new development effort. In contrast, extensible software is designed for change, and customizable software can be adapted to new needs without requiring an in-depth knowledge of the underlying implementation domain.
In this presentation I will investigate how to write truly dynamic software and I will distill common patterns of software customizability. As running examples I present tools that I worked on during my path of discovering Smalltalk. One of these examples is Magritte, a dynamic meta-model that gives end-users the possibility to customize their applications without the need for an additional development effort. Another example is Helvetia, an infrastructure enabling on-the-fly customization of the programming language and development environment.
Using Query Embeddings based on user sessions for query expansion.
Discover how we expand queries at OLX Group by embedding our queries using Neural Networks.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method..., by 2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting primitives for graph : SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2..., by pchutichetpong
M Capital Group (“MCG”) expects to see growing demand and an evolving supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), alongside an ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
5. Cross-Device linking competition
Using clickstream data, find logs that belong to the same user
https://www.slideshare.net/AlexeyGrigorev/cikm-cup-2016-crossdevice-linking
13. Record Linkage vs Duplicates
[Figure: Schema 1, Schema 2 and Schema 3 alongside a unified schema]
Restoring the duplicates graph!
14. Duplicates
For each rec i, find duplicates {rec i1, …, rec ik} from the set of n records

ID     F1   F2   ...  Fm
Rec 1  f11  f12  ...  f1m
Rec 2  f21  f22  ...  f2m
...    ...  ...  ...  ...
Rec n  fn1  fn2  ...  fnm
18. ML for Duplicates
● Compare each pair with each?
● 1000 items => 1000 x 999 / 2 = 499 500 pairs
● Real datasets: millions! (avito: 51mln, olx.ua: 11mln)
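The pair count on this slide follows from n * (n - 1) / 2 unordered pairs among n records. A minimal sketch of that arithmetic, applied to the dataset sizes quoted above (the function name `pair_count` is made up for illustration):

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Sanity check of the formula against explicit enumeration on a tiny input.
assert pair_count(5) == sum(1 for _ in combinations(range(5), 2))

# 1000 items is the slide's example; 11 and 51 million are the quoted dataset sizes.
for n in (1_000, 11_000_000, 51_000_000):
    print(f"{n:>11,} records -> {pair_count(n):,} pairs")
```

The quadratic growth is why comparing every pair stops being practical at the scale of the real datasets.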
19. ML for Duplicates
● Graph is very sparse!
● Don’t need to compare everything w everything
Reality
20. ML for Duplicates
● First step:
● Candidate selection
Idea:
● First, find candidate duplicates (10-200)
● Then, get real duplicates (0-50)
For each rec i, find duplicates {rec i1, …, rec ik} from the set of n records
k=0..50 items
23. Domain knowledge
Candidates share the same
● Category (iphone)
● City (Birobidzhan) / district
● Seller id
● IP address of the seller
● Device signature
26. Domain knowledge
Candidates share the same
● Category (iphone)
● City (Birobidzhan) / district
● Seller id
● IP address of the seller
● Device signature
Easy to implement in any RDB!
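As a rough illustration of why this is easy in a relational database, the sketch below uses Python's built-in sqlite3 module and a self-join. The table name `ads`, its columns and the sample rows are hypothetical, and combining the shared attributes as "(same category and city) or same seller" is an assumption, since the slide does not say how the criteria are combined.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, category TEXT, city TEXT, seller_id INTEGER)"
)
conn.executemany(
    "INSERT INTO ads VALUES (?, ?, ?, ?)",
    [
        (1, "iphone", "Birobidzhan", 10),
        (2, "iphone", "Birobidzhan", 11),
        (3, "iphone", "Moscow", 10),
        (4, "bmw", "Birobidzhan", 12),
    ],
)


def candidate_pairs(connection):
    """Return pairs of ad ids that share a blocking key (assumed rule, see above)."""
    query = """
        SELECT a.id, b.id
        FROM ads AS a
        JOIN ads AS b
          ON a.id < b.id  -- each unordered pair once, no self-pairs
         AND ((a.category = b.category AND a.city = b.city)
              OR a.seller_id = b.seller_id)
        ORDER BY a.id, b.id
    """
    return connection.execute(query).fetchall()


pairs = candidate_pairs(conn)
print(pairs)
```

Only the pairs that share a key go on to the more expensive comparison step; the remaining pairs are never scored.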
35. Word2Vec features
How to compare documents?
● Title1: “used bmw”
● Title2: “selling almost new bmw”

       selling  almost  new   bmw
used   0.3      0.1     0.6   0.5
bmw    0.2      0.1     0.55  1

min   mean  max   std
0.1   0.41  1.0   0.28
36. Word2Vec features part 2
How to compare documents?
● Title1: “used bmw”
● Title2: “selling almost new bmw”

       selling  almost  new   bmw
used   0.3      0.1     0.6   0.5
bmw    0.2      0.1     0.55  1

min   mean  max   std
0.1   0.33  0.6   0.20
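The summary rows on these two slides can be approximately reproduced from the similarity matrix. In the sketch below the matrix values are copied from the slides; the function name and the `skip_identical` flag are made up for illustration. That the second slide drops the exact-match pair (bmw vs bmw) is an inference from the numbers, not something the slides state, and the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev

# Word-to-word similarities between Title1 (rows) and Title2 (columns), from the slides.
sim = {
    "used": {"selling": 0.3, "almost": 0.1, "new": 0.6, "bmw": 0.5},
    "bmw": {"selling": 0.2, "almost": 0.1, "new": 0.55, "bmw": 1.0},
}


def similarity_features(matrix, skip_identical=False):
    """Aggregate all pairwise similarities into min / mean / max / std features."""
    values = [
        score
        for word1, row in matrix.items()
        for word2, score in row.items()
        if not (skip_identical and word1 == word2)
    ]
    return {
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
        "std": pstdev(values),  # population standard deviation (assumed)
    }


all_pairs = similarity_features(sim)
without_exact = similarity_features(sim, skip_identical=True)
print(all_pairs)
print(without_exact)
```

The computed values land within about 0.01 of the slide figures, which appear to be rounded or truncated.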
47. Candidate selection step
Candidates share the same
● Category (iphone)
● City (Birobidzhan) / district
● Seller id
● IP address of the seller
● Device signature
● Image hash
61. Embedding index
● Numpy arrays
● Do X.dot(query)
● Client aggregates
● Approximate KNN! LSH techniques
● Almost the same as X.dot(query) but faster
● Many implementations: FAISS, Annoy, etc
Becomes slow as it grows
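The exact approach on this slide scores the query against every stored vector, which is what X.dot(query) does on a NumPy array. The sketch below shows the same idea in plain Python so that it runs without dependencies; the sample vectors and the names `dot` and `top_k` are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(index, query, k):
    """Exact search: score every row against the query, then keep the k best row ids."""
    scores = [(dot(vec, query), i) for i, vec in enumerate(index)]
    scores.sort(reverse=True)  # highest dot product first
    return [i for _, i in scores[:k]]


# Hypothetical 2-d embeddings, one row per item.
index = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
    [-1.0, 0.0],
]
query = [0.9, 0.1]
print(top_k(index, query, k=2))
```

Every query touches every row, so the cost grows linearly with the size of the index. That is the motivation for the approximate methods listed on the slide.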
84. Random projections
● Generate m random vectors pi
● For each compute (u . pi >= 0)
● Create hash = [(u . p0 >= 0), (u . p1 >= 0), ...]
For two vectors v and u
● Number of different bits ~ the angle
● Approximation becomes better as m grows
[Figure: vectors u and v separated by the angle theta]
85.
     x1     x2
u    -0.92  0.38
v    -0.61  0.78

Projection vectors
[Figure: u and v plotted with the projection vectors]
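A minimal sketch of the random-projection hash from slide 84, applied to the u and v values from slide 85. The function names, the Gaussian sampling of the projection vectors, the seed and m = 10000 are assumptions made for illustration. The angle estimate relies on the standard property that a random hyperplane separates two vectors with probability theta / pi.

```python
import math
import random


def random_projections(m, dim, seed=0):
    """Generate m random projection vectors with Gaussian components (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def lsh_hash(vec, projections):
    """One bit per projection vector p: whether vec . p >= 0."""
    return [sum(x * p for x, p in zip(vec, proj)) >= 0 for proj in projections]


def estimated_angle(hash_u, hash_v):
    """Fraction of differing bits times pi approximates the angle in radians."""
    differing = sum(a != b for a, b in zip(hash_u, hash_v))
    return math.pi * differing / len(hash_u)


def true_angle(u, v):
    """Exact angle between u and v, for comparison with the estimate."""
    cosine = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cosine)))


u = [-0.92, 0.38]  # values from slide 85
v = [-0.61, 0.78]

projections = random_projections(m=10_000, dim=2)
estimate = estimated_angle(lsh_hash(u, projections), lsh_hash(v, projections))
print(f"estimated angle: {estimate:.3f} rad, true angle: {true_angle(u, v):.3f} rad")
```

With a small m the estimate is coarse, and it tightens as m grows, in line with the last bullet on slide 84.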