Jean-Georges Perrin • @jgperrin
It's painful
how much
data rules the world
All Things Open Meetup
Raleigh Convention Center • Raleigh, NC
September 15th, 2021
The opinions expressed in this presentation and on the
following slides are solely those of the presenter and not
necessarily those of The NPD Group. The NPD Group does
not guarantee the accuracy or reliability of the information
provided herein.
Jean-Georges "jgp" Perrin
Software since 1983, >$0 since 1995
Big Data since 1984, >$0 since 2006
AI since 1994, >$0 since 2010
x13
It’s a story
about
data
April 4, 1980
Air & Space
Source:
NASA
Find & process the
data, not in Excel
Display the data in
a palatable form
Source:
Pexels
Sources:
Bureau of Transportation Statistics: https://www.transtats.bts.gov/TRAFFIC/
+----------+----------------+-----------+-----+
|month |internationalPax|domesticPax|pax |
+----------+----------------+-----------+-----+
|2000-01-01|5394 |41552 |46946|
|2000-02-01|5249 |43724 |48973|
|2000-03-01|6447 |52984 |59431|
|2000-04-01|6062 |50349 |56411|
|2000-05-01|6342 |52320 |58662|
+----------+----------------+-----------+-----+
only showing top 5 rows
root
|-- month: date (nullable = true)
|-- internationalPax: integer (nullable = true)
|-- domesticPax: integer (nullable = true)
|-- pax: integer (nullable = true)
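This console output comes from calls like these (a sketch, assuming df is the combined dataframe built in the labs that follow):
df.show(5);       // prints the first 5 rows, as above
df.printSchema(); // prints the schema tree, as above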
दृष्टि • dṛṣṭi
Open source, React & IBM Carbon-based
data visualization framework
Download at https://jgp.ai/drsti
Apply light data quality
Create a session
Create a schema
SparkSession spark = SparkSession.builder()
.appName("CSV to Dataset")
.master("local[*]")
.getOrCreate();
StructType schema = DataTypes.createStructType(new StructField[] {
DataTypes.createStructField("month", DataTypes.DateType, false),
DataTypes.createStructField("pax", DataTypes.IntegerType, true) });
Dataset<Row> internationalPaxDf = spark.read().format("csv")
.option("header", true)
.option("dateFormat", "MMMM yyyy")
.schema(schema)
.load("data/bts/International USCarrier_Traffic_20210902163435.csv");
internationalPaxDf = internationalPaxDf
.withColumnRenamed("pax", "internationalPax")
.filter(col("month").isNotNull())
.filter(col("internationalPax").isNotNull());
Dataset<Row> domesticPaxDf = spark.read().format("csv")
.option("header", true)
.option("dateFormat", "MMMM yyyy")
.schema(schema)
.load("data/bts/Domestic USCarrier_Traffic_20210902163435.csv");
domesticPaxDf = domesticPaxDf
.withColumnRenamed("pax", "domesticPax")
.filter(col("month").isNotNull())
.filter(col("domesticPax").isNotNull());
/jgperrin/ai.jgp.drsti-spark
Lab #300
Ingest international passengers
Ingest domestic passengers
Dataset<Row> df = internationalPaxDf
.join(domesticPaxDf,
internationalPaxDf.col("month").equalTo(domesticPaxDf.col("month")),
"outer")
.withColumn("pax", expr("internationalPax + domesticPax"))
.drop(domesticPaxDf.col("month"))
.filter(col("month").$less(lit("2020-01-01").cast(DataTypes.DateType)))
.orderBy(col("month"))
.cache();
df = DrstiUtils.setHeader(df, "month", "Month of");
df = DrstiUtils.setHeader(df, "pax", "Passengers");
df = DrstiUtils.setHeader(df, "internationalPax", "International Passengers");
df = DrstiUtils.setHeader(df, "domesticPax", "Domestic Passengers");
DrstiChart d = new DrstiLineChart(df);
d.setTitle("Air passenger traffic per month");
d.setXScale(DrstiK.SCALE_TIME);
d.setXTitle("Period from " + DataframeUtils.min(df, "month") + " to " + DataframeUtils.max(df, "month"));
d.setYTitle("Passengers (000s)");
d.render();
/jgperrin/ai.jgp.drsti-spark
Lab #300
All my data processing
Add meta data directly to the dataframe
Configure dṛṣṭi directly on the server
Aren’t you glad we
are using Java?
An analytics operating system?
[Diagram: the classic stack (Apps / Analytics / Distrib. / OS / Hardware)
evolves into Apps running on an Analytics OS, which sits on a Distributed OS
spanning several Hardware + OS nodes; a brace groups the Distributed OS and
Analytics OS layers as "an analytics operating system".]
Applying to our air traffic app
[Diagram: Domestic passengers (CSV) and International passengers (CSV) are
ingested as Domestic passengers (dataframe) and International passengers
(dataframe), combined through an outer join into Passengers (dataframe), then
into Enhanced data (dataframe), exported as Enhanced data (CSV) plus
Visualization metadata (JSON), and rendered as the dṛṣṭi visualization.
Phases: server processing (through Spark), transfer, visualization.]
[Diagram: the Apache Spark stack: Spark SQL, Spark Streaming, MLlib (machine
learning), and GraphX (graph), all on top of Apache Spark.]
[Diagram: your application uses a unified API over Spark SQL, Spark Streaming,
Spark MLlib (machine learning & artificial intelligence), and Spark GraphX,
running across many nodes (Node 1..8, each with its OS and hardware).]
[Diagram: the same stack seen through the dataframe: your application, the
dataframe, the unified API, and the Spark modules, distributed across the
nodes' OSes.]
Source:
Pexels
SparkSession spark = SparkSession.builder()
.appName("CSV to Dataset")
.master("local[*]")
.getOrCreate();
StructType schema = DataTypes.createStructType(new StructField[] {
DataTypes.createStructField("month", DataTypes.DateType, false),
DataTypes.createStructField("pax", DataTypes.IntegerType, true) });
Dataset<Row> internationalPaxDf = spark.read().format("csv")
.option("header", true)
.option("dateFormat", "MMMM yyyy")
.schema(schema)
.load("data/bts/International USCarrier_Traffic_20210902163435.csv");
internationalPaxDf = internationalPaxDf
.withColumnRenamed("pax", "internationalPax")
.filter(col("month").isNotNull())
.filter(col("internationalPax").isNotNull());
Dataset<Row> domesticPaxDf = spark.read().format("csv")
.option("header", true)
.option("dateFormat", "MMMM yyyy")
.schema(schema)
.load("data/bts/Domestic USCarrier_Traffic_20210902163435.csv");
domesticPaxDf = domesticPaxDf
.withColumnRenamed("pax", "domesticPax")
.filter(col("month").isNotNull())
.filter(col("domesticPax").isNotNull());
/jgperrin/ai.jgp.drsti-spark
Lab #310
Dataset<Row> df = internationalPaxDf
.join(domesticPaxDf,
internationalPaxDf.col("month")
.equalTo(domesticPaxDf.col("month")),
"outer")
.withColumn("pax", expr("internationalPax + domesticPax"))
.drop(domesticPaxDf.col("month"))
.filter(
col("month").$less(lit("2020-01-01").cast(DataTypes.DateType)))
.orderBy(col("month"))
.cache();
Dataset<Row> dfQuarter = df
.withColumn("year", year(col("month")))
.withColumn("q", ceil(month(col("month")).$div(3)))
.withColumn("period", concat(col("year"), lit("-Q"), col("q")))
.groupBy(col("period"))
.agg(sum("pax").as("pax"),
sum("internationalPax").as("internationalPax"),
sum("domesticPax").as("domesticPax"))
.drop("year")
.drop("q")
.orderBy(col("period"));
/jgperrin/ai.jgp.drsti-spark
Lab #310
New code for quarter
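As a side note, Spark also ships a built-in quarter() function; a hedged equivalent of the ceil(month/3) computation above (dfQuarterAlt is an illustrative name, not from the lab):
// Same quarterly aggregation, using quarter() instead of ceil(month/3).
Dataset<Row> dfQuarterAlt = df
    .withColumn("period",
        concat(year(col("month")), lit("-Q"), quarter(col("month"))))
    .groupBy(col("period"))
    .agg(sum("pax").as("pax"),
        sum("internationalPax").as("internationalPax"),
        sum("domesticPax").as("domesticPax"))
    .orderBy(col("period"));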
SparkSession spark = SparkSession.builder()
.appName("CSV to Dataset")
.master("local[*]")
.getOrCreate();
StructType schema = DataTypes.createStructType(new StructField[] {
DataTypes.createStructField("month", DataTypes.DateType, false),
DataTypes.createStructField("pax", DataTypes.IntegerType, true) });
Dataset<Row> internationalPaxDf = spark.read().format("csv")
.option("header", true)
.option("dateFormat", "MMMM yyyy")
.schema(schema)
.load("data/bts/International USCarrier_Traffic_20210902163435.csv");
internationalPaxDf = internationalPaxDf
.withColumnRenamed("pax", "internationalPax")
.filter(col("month").isNotNull())
.filter(col("internationalPax").isNotNull());
Dataset<Row> domesticPaxDf = spark.read().format("csv")
.option("header", true)
.option("dateFormat", "MMMM yyyy")
.schema(schema)
.load("data/bts/Domestic USCarrier_Traffic_20210902163435.csv");
domesticPaxDf = domesticPaxDf
.withColumnRenamed("pax", "domesticPax")
.filter(col("month").isNotNull())
.filter(col("domesticPax").isNotNull());
/jgperrin/ai.jgp.drsti-spark
Lab #320
Dataset<Row> df = internationalPaxDf
.join(domesticPaxDf,
internationalPaxDf.col("month")
.equalTo(domesticPaxDf.col("month")),
"outer")
.withColumn("pax", expr("internationalPax + domesticPax"))
.drop(domesticPaxDf.col("month"))
.filter(
col("month").$less(lit("2020-01-01").cast(DataTypes.DateType)))
.orderBy(col("month"))
.cache();
Dataset<Row> dfYear = df
.withColumn("year", year(col("month")))
.groupBy(col("year"))
.agg(sum("pax").as("pax"),
sum("internationalPax").as("internationalPax"),
sum("domesticPax").as("domesticPax"))
.orderBy(col("year"));
/jgperrin/ai.jgp.drsti-spark
Lab #320
New code for year
A (Big) Data Scenario
Building a pipeline
[Diagram: Data is ingested as Bronze (Raw Data); applying Data Quality rules
yields Silver (Pure Data); transformation yields Gold (Rich Data); publication
delivers Actionable Data, the "Cache".]
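A minimal sketch of this pipeline with Spark and Delta Lake, assuming the session, schema, and light data quality rules from lab #300; the bronze/silver/gold paths are illustrative:
// Bronze: ingest the raw CSV as-is.
Dataset<Row> bronze = spark.read().format("csv")
    .option("header", true)
    .option("dateFormat", "MMMM yyyy")
    .schema(schema)
    .load("data/bts/International USCarrier_Traffic_20210902163435.csv");
bronze.write().format("delta").mode("overwrite").save("./data/bronze/pax");
// Silver: apply the data quality rules (drop incomplete rows).
Dataset<Row> silver = spark.read().format("delta").load("./data/bronze/pax")
    .filter(col("month").isNotNull())
    .filter(col("pax").isNotNull());
silver.write().format("delta").mode("overwrite").save("./data/silver/pax");
// Gold: transform into rich, actionable data (yearly totals).
Dataset<Row> gold = spark.read().format("delta").load("./data/silver/pax")
    .withColumn("year", year(col("month")))
    .groupBy(col("year"))
    .agg(sum("pax").as("pax"))
    .orderBy(col("year"));
gold.write().format("delta").mode("overwrite").save("./data/gold/paxPerYear");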
// Combining datasets
// ...
df.write()
.format("delta")
.mode("overwrite")
.save("./data/tmp/airtrafficmonth");
/jgperrin/ai.jgp.drsti-spark
Lab #400
Saving to Delta Lake
Dataset<Row> df = spark.read().format("delta")
.load("./data/tmp/airtrafficmonth")
.orderBy(col("month"));
Dataset<Row> dfYear = df
.withColumn("year", year(col("month")))
.groupBy(col("year"))
.agg(sum("pax").as("pax"),
...
/jgperrin/ai.jgp.drsti-spark
Lab #430
Reading from Delta Lake
Can we project future traffic?
Source:
Comedy Central
Do you remember January 2020?
And March?
Source:
Pexels
• Make a model for 2000-2019
• See the projection
• Use 2020 data & imputation for
the rest of 2021
• See the projection
What now?
Source:
Pexels
Label
Feature
Use my model
Split training & test data
String[] inputCols = { "year" };
VectorAssembler assembler = new VectorAssembler().setInputCols(inputCols).setOutputCol("features");
df = assembler.transform(df);
LinearRegression lr = new LinearRegression()
.setMaxIter(10)
.setRegParam(0.5)
.setElasticNetParam(0.8)
.setLabelCol("pax");
int threshold = 2019;
Dataset<Row> trainingData = df.filter(col("year").$less$eq(threshold));
Dataset<Row> testData = df.filter(col("year").$greater(threshold));
LinearRegressionModel model = lr.fit(trainingData);
Integer[] l = new Integer[] { 2020, 2021, 2022, 2023, 2024, 2025, 2026 };
List<Integer> data = Arrays.asList(l);
Dataset<Row> futuresDf = spark.createDataset(data, Encoders.INT()).toDF().withColumnRenamed("value", "year");
assembler = new VectorAssembler().setInputCols(inputCols).setOutputCol("features");
futuresDf = assembler.transform(futuresDf);
df = df.unionByName(futuresDf, true);
df = model.transform(df);
Features are a vector - let’s build one
Build a linear regression
Building my model
/jgperrin/ai.jgp.drsti-spark
Lab #500
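The lab splits off testData but never scores it on the slide; a hedged sketch of evaluating the model on the held-out years with Spark ML's RegressionEvaluator:
import org.apache.spark.ml.evaluation.RegressionEvaluator;
// Score the trained model against the held-out test data.
RegressionEvaluator evaluator = new RegressionEvaluator()
    .setLabelCol("pax")
    .setPredictionCol("prediction")
    .setMetricName("rmse");
double rmse = evaluator.evaluate(model.transform(testData));
System.out.println("RMSE on held-out data: " + rmse);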
Something happened in 2020…
Source:
Pexels
Label
Feature
Imputation
Real data for 2021
Model 2000-2019
Model 2000-2021
Connect now to dṛṣṭi
http://172.25.177.2:3000
Dataset<Row> df2021 = df.filter(expr(
"month >= TO_DATE('2021-01-01') and month <= TO_DATE('2021-12-31')"));
int monthCount = (int) df2021.count();
df2021 = df2021
.agg(sum("pax").as("pax"),
sum("internationalPax").as("internationalPax"),
sum("domesticPax").as("domesticPax"));
int pax = DataframeUtils.maxAsInt(df2021, "pax") / (12 - monthCount);
int intPax = DataframeUtils.maxAsInt(df2021, "internationalPax") / (12 - monthCount);
int domPax = DataframeUtils.maxAsInt(df2021, "domesticPax") / (12 - monthCount);
List<String> data = new ArrayList<>();
for (int i = monthCount + 1; i <= 12; i++) {
data.add("2021-" + i + "-01");
}
Dataset<Row> dfImputation2021 = spark
.createDataset(data, Encoders.STRING()).toDF()
.withColumn("month", col("value").cast(DataTypes.DateType))
.withColumn("pax", lit(pax))
.withColumn("internationalPax", lit(intPax))
.withColumn("domesticPax", lit(domPax))
.drop("value");
Extract 2021 data
/jgperrin/ai.jgp.drsti-spark
Lab #600
Calculate imputation data
Create a new dataframe from scratch with the
additional data
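A hedged follow-up, not shown on the slide: the imputed months can be appended to the observed data with the same unionByName pattern used in lab #500.
// Append the imputed 2021 months to the real data.
df = df.unionByName(dfImputation2021, true);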
LinearRegression lr = new LinearRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8).setLabelCol("pax");
LinearRegressionModel model2019 = lr.fit(df.filter(col("year").$less$eq(2019)));
df = model2019
.transform(df)
.withColumnRenamed("prediction", "prediction2019");
LinearRegressionModel model2021 = lr.fit(df.filter(col("year").$less$eq(2021)));
df = model2021
.transform(df)
.withColumnRenamed("prediction", "prediction2021");
Pretty much the same code as lab #500,
except for renaming the columns
/jgperrin/ai.jgp.drsti-spark
Lab #610
Reusing the same linear regression trainer for
both models,
but the resulting models are different!
It's all about the base model
[Diagram: Step 1, the learning phase: Trainer + Dataset #1 produce the Model.
Steps 2..n, the predictive phase: the same Model + Dataset #2 produce
Predicted Data.]
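A minimal sketch of that learn-once/predict-many split, assuming the model and futuresDf from lab #500; the save path is illustrative:
// Step 1 (learning phase): train once, persist the base model.
model.write().overwrite().save("./data/tmp/paxModel");
// Steps 2..n (predictive phase): reload the same model and predict.
LinearRegressionModel reloaded =
    LinearRegressionModel.load("./data/tmp/paxModel");
Dataset<Row> predictedDf = reloaded.transform(futuresDf);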
Scientist & Engineer
There are two kinds of
data scientists:
1) Those who can
extrapolate from
incomplete data.
DATA Engineer
Develop, build, test, and operationalize datastores and large-scale processing
systems. DataOps is the new DevOps. Match architecture with business needs.
Develop processes for data modeling, mining, and pipelines. Improve data
reliability and quality.
DATA Scientist
Clean, massage, and organize data. Perform statistics and analysis to develop
insights, build models, and search for innovative correlations. Prepare data
for predictive models. Explore data to find hidden gems and patterns. Tell
stories to key stakeholders.
Source:
Adapted from https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
DATA Engineer • DATA Scientist
[Diagram: the tooling of each role; SQL is shared by both, as is IBM Watson
Studio.]
Source: Adapted from https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
Conclusion
Call to action
• We always need more data
• Air Traffic @ https://github.com/jgperrin/ai.jgp.drsti-spark
• COVID-19 @ https://github.com/jgperrin/net.jgp.books.spark.ch99
• Go try & contribute to dṛṣṭi at http://jgp.ai/drsti
• Follow me on Twitter @jgperrin & YouTube /jgperrin
Key takeaways
• Spark is very fun & powerful for any data application:
• Data engineering
• Data science
• New vocabulary & concepts regarding Apache Spark: dataframe, analytics
operating system
• Machine learning & AI work better with Big Data
• Data is fluid (and it’s really painful)
Thank you! http://jgp.ai/sia
See you next month
for All Things Open!