Processing Big Data Predictive Analytics
PyATL Meetup 10/18/2015

Who Am I
• Roy Russo
• VP Engineering, Predikto

Why Am I Here?
&
(Big Data) Predictive Analytics

Agenda
• What we do
• Problems we faced
• How we solved them
• Rationale
• What’s good. What’s not.

Who Is Predikto?
• Atlanta-based
• Founded in 2012
• Funded
• Paying Customers
• Mechanical Engineers
• Big Data Architects
• Global 1000
… and we don’t suck.

What We Do
• Actionable Predictive Maintenance
• Predictive Analytics
• Real-time health scoring
• Unified asset health view
• SaaS

How Does Predictive Analytics Work?

Data Sources

How We Do It
• Train mgt systems: INTELLITRAIN, GE RM&D, EMD, NYAB LEADER, MOTIVEPOWER, WABTEC
• EAM: SAP, INFOR, ORACLE, MAXIMO
• Other: BEACONS, WILD, WEATHER, TCIS, CUSTOM APPS, UMLER
• Predikto Enterprise Platform: Predikto Input APIs, Data Transform. Engine, MAX Machine Learning Engine, Predikto Output APIs & Dashboards
• Output: time-based actionable predictions

The Pipeline
• Stages: Predikto Data Pipeline (Data Aggregation/ETL), “MAX” (Machine Learning/Analytics), Operational Integration (Outbound APIs/Integration)
• Diagram labels: ETL, Standard JSON, AutoDynamic Feature Engineering, AutoDynamic Feature Selection, Email, SMS, Integration, UI Data Store

High-Level Requirements
Data Processing
• ETL on LARGE datasets
• Fast
• Commodity hardware
• Feature scoring/selection on LARGE datasets
• Scale horizontally
• Runs from Instruction-set
Data Querying
• Visualize LARGE datasets
• Time-Series
• Fast
• Commodity hardware
• Scale horizontally
• Support dynamic querying
• Differing Schema

Why Spark?
• ETL:
• Shared Memory
• Not Disk-Bound
• Distributed workloads
• Feature scoring/selection on LARGE datasets
• Same as above
• Scale horizontally
• New node = more capacity
• Spin up. Spin down.
• Runs from Instruction-set
• DAGs
• Python Devs… PySpark

Implementing Spark
• Use DAGs
• Directed Acyclic Graphs
• Config-Driven Workflows
• Tune to Job
• Workers / Core
• Memory Tuning
• CPU or RAM ?
• Cons:
• Steep learning curve
• Dev & Ops
• Documentation
• Exception handling
• Native Scala
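
The "Tune to Job" bullets above could translate into per-job Spark properties. The following is a minimal sketch, not the settings used in the talk: the profile dictionaries and the helpers `spark_conf_for` and `to_submit_args` are hypothetical, and the numbers are illustrative only. The property names (`spark.executor.memory`, `spark.executor.cores`, `spark.cores.max`) and the `--conf key=value` form of `spark-submit` are standard Spark.

```python
def spark_conf_for(profile):
    """Map a hypothetical job profile to standard Spark properties."""
    total_cores = profile["executors"] * profile["cores_per_executor"]
    return {
        "spark.executor.memory": "%dg" % profile["executor_memory_gb"],
        "spark.executor.cores": str(profile["cores_per_executor"]),
        # Caps the cores one application may take on a standalone cluster.
        "spark.cores.max": str(total_cores),
    }


def to_submit_args(conf):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", "%s=%s" % (key, conf[key])])
    return args


# Illustrative profiles only: a RAM-heavy ETL job and a CPU-heavy scoring job.
etl_job = {"executors": 4, "cores_per_executor": 2, "executor_memory_gb": 8}
scoring_job = {"executors": 8, "cores_per_executor": 4, "executor_memory_gb": 2}

print(to_submit_args(spark_conf_for(etl_job)))
```

Keeping the profile next to the job definition is one way to answer the "CPU or RAM?" question per job rather than once for the whole cluster.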

Spark UI

More on DAGs
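class comma_to_decimal(BaseDagTask):
    # DAG task: rewrite European-style numbers ("1.234,5") as "1234.5" in the configured columns.
    @staticmethod
    def run(sc, config, rdds, log):
        from shippable.dag_tasks._utils import safe_map
        out = []
        for rdd in rdds:
            if rdd is not None:
                # config['cols'] is a comma-separated string of column names to convert.
                cols = [str(x).strip() for x in config['cols'].split(',')]
                # Drop '.' thousands separators, then turn the decimal ',' into '.'.
                rdd = safe_map(rdd, lambda x: {k: v if str(k) not in cols else str(v).replace('.', '').replace(',', '.') for k, v in x.items()}, log)
                out.append(rdd)
            else:
                # Keep the position of missing inputs so outputs line up with inputs.
                out.append(None)
        return out, None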
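
To see what the task does to a single record without a Spark cluster, the same dictionary comprehension can be applied to a plain dict. This is a sketch only: `comma_to_decimal_record` and the sample row are hypothetical and stand in for what `safe_map` would apply to each element of the RDD.

```python
def comma_to_decimal_record(record, cols):
    """Apply the slide's conversion to one record (a plain dict)."""
    return {
        k: str(v).replace('.', '').replace(',', '.') if str(k) in cols else v
        for k, v in record.items()
    }


# Hypothetical sensor row with European-style number formatting.
row = {"asset": "loco-7", "temp": "1.234,56", "speed": "88,5"}
converted = comma_to_decimal_record(row, ["temp", "speed"])
print(converted)
```

Note that the converted values are still strings, and a missing value such as `None` would come out as the string `"None"`, so a later typing step is still needed.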
"datetimes_ones": {
    "after": "datatypes_ones",
    "type": "convert_datetimes"
},
"comma_convert": {
    "after": "datetimes_ones",
    "type": "comma_to_decimal"
},
"parentids_ones": {
    "after": "devids_ones",
    "type": "comma_convert"
},
"timestamps_ones": {
    "after": "parentids_ones",
    "type": "rename"
},
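
A config like this only fixes the order of work through its "after" keys, so a runner has to turn it into an execution order. The sketch below shows one way to do that, assuming each task names at most one predecessor as on the slide. `execution_order` and the sample `dag` are hypothetical (including the `convert_datatypes` type name) and are not Predikto's runner.

```python
from collections import deque


def execution_order(dag):
    """Return task names so that each task comes after its "after" task."""
    children = {name: [] for name in dag}
    roots = []
    for name, spec in dag.items():
        parent = spec.get("after")
        if parent is None:
            roots.append(name)
        elif parent not in dag:
            raise KeyError("task %r depends on unknown task %r" % (name, parent))
        else:
            children[parent].append(name)

    ready = deque(sorted(roots))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        ready.extend(sorted(children[name]))

    # Tasks caught in a cycle are never reachable from a root.
    if len(order) != len(dag):
        raise ValueError("cycle detected in DAG config")
    return order


# Hypothetical, self-consistent config in the same shape as the slide.
dag = {
    "timestamps_ones": {"after": "comma_convert", "type": "rename"},
    "comma_convert": {"after": "datetimes_ones", "type": "comma_to_decimal"},
    "datetimes_ones": {"after": "datatypes_ones", "type": "convert_datetimes"},
    "datatypes_ones": {"type": "convert_datatypes"},
}
print(execution_order(dag))
```

Validating the config up front (unknown predecessors, cycles) keeps a bad workflow definition from failing halfway through a long Spark job.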

Why Elasticsearch?
• Time-Series Data
• Fast reads
• Fast writes with bulk inserts
• Async
• Dynamic querying
• Differing schema
• Everything is indexed
• Visualization
• GeoJSON
• Scale horizontally
• New node = more capacity
• Spark-ES Connector
• REST API & Python lib
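
"Fast writes with bulk inserts" refers to Elasticsearch's bulk API, which takes newline-delimited JSON: one action line followed by one document line per record. The sketch below only builds that request body. `bulk_payload`, the index name and the sample documents are hypothetical, and Elasticsearch releases from the time of this talk also expected a document type alongside the index.

```python
import json


def bulk_payload(index, docs):
    """Build a newline-delimited bulk body: action line, then document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical time-series readings.
readings = [
    {"asset": "loco-7", "date_epoch": 1444262400000, "temp": 81.5},
    {"asset": "loco-7", "date_epoch": 1444262460000, "temp": 82.1},
]
payload = bulk_payload("sensor-readings", readings)
print(payload)
```

Sending many documents in one request amortizes the per-request overhead, which is where most of the write speed-up comes from.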

ES-Hadoop
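from elasticsearch import Elasticsearch

client = Elasticsearch(hosts=[es_uri])
# some_var must already be a JSON-encoded list of values, since it is spliced into the query string.
query = '{"query": {"filtered": {"filter": {"terms": {"date_epoch": ' + some_var + '}}}}}'
# Connector settings: which cluster and index/mapping to read, and which query to run.
es_conf = {
    "es.nodes": es_host_name,
    "es.port": es_host_port,
    "es.resource": es_index + '/' + es_mapping,
    "es.query": query
}
# Read the matching documents into an RDD of (key, document) pairs.
es_RDD = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_conf)
# Keep only the document (x[1]) and pass it through reformat_location.
dicts = es_RDD.map(lambda x: x[1]).map(lambda x: reformat_location(x))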
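
Building the `es.query` string by concatenation only works if the spliced-in value is already valid JSON. A less fragile sketch is to build the query as a dict and serialize it. `terms_filter_query` and the sample epoch values are hypothetical. The `filtered` query shape is kept from the slide; it matches Elasticsearch releases of that period and was removed in 5.0 in favor of a `bool` query with a `filter` clause.

```python
import json


def terms_filter_query(field, values):
    """Serialize a terms filter in the same shape as the slide's query string."""
    body = {"query": {"filtered": {"filter": {"terms": {field: list(values)}}}}}
    return json.dumps(body)


# Hypothetical epoch values to match against the date_epoch field.
query = terms_filter_query("date_epoch", [1444262400, 1444348800])
print(query)
```

The resulting string can be passed as the `"es.query"` entry of the connector configuration unchanged.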

Implementing Elasticsearch
• Tune
• Memory-Bound
• Large Drives (SSDs)
• Front w/ Load Balancer
• Spark
• Schema:
• Understand your customer’s data!
• Typing is important
• Timestamps, geohashes, floats, etc.
• Cons:
• Moderate learning curve
• Query syntax
• Security
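
"Typing is important" usually means declaring field types explicitly rather than relying on dynamic mapping to guess them. The sketch below shows what such a mapping could look like as a Python dict. The field names are hypothetical, and the `epoch_millis` date format assumes an Elasticsearch version that supports it (2.0 or later); `date`, `geo_point` and `float` are standard field types.

```python
import json

# Hypothetical mapping for sensor readings: each field gets an explicit type.
sensor_mapping = {
    "properties": {
        "date_epoch": {"type": "date", "format": "epoch_millis"},
        "location": {"type": "geo_point"},
        "temperature": {"type": "float"},
    }
}


def field_types(mapping):
    """Return {field name: declared type} for a quick review of the schema."""
    return {name: spec["type"] for name, spec in mapping["properties"].items()}


print(json.dumps(sensor_mapping, indent=2))
print(field_types(sensor_mapping))
```

Declaring `date_epoch` as a date and `location` as a geo point up front is what makes time-range and geographic queries possible on those fields later.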

Architecture

Implementing Elasticsearch

PREDICT | PREVENT | PERFORM
http://www.predikto.com/company/careers

PyATL Meetup, Oct 8, 2015
