Processing Big Data Predictive Analytics
PyATL Meetup 10/18/2015
Who Am I
• Roy Russo
• VP Engineering, Predikto
Why Am I Here?
&
(Big Data) Predictive Analytics
Agenda
• What we do
• Problems we faced
• How we solved them
• Rationale
• What’s good. What’s not.
Who is Predikto?
• Atlanta-based
• Founded in 2012
• Funded
• Paying Customers
• Mechanical Engineers
• Big Data Architects
• Global 1000
… and we don’t suck.
What do we do?
• Actionable Predictive Maintenance
• Predictive Analytics
• Real-time health scoring
• Unified asset health view
• SaaS
HOW DOES PREDICTIVE ANALYTICS WORK?
Data Sources
How We Do It
[Diagram: data sources feed the Predikto Enterprise Platform, which produces time-based actionable predictions]
• Train management systems: IntelliTrain, GE RM&D, EMD, NYAB LEADER, MotivePower, Wabtec
• EAM: SAP, Infor, Oracle, Maximo
• Other: beacons, WILD, weather, TCIS, custom apps, UMLER
• Predikto Enterprise Platform: Predikto input APIs, data transformation engine, MAX machine learning engine, Predikto output APIs & dashboards
• Output: time-based actionable predictions
The Pipeline
[Diagram: three-stage pipeline]
• Stages: Data Aggregation/ETL, Machine Learning/Analytics, Outbound APIs/Integration
• Stage names: Predikto data pipeline, “MAX”, operational integration
• Components shown: ETL, standard JSON, AutoDynamic feature engineering, AutoDynamic feature selection, email, SMS, integration, UI data store
High-Level Requirements
Data Processing
• ETL on LARGE datasets
• Fast
• Commodity hardware
• Feature scoring/selection on LARGE datasets
• Scale horizontally
• Runs from Instruction-set
Data Querying
• Visualize LARGE datasets
• Time-Series
• Fast
• Commodity hardware
• Scale horizontally
• Support dynamic querying
• Differing Schema
Why Spark?
• ETL:
• Shared Memory
• Not Disk-Bound
• Distributed workloads
• Feature scoring/selection on LARGE datasets
• Same as above
• Scale horizontally
• New node = more capacity
• Spin up. Spin down.
• Runs from Instruction-set
• DAGs
• Python Devs… PySpark
Implementing Spark
• Use DAGs
• Directed Acyclic Graphs
• Config-Driven Workflows
• Tune to Job
• Workers / Core
• Memory Tuning
• CPU or RAM ?
• Cons:
• Steep learning curve
• Dev & Ops
• Documentation
• Exception handling
• Native Scala
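The "Tune to Job" point can be sketched as a per-job resource profile. This is a minimal illustration, assuming jobs are launched through spark-submit; the profile names, the values, the script name, and the helper spark_submit_args are hypothetical, while spark.executor.cores and spark.executor.memory are standard Spark configuration properties.

```python
# Hypothetical resource profiles: one for CPU-bound jobs, one for RAM-bound jobs.
CPU_HEAVY = {"spark.executor.cores": 4, "spark.executor.memory": "4g"}
MEMORY_HEAVY = {"spark.executor.cores": 2, "spark.executor.memory": "16g"}


def spark_submit_args(app, profile):
    """Build a spark-submit argument list that applies one profile to one job."""
    args = ["spark-submit"]
    for key in sorted(profile):
        # Each setting is passed as: --conf key=value
        args += ["--conf", "{}={}".format(key, profile[key])]
    args.append(app)
    return args


if __name__ == "__main__":
    # etl_job.py is a placeholder script name.
    print(" ".join(spark_submit_args("etl_job.py", CPU_HEAVY)))
```

Keeping the profile next to the job definition makes the "CPU or RAM?" decision explicit for each workload instead of relying on one cluster-wide default.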
Spark UI
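More on DAGs
class comma_to_decimal(BaseDagTask):
    """DAG task: rewrite European-style numbers ("1.234,56") as "1234.56"."""

    @staticmethod
    def run(sc, config, rdds, log):
        from shippable.dag_tasks._utils import safe_map
        # config['cols'] is a comma-separated list of the columns to convert
        cols = [str(x).strip() for x in config['cols'].split(',')]
        out = []
        for rdd in rdds:
            if rdd is not None:
                # Drop thousands separators ('.'), then turn the decimal comma into a point
                rdd = safe_map(
                    rdd,
                    lambda x: {k: v if str(k) not in cols
                               else str(v).replace('.', '').replace(',', '.')
                               for k, v in x.items()},
                    log)
            # A missing input (None) stays None so positions line up with the inputs
            out.append(rdd)
        return out, None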
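To see what the task does to a single record, here is a plain-Python sketch of the same per-record transformation, without Spark. The function name convert_record and the sample record are made up for illustration; the replace logic mirrors the lambda on the slide.

```python
def convert_record(record, cols):
    """Rewrite European-formatted values in the named columns; leave others untouched."""
    return {
        k: str(v).replace(".", "").replace(",", ".") if str(k) in cols else v
        for k, v in record.items()
    }


if __name__ == "__main__":
    # Hypothetical sensor record with a thousands separator and decimal commas.
    record = {"device": "loco-7", "temp": "1.234,56", "speed": "88,5"}
    print(convert_record(record, ["temp", "speed"]))
    # prints {'device': 'loco-7', 'temp': '1234.56', 'speed': '88.5'}
```

Note that the converted values are still strings; casting them to floats would be a separate step.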
"datetimes_ones": {
    "after": "datatypes_ones",
    "type": "convert_datetimes"
},
"comma_convert": {
    "after": "datetimes_ones",
    "type": "comma_to_decimal"
},
"parentids_ones": {
    "after": "devids_ones",
    "type": "comma_convert"
},
"timestamps_ones": {
    "after": "parentids_ones",
    "type": "rename"
},
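A config like this defines a DAG through its "after" keys. The sketch below shows one way such a config could be turned into an execution order. It assumes each task names at most one predecessor, as in the excerpt; the function execution_order and the sample config (including the root task and its type) are hypothetical and not taken from the Predikto codebase.

```python
from collections import deque


def execution_order(tasks):
    """Order task names so that each task runs after the task in its "after" key."""
    children = {name: [] for name in tasks}
    roots = []
    for name, spec in tasks.items():
        parent = spec.get("after")
        if parent is None:
            roots.append(name)
        elif parent in tasks:
            children[parent].append(name)
        else:
            raise KeyError("task %r depends on unknown task %r" % (name, parent))

    order = []
    ready = deque(sorted(roots))
    while ready:
        name = ready.popleft()
        order.append(name)
        ready.extend(sorted(children[name]))

    # Tasks caught in a dependency cycle are never reachable from a root.
    if len(order) != len(tasks):
        raise ValueError("cycle detected in task config")
    return order


if __name__ == "__main__":
    # Hypothetical, self-contained config in the same shape as the excerpt.
    config = {
        "timestamps_ones": {"after": "comma_convert", "type": "rename"},
        "datatypes_ones": {"type": "convert_datatypes"},
        "datetimes_ones": {"after": "datatypes_ones", "type": "convert_datetimes"},
        "comma_convert": {"after": "datetimes_ones", "type": "comma_to_decimal"},
    }
    print(execution_order(config))
```

Because the order comes from the config rather than from code, adding or reordering a step is a config change only.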
Why Elasticsearch?
• Time-Series Data
• Fast reads
• Fast writes with Bulk Inserts
• Asynch
• Dynamic querying
• Differing schema
• Everything is indexed
• Visualization
• GeoJSON
• Scale horizontally
• New node = more capacity
• Spark-ES Connector
• REST API & Python lib
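The "fast writes with bulk inserts" point refers to the Elasticsearch bulk API, which takes newline-delimited JSON: one action line followed by one document line per record. Below is a minimal sketch of building such a request body. The index name, type name, and documents are hypothetical, and the _type field in the action line reflects Elasticsearch versions from the time of this talk; recent versions no longer use mapping types.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk request body that indexes every doc."""
    lines = []
    for doc in docs:
        # Action line: what to do and where to put the document
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    readings = [
        {"device": "loco-7", "temp": 81.5},
        {"device": "loco-9", "temp": 79.0},
    ]
    print(bulk_body("sensor-readings", "reading", readings))
```

Sending many documents in one request avoids a network round trip per document, which is where most of the write speedup comes from.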
ES-Hadoop
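from elasticsearch import Elasticsearch

client = Elasticsearch(hosts=[es_uri])
# some_var must already be a JSON array string of epoch values
query = '{"query": {"filtered": {"filter": {"terms": {"date_epoch": ' + some_var + '}}}}}'
es_conf = {
    "es.nodes": es_host_name,
    "es.port": es_host_port,
    "es.resource": es_index + '/' + es_mapping,
    "es.query": query
}
# Read the matching documents into an RDD of (key, document) pairs through ES-Hadoop
es_RDD = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_conf)
# Keep only the document (x[1]); reformat_location is a project helper not shown on the slide
dicts = es_RDD.map(lambda x: x[1]).map(reformat_location)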
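Building the query by string concatenation works only while some_var is already valid JSON. A slightly safer variant, sketched here, builds the query as a dict and serializes it with json.dumps. The helper name terms_filter_query and the epoch values are hypothetical; the "filtered" query shape is the one from the slide and belongs to older Elasticsearch releases (it was deprecated in 2.0 and removed in 5.0).

```python
import json


def terms_filter_query(field, values):
    """Serialize a filtered/terms query for use as the "es.query" setting."""
    body = {
        "query": {
            "filtered": {
                "filter": {"terms": {field: list(values)}}
            }
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Hypothetical epoch-second values for two days.
    query = terms_filter_query("date_epoch", [1444262400, 1444348800])
    print(query)
```

The resulting string can be passed as the "es.query" value in the same way as the hand-built one.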
Implementing Elasticsearch
• Tune
• Memory-Bound
• Large Drives (SSDs)
• Front w/ Load Balancer
• Spark
• Schema:
• Understand your customer’s data!
• Typing is important
• Timestamps, GeoHASH, Floats, etc…
• Cons:
• Moderate learning curve
• Query syntax
• Security
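As an illustration of "typing is important", here is a sketch of an explicit mapping for a time-series document, written as a Python dict. The field names are hypothetical; "date", "geo_point", and "float" are Elasticsearch field types, and a geo_point field accepts geohash strings among other formats. Without explicit types, Elasticsearch infers them from the first document it sees, which may not match what later queries and visualizations need.

```python
import json

# Hypothetical mapping for one sensor reading.
READING_MAPPING = {
    "properties": {
        "timestamp": {"type": "date"},       # enables range queries and time-based charts
        "location": {"type": "geo_point"},   # enables geo queries and map visualizations
        "temperature": {"type": "float"},    # numeric, so aggregations such as avg/max work
    }
}


def field_types(mapping):
    """Return a {field: type} summary of a mapping, useful for a quick review."""
    return {name: spec["type"] for name, spec in mapping["properties"].items()}


if __name__ == "__main__":
    print(json.dumps(READING_MAPPING, indent=2))
    print(field_types(READING_MAPPING))
```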
Architecture
Implementing Elasticsearch
PREDICT | PREVENT | PERFORM
http://www.predikto.com/company/careers

PyATL Meetup, Oct 8, 2015
