This document provides an overview of Texas Instruments' use of InfluxDB and Grafana for time-series data analysis. It discusses (1) Texas Instruments' business and the importance of data-driven decisions, (2) their current setup ingesting metrics like tool utilization into InfluxDB, and (3) examples of Grafana dashboards created using this data. It also covers some issues encountered and plans to expand usage, including additional metrics, time-series modeling, and developing applications.
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improve Efficiencies
1. Texas Instruments – The Road to Understanding Inefficiencies with InfluxDB
Presentation by: Mike Hinkle
2. Table of Contents
• About Texas Instruments
• What we do
• About me
• Common terms (Tools, States, Modules, Metrics, etc.)
• Data discussion
• Examples of data-based decisions
• Current challenges
• InfluxDB and our current use case
• Why InfluxDB?
• Current setup and metrics captured
• Examples of dashboards currently in use
• Example of Grafana alert using InfluxDB derivative() on signal/trace
• Issues encountered using InfluxDB & Grafana
• What’s next
3. Texas Instruments – Brief Overview
• Texas Instruments was founded in 1951
• Previously ‘Geophysical Service Incorporated’
• Headquarters in Dallas, Texas
• Currently employs ~30k individuals worldwide
• Semiconductor Manufacturing Facilities located in:
• United States
• Germany
• Japan
• China
What we do:
Make calculators (just kidding; we do, but this is a small part of our portfolio)
We Design, Manufacture, Test, Package, & Sell semiconductor devices
5. Common terms used in presentation
• Wafer – Typically made of Silicon (Si)
• Wafers are traditionally started in counts of 25, each of which is referred to as a LOT
• Lots can have fewer than 25 wafers
• Example of a wafer map is shown to the right
• Each small square is a single die
• Each die is an IC comprised of transistors, resistors, etc.
• After mfg and testing, the wafers are typically diced/cut and packaged for final testing
6. Common terms used in presentation – Cont.
• Tool – The equipment used to manufacture, process & test semiconductors (wafers), e.g.:
• Furnaces (seen to right)
• Ion Implanters
• Epi Reactors
• Plasma Etchers
• Metrology (Rs, thickness, etc.)
• Testers
• State – Numeric or character code which defines the current state of our tools
• PROD, 06T, 01, etc.
• Currently 515 state definitions in DFAB
• States mapped to multiple categories
Figure: LYDOP Furnace Stack (Phosphorus Doping)
7. Common terms used in presentation – Cont.
• Modules – The division of common tools or processes into teams and groups for the purpose of manufacturing semiconductors
• Diffusion
• Surface Prep.
• Implant
• Epi
• Plasma
• Photo
• MultiProbe & Testprobe
• Modules typically divided into toolsets and processes and owned by Engineers
• Modules report out often on metrics
• Metrics
• Availability (Ao)
• Overall Equip. Utilization (OEU)
• Fail Events (raw and norm.)
• Mean Time to Repair (MTTR)
• Mean Time Between Failure (MTBF)
• First Pass Success
• Cycle Time
• … the list goes on …
8. Why is access to fast, accurate data important?
Common Data-Based Decisions:
• Process adjustments (time, temperature, pressure, etc.)
• Process Stability (Cpk, UCL, LCL)
• Tool Stability (Ao, OEU, MTTR, MTBF)
• Track preventative maintenance (use-based or time-based)
• Troubleshooting
• Anomaly Detection and Interdiction
• Yield
• Planning
Inaccurate or misinterpreted data could lead to scrapped wafers, yield loss, missed customer deadlines and, ultimately, the loss of valuable business.
9. Current challenges
• We need immediate feedback if/when tool/process issues arise
• Failure to respond could equate to scrapped wafers or missed customer deadlines and, ultimately, lost revenue
• Email & text notifications are a must; automated interdiction is the goal
• We need a more efficient means of extracting the relevant data for reporting
• Reduction of time preparing for meetings and presentations
• Most high-level metrics are temporal
• We need a system which can be used efficiently by non-CS/CE employees
• Most employees in our factories come from Electrical, Mechanical, Chemical Engineering or Physics backgrounds (not full-stack devs.)
• Our primary job does not revolve around creating applications
• We need the ability to prototype and experiment
• Write access to RDBMS is not easy to come by
• Flat file systems are not efficient
10. Why InfluxDB
• Familiarity from personal project
• Installation and setup took minutes
• Multitude of available clients
• NoSQL makes for minimal planning and easier prototyping
• SQL-esque: Do not need to learn a new query language (using InfluxQL)
• Fast query response
• Easier to handle edge cases (metrics split and aggregated across days, years)
• Built-in math and forecasting functions (d/dt, integrate, Holt-Winters, etc.)
• Efficient disk space usage
• Currently >1.5M points/records written per day
• Easy plugin for Grafana (experimenting w/ Chronograf now)
• Open Source edition is free (MIT License)
11. Current installation setup
Setup:
• Single InfluxDB instance (OSS) installed on Linux VM
• TSM, WAL & Raft data all stored on expandable development mount
• Loader scripts executed via cronjob, query RDBMS and load InfluxDB
• Data is filtered, sanitized & calculated if possible during initial query
• Grafana server running on separate Linux VM instance
• Apache web server is reverse proxied to point to Grafana server on port 3000
Metrics being collected:
• OEU (Overall Equipment Utilization: [Time Testing Good Die / Total Time])
• Ao (Tool Availability: [Uptime / Total Time])
• Occurrences of States (States define the state of our tools: PMs, PROD, etc.)
• Cycle Time (how long it takes for a wafer to be tested [device tag])
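The two ratio definitions listed above (OEU and Ao) can be illustrated with a small sketch. The function names and the sample durations are assumptions made for illustration; only the ratios themselves come from the slide.

```python
def _ratio(part_s: float, total_s: float) -> float:
    """Return part/total, rejecting an empty or negative reporting window."""
    if total_s <= 0:
        raise ValueError("total time must be positive")
    return part_s / total_s


def oeu(time_testing_good_die_s: float, total_time_s: float) -> float:
    """Overall Equipment Utilization: time testing good die / total time."""
    return _ratio(time_testing_good_die_s, total_time_s)


def availability(uptime_s: float, total_time_s: float) -> float:
    """Tool Availability (Ao): uptime / total time."""
    return _ratio(uptime_s, total_time_s)


# Hypothetical 24-hour window for one tester, in seconds.
window_s = 24 * 3600
tool_ao = availability(uptime_s=77_760, total_time_s=window_s)
tool_oeu = oeu(time_testing_good_die_s=60_480, total_time_s=window_s)
```

With these made-up durations the tool is available 90% of the window and spends 70% of it testing good die.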
12. Block diagram of current InfluxDB loading scheme
[Block diagram: RDBMS sources are queried by Python cronjobs, which write to InfluxDB]
Linux Virtual Machine
• Runs Apache webserver
• Runs Grafana server
• Python cronjobs query RDBMS, parse data and write to InfluxDB
Linux Virtual Machine
• Runs InfluxDB instance
• TSM, WAL & Raft Metadata stored on expandable development mount
14. Decomposing DFAB Probe’s OEU Time-Series
• Measurement queried using InfluxDB Python client
• Mean(OEU) was grouped by 1 hour
• No differencing or transformations were applied to TS
• Period length (T) for seasonality defined as 14 days based on our shift rotations
• Decomposition performed using statsmodels Python library
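To make the decomposition step more concrete, here is a stdlib-only sketch of classical additive decomposition (centered moving-average trend, per-phase seasonal means). It is a stand-in for illustration, not the statsmodels call used in the presentation; the function name is hypothetical, and the period of 14 × 24 = 336 samples assumes the hourly grouping and 14-day rotation described above.

```python
from statistics import fmean

PERIOD_HOURS = 14 * 24  # 14-day shift rotation at one sample per hour


def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual lists (additive model).

    Trend and residual are None near the edges, where the centered
    moving-average window does not fit.
    """
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points each get half weight.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    # Average the detrended values at each position within the period.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    phase_means = [fmean(b) for b in buckets]
    offset = fmean(phase_means)  # center the seasonal component on zero
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    resid = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, resid
```

For hourly mean OEU values, the call would be `decompose_additive(hourly_oeu, PERIOD_HOURS)`, where `hourly_oeu` is a hypothetical list covering at least 28 days.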
15. OEU: Trend Components & Forecast w/ fbProphet
• Plots generated using fbProphet (Python 3.6)
• Component plot supports the theory that our performance is weaker on the back half of the week
• Next step is to determine performance and operation deltas across shifts to improve metrics and output
16. Example of actual Grafana alert sent via email
• Not too concerned if our OEU drops below our goal for a short duration of time
• More concerning is if we drop quickly (rate of change). This could indicate a larger problem
• In the case seen to the left, an application was failing to write to a DB, causing the tools to lock up.
• IT needed to interdict on their end, but we needed to notify them
• InfluxDB’s DERIVATIVE() function allows us to easily trigger alerts for this use case
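The rate-of-change idea above can be sketched in two parts: an InfluxQL query string of the kind a Grafana panel might run, and a plain-Python equivalent of the per-interval derivative for illustration. The measurement `oeu`, the field `value`, the 1-hour window and the -0.1 per-minute threshold are assumptions, not the alert actually configured in the presentation.

```python
# Hypothetical InfluxQL query: per-minute rate of change of mean OEU.
# The nested form needs a time-bounded WHERE clause and GROUP BY time().
QUERY = (
    'SELECT DERIVATIVE(MEAN("value"), 1m) FROM "oeu" '
    "WHERE time > now() - 1h GROUP BY time(1m)"
)


def derivative(points, unit_s=60.0):
    """Rate of change between consecutive (epoch_seconds, value) points.

    Each result is stamped with the later point's time and scaled to
    'per unit_s seconds'. Points must be sorted by strictly increasing time.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0) * unit_s))
    return rates


def breaches(rates, threshold):
    """Timestamps where the rate falls to or below a (negative) threshold."""
    return [t for t, rate in rates if rate <= threshold]


# Made-up OEU samples, one per minute: a slow drift, then a sharp drop.
samples = [(0, 0.80), (60, 0.79), (120, 0.60), (180, 0.58)]
alerts = breaches(derivative(samples), threshold=-0.1)
```

Only the sharp drop between the second and third samples crosses the threshold; the slow drift on either side does not, which matches the intent of alerting on how quickly OEU falls and not on its level.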
17. Grafana Dashboard – State Counts (1 sample/minute)
• Stacked bar chart, colored by tool state
• Pie charts show the state distribution
• State occurrences (left)
• E10 occurrences (right)
• Single stat panels show how many testers are in production and how many are down
18. Grafana Dashboard – Ao Diffusion (1 sample/minute)
• Green Trace: Overall Tool Availability % (1m)
• Bottom plot shows my previous toolsets (POCL & TEOS are tags/indexed)
19. Issues encountered
• Could not write to InfluxDB via the Python client library
• Solution: http_proxy & https_proxy were set and sourced in the runcom (.cshrc) file. I needed to unset these environment variables for the script to work as expected
• Problems using DERIVATIVE() function
• Solution: For my use case, I needed to specify a WHERE clause with a time constraint.
• Initial DB writes were not showing correct times when viewed in Grafana
• Solution: InfluxDB expects UTC time, whereas our DB timestamps are stored and displayed in ‘America/Chicago’ time
• No primitive for aggregating by month in InfluxQL (1d, 14d, 30d, 1m?)
• Solution: Flux can apparently handle this, or I can add a Month tag to the data for grouping.
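Two of the fixes above (writing UTC timestamps and adding a month tag) might look like this in a loader script. This is a sketch under assumptions: the function names are hypothetical, the source timestamps are taken to be naive ‘America/Chicago’ wall-clock values, and deriving the month from local time (not UTC) is a design choice made here, not something stated in the slides. It uses `zoneinfo`, which requires Python 3.9 or later.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LOCAL_TZ = ZoneInfo("America/Chicago")


def to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive timestamp as Chicago wall-clock time and convert to UTC."""
    return naive_local.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)


def month_tag(naive_local: datetime) -> str:
    """Month tag value (YYYY-MM) based on factory-local time, for GROUP BY on a tag."""
    return naive_local.strftime("%Y-%m")


# Hypothetical timestamp from the RDBMS, late on the last day of January.
local_ts = datetime(2019, 1, 31, 23, 30)
utc_ts = to_utc(local_ts)
tag = month_tag(local_ts)
```

Here the UTC timestamp already falls on 1 February while the tag stays `2019-01`, so monthly grouping follows the factory calendar. Wall-clock times repeated during the autumn daylight-saving change are ambiguous; the `fold` attribute of `datetime` selects which occurrence is meant.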
20. What’s next
• Prototype Dash app for single toolset (proof-of-concept)
• One-stop shopping for Toolset Health
• Tool stability high level
• Process stability high level
• Work to model high level metrics across shifts for DFAB probe
• Forecasting (planning, capacity, financial, staffing levels, etc.)
• ARIMA (Autoregressive Integrated Moving Average)
• SARIMA
• Holt-Winters
• fbProphet (experiment with add_regressor)
• Push more signals to time-series (TTR, TBF, etc.)
• Presentation at work on time-series data and my use of InfluxDB/Grafana
• Open the floodgates for new ideas or applications