
Working in a Data Engineering/Operations role that focuses heavily on financial datasets. Everything is within AWS and Snowflake, and each table can easily have >100M records of almost any kind of data (there is a lot of breadth). The general day to day is building jobs that process large amounts of input data and store the results into Snowflake, sending out lots of automated reports and emails to decision makers, and gathering more data from the web.
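
Roughly what one of those load jobs looks like, as a minimal sketch assuming pandas and the snowflake-connector-python package; the connection details, table, and column names here are placeholders, not the actual pipeline:

    import os
    import pandas as pd
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    def load_daily_prices(csv_path: str) -> int:
        """Read a raw input file, apply light cleaning, and bulk-load it into Snowflake."""
        df = pd.read_csv(csv_path, parse_dates=["trade_date"])  # hypothetical schema
        df = df.dropna(subset=["ticker", "close"])               # drop unusable rows

        conn = snowflake.connector.connect(
            account=os.environ["SNOWFLAKE_ACCOUNT"],
            user=os.environ["SNOWFLAKE_USER"],
            password=os.environ["SNOWFLAKE_PASSWORD"],
            warehouse="LOAD_WH",     # placeholder warehouse
            database="MARKET_DATA",  # placeholder database
            schema="RAW",            # placeholder schema
        )
        try:
            # write_pandas stages the frame and COPYs it into the target table
            success, _, nrows, _ = write_pandas(conn, df, table_name="DAILY_PRICES")
            return nrows if success else 0
        finally:
            conn.close()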

All of this is done in a Python environment, with Rust used to speed up critical code/computations. (The Rust code is delivered as Python modules.)
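
A minimal sketch of how a compiled Rust extension slots into the Python side; `fastcalc` and `rolling_vwap` are hypothetical names for a module built with something like PyO3/maturin, not the actual package:

    try:
        from fastcalc import rolling_vwap  # hypothetical Rust-backed function
    except ImportError:
        rolling_vwap = None

    def compute_vwap(prices, volumes, window: int):
        """Use the Rust implementation when available, else fall back to pure Python."""
        if rolling_vwap is not None:
            return rolling_vwap(prices, volumes, window)
        # Pure-Python fallback: same semantics, much slower on large inputs
        out = []
        for i in range(len(prices)):
            lo = max(0, i - window + 1)
            pv = sum(p * v for p, v in zip(prices[lo:i + 1], volumes[lo:i + 1]))
            vol = sum(volumes[lo:i + 1])
            out.append(pv / vol if vol else float("nan"))
        return out

Callers only ever import the Python-facing function, so the Rust module can be swapped in or out without touching the rest of the job code.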

The work is interesting, and different challenges arise when processing and computing over datasets that are updated with tens of TBs of fresh data daily.





> General day to day is creating jobs that will process large amounts of input data and storing them into Snowflake

About how long do these typically take to execute? Minutes, tens of minutes, hours?

My work is very iterative, with a feedback loop that is only a few minutes long.



